Datadog APM gains 3 superpowers: Trace Search, Service Map & Watchdog

Published: September 6, 2018

Since we made Datadog APM generally available last year, we have continually added new features and support for new languages and frameworks so that you can monitor every aspect of application performance. Datadog APM helps companies such as Airbnb, Square, and Zendesk optimize application performance and deliver top-notch customer experiences.

Datadog APM now supports Java, Python, Ruby, Go, and Node.js, with support for .NET and PHP coming soon. Over the past few months, we’ve also added three powerful new features to Datadog APM: Watchdog, Trace Search & Analytics, and the Service Map. In this post, we’ll explore how you can use APM’s new superpowers to get deeper visibility than ever before into the performance of your applications.

Trace Search & Analytics

Trace Search & Analytics enables you to search, filter, and aggregate APM data at infinite cardinality. You can find the needle-in-the-haystack trace that matches custom tags such as customer ID, service, endpoint, cluster, pod, or product SKU.

In the analytics view, you can slice and dice performance metrics using tags so you can visualize which endpoints are returning the most errors to a particular customer, or identify the 10 customers experiencing the highest p90 latency in any part of your application.
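
To make the "10 customers experiencing the highest p90 latency" query concrete, here is a minimal sketch of the aggregation it implies. It is not Datadog's implementation or API: the record fields (`customer_id`, `duration_ms`) and both function names are hypothetical, and the percentile uses the simple nearest-rank definition.

```python
from collections import defaultdict

def p90(values):
    """Nearest-rank 90th percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = (9 * len(ordered) + 9) // 10  # ceil(0.9 * n) using integer math
    return ordered[rank - 1]

def top_customers_by_p90(traces, limit=10):
    """Group trace durations by customer tag and rank customers by p90 latency."""
    by_customer = defaultdict(list)
    for trace in traces:
        by_customer[trace["customer_id"]].append(trace["duration_ms"])
    ranked = sorted(((p90(d), c) for c, d in by_customer.items()), reverse=True)
    return [(customer, latency) for latency, customer in ranked[:limit]]

# Hypothetical trace records, each tagged with a customer ID.
traces = [
    {"customer_id": "acme", "duration_ms": 120},
    {"customer_id": "acme", "duration_ms": 950},
    {"customer_id": "globex", "duration_ms": 80},
    {"customer_id": "globex", "duration_ms": 95},
    {"customer_id": "initech", "duration_ms": 400},
]
print(top_customers_by_p90(traces, limit=2))
```

The same grouping works for any tag (endpoint, cluster, pod, product SKU) by swapping the key used to bucket the durations.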

See how Zendesk is using Trace Search & Analytics to find specific traces and perform customer-level monitoring on their application in this video from Dash 2018:


Check out the Zendesk Engineering Blog to learn more about why Trace Search & Analytics has become a critical part of their application monitoring workflow.

Watchdog

Watchdog is a new auto-detection engine that surfaces anomalies in your applications. It works out of the box with zero configuration to start monitoring every service, endpoint, and database query. Using our extensively field-tested machine learning algorithms, Watchdog detects issues such as latency spikes in your microservices, anomalous changes in throughput on your endpoints, error rate spikes from a particular SQL query, or network issues in one of your cloud provider’s availability zones.

See how Square is using Watchdog to help identify critical issues and do root cause analysis through stack traces in this talk from Dash 2018:

Service Map

The Datadog Service Map decomposes your applications into all their component microservices and draws the observed dependencies between those services in real time, so you can identify bottlenecks and understand how requests flow through your architecture. And just like Watchdog, setup is effortless—the Service Map visualizes data that’s collected automatically once you set up Datadog APM.
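
As a rough illustration of how observed dependencies can be derived from trace data, the sketch below counts caller-to-callee edges between services from parent/child span relationships. This is an assumption about the general technique, not a description of how the Service Map is built; the span fields (`span_id`, `parent_id`, `service`) and the `service_edges` helper are hypothetical.

```python
from collections import Counter

def service_edges(spans):
    """Count cross-service calls implied by parent/child span links."""
    service_of = {s["span_id"]: s["service"] for s in spans}
    edges = Counter()
    for span in spans:
        parent_service = service_of.get(span["parent_id"])
        # Only a parent and child in different services form a dependency edge.
        if parent_service is not None and parent_service != span["service"]:
            edges[(parent_service, span["service"])] += 1
    return edges

# Hypothetical spans from one or more traces.
spans = [
    {"span_id": 1, "parent_id": None, "service": "web"},
    {"span_id": 2, "parent_id": 1, "service": "web"},
    {"span_id": 3, "parent_id": 2, "service": "checkout"},
    {"span_id": 4, "parent_id": 3, "service": "postgres"},
    {"span_id": 5, "parent_id": 1, "service": "checkout"},
]
for (caller, callee), calls in sorted(service_edges(spans).items()):
    print(f"{caller} -> {callee}: {calls} call(s)")
```

An edge that appears here but was never planned in the architecture is exactly the kind of unintended dependency a dependency map helps surface.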

Airbnb uses the Service Map to perform real-time root cause analysis, onboard new engineers, and discover unintended dependencies in their distributed systems. Check out this video to see how the Service Map helps break down complexity at Airbnb:

End-to-end visibility, out of the box

Datadog APM can automatically instrument applications written in Java, Python, Node.js, Ruby, and Go, so you can start tracing requests immediately.

Once you set up Datadog APM, you’ll start seeing end-to-end request traces from all of your instrumented services in Datadog, along with auto-generated application health metrics that track the latency, throughput, and error rates for every service, endpoint, and database query.

An auto-generated service page in Datadog, with application health metrics and a latency distribution.

Every trace is represented as a flame graph that shows which services, endpoints, calls, and queries went into serving a request, as well as the latency associated with each of those operations. Traces allow you to see inefficient code pathways immediately and diagnose errors by viewing stack traces and error messages that are automatically collected from your application.

An APM flame graph in Datadog tracing a request from end to end.
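
The structure behind a flame graph like the one described above is a tree of spans. The following sketch renders such a tree as indented text and reports each span's self time (its duration minus its children's). The span fields and the `render_trace` helper are hypothetical, and the self-time calculation assumes child spans run sequentially without overlapping.

```python
from collections import defaultdict

def render_trace(spans):
    """Return indented lines showing each span's total and self time."""
    children = defaultdict(list)
    for span in spans:
        children[span["parent_id"]].append(span)

    lines = []

    def walk(span, depth):
        kids = sorted(children[span["span_id"]], key=lambda s: s["start_ms"])
        # Self time: time spent in this span and not in any child span.
        self_ms = span["duration_ms"] - sum(k["duration_ms"] for k in kids)
        indent = "  " * depth
        lines.append(f"{indent}{span['name']} {span['duration_ms']}ms (self {self_ms}ms)")
        for kid in kids:
            walk(kid, depth + 1)

    for root in children[None]:  # root spans have no parent
        walk(root, 0)
    return lines

# Hypothetical spans for a single request.
spans = [
    {"span_id": 1, "parent_id": None, "name": "GET /checkout", "start_ms": 0, "duration_ms": 200},
    {"span_id": 2, "parent_id": 1, "name": "auth.verify", "start_ms": 5, "duration_ms": 30},
    {"span_id": 3, "parent_id": 1, "name": "postgres.query", "start_ms": 40, "duration_ms": 120},
]
print("\n".join(render_trace(spans)))
```

A span with a large self time is a candidate for an inefficient code path, while a span dominated by one child points the investigation further down the tree.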

The three pillars of observability

Tracing is essential to observability, but its benefits multiply when traces can be correlated with other key data sources. Datadog unifies traces, metrics, and logs (the three pillars of observability) so you can pivot instantly between related data using tags. For instance, every request trace carries additional context about the application environment: traces automatically display related logs as well as metrics from the application host at the time the request was executed.

Finally, because APM data is a first-class citizen in Datadog, you can build alerts and dashboards around any of the performance metrics you’re collecting from your applications. You can monitor latency percentiles to ensure that you’re upholding your SLAs, or build alerts that are precisely tailored to the performance profile of a particular microservice or endpoint.

Get started with Datadog APM

With Trace Search & Analytics, Watchdog, and the Service Map, Datadog APM provides unparalleled visibility into modern applications. From surfacing anomalies to mapping service dependencies to tracing the execution pathway of a single request, Datadog APM enables you to understand, troubleshoot, and optimize application performance in one platform. Give it a try today.