
Troubleshoot streaming data pipelines directly from APM with Datadog Data Streams Monitoring

Authors: Candace Shamieh, Jane Wang, Lucas Kretvix

Published: January 25, 2024

Monitoring applications built on streaming data pipelines involves complexities that traditional batch-processing systems do not have. Whether you're using streaming data pipelines to power a digital trading platform, capture sensor data from IoT devices, or recommend news articles to users, it can be challenging to identify the root cause of delays when you're dealing with distributed systems, real-time data, and the dynamic nature of events. As a result, monitoring tools must detect issues quickly, scale flexibly, analyze and correlate complex events, and provide end-to-end visibility from data ingestion through processing and output.

Datadog Data Streams Monitoring (DSM) provides these capabilities, helping you optimize event-driven applications whose streaming data pipelines run on technologies such as Kafka or RabbitMQ. DSM helps you track and improve application performance by giving you visibility into all of the services and queues across your pipeline in one place. To make these applications even easier to monitor, we're excited to announce that we've embedded the DSM topology visualization directly into the APM Service Page. The DSM integration in APM displays a high-level overview of your streaming data architecture, helping you identify performance issues such as blocked messages, offline consumers, and high-latency queues so you can troubleshoot and resolve them even faster.

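In this post, we'll discuss how the DSM integration in APM helps you see a high-level overview of your streaming data dependencies in one place and conduct investigations without disrupting your workflow.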

See a high-level overview of your streaming data dependencies in one place

Viewing your entire application architecture with the DSM integration in APM lets you analyze application metrics alongside the topology of your streaming data pipelines so you can remediate bottlenecks. In the Dependencies view of the APM Service Page, you can toggle between the DSM map and the APM dependency map to see your monitors, consumer lag, application requests, error rates, and more.

View of the DSM map on the Service Page in APM

The DSM integration in APM displays your service's queues, upstream producers, and downstream consumers. With all of this information in one place, you can see where delays and inefficiencies occur in the queues across your pipeline and formulate targeted optimization strategies. Understanding how your application depends on services and queues makes it easier to track the flow of data and events and to measure how your resources are affected when issues arise.
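DSM builds this map for you from instrumented services. To make the relationship between a service, its queues, its upstream producers, and its downstream consumers concrete, here is a toy model. It assumes a hypothetical list of `(producer, queue, consumer)` edges; this representation and the `dependencies` helper are illustrative only and are not Datadog's internal data model or API.

```python
from collections import defaultdict

# Hypothetical edge format: (producer_service, queue, consumer_service).
# Illustrative only; not how DSM stores its topology.
Edge = tuple[str, str, str]


def dependencies(edges: list[Edge], service: str) -> dict[str, set[str]]:
    """Return the queues a service reads from or writes to, plus the
    services one hop upstream and downstream of it through those queues."""
    producers_of = defaultdict(set)  # queue -> services that produce to it
    consumers_of = defaultdict(set)  # queue -> services that consume from it
    for producer, queue, consumer in edges:
        producers_of[queue].add(producer)
        consumers_of[queue].add(consumer)

    # Queues this service consumes from (inbound) and produces to (outbound).
    inbound = {q for q, cs in consumers_of.items() if service in cs}
    outbound = {q for q, ps in producers_of.items() if service in ps}

    # Upstream producers write to our inbound queues;
    # downstream consumers read from our outbound queues.
    upstream = set().union(*(producers_of[q] for q in inbound)) - {service}
    downstream = set().union(*(consumers_of[q] for q in outbound)) - {service}
    return {
        "queues": inbound | outbound,
        "upstream_producers": upstream,
        "downstream_consumers": downstream,
    }


if __name__ == "__main__":
    # Hypothetical pipeline: checkout -> orders -> fulfillment -> shipments.
    edges = [
        ("checkout", "orders", "fulfillment"),
        ("fulfillment", "shipments", "notifier"),
        ("fulfillment", "shipments", "analytics"),
    ]
    deps = dependencies(edges, "fulfillment")
    print(sorted(deps["upstream_producers"]))    # ['checkout']
    print(sorted(deps["downstream_consumers"]))  # ['analytics', 'notifier']
```

In this model, a delay on the `orders` queue would surface on the `fulfillment` service's map as a problem with an upstream dependency, which is the kind of relationship the DSM map lets you read at a glance.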

To catch issues proactively, you can set a monitor that alerts on consumer lag, throughput, end-to-end latency for your pipelines, or the time messages spend in the queue. You can also view and interact with active DSM monitors in the APM side panel.
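For Kafka-style queues, consumer lag is commonly measured per partition as the difference between the latest produced offset and the consumer group's last committed offset. The sketch below shows that arithmetic and a simple threshold check of the kind a lag monitor evaluates. The offsets, the threshold, and the helper names are hypothetical; this is not how Datadog computes or queries lag, and treating a partition with no committed offset as starting from offset 0 is a simplifying assumption.

```python
def consumer_lag(end_offsets: dict[int, int], committed: dict[int, int]) -> dict[int, int]:
    """Per-partition lag in messages: latest produced offset minus the last
    committed consumer offset. Assumes (hypothetically) that a partition with
    no committed offset has been read from offset 0."""
    return {
        partition: max(end - committed.get(partition, 0), 0)
        for partition, end in end_offsets.items()
    }


def lag_alerts(lag: dict[int, int], threshold: int) -> list[int]:
    """Partitions whose lag exceeds a hypothetical alert threshold."""
    return sorted(p for p, messages in lag.items() if messages > threshold)


if __name__ == "__main__":
    # Hypothetical offsets for one topic with three partitions.
    end_offsets = {0: 1500, 1: 980, 2: 2200}
    committed = {0: 1490, 1: 980, 2: 1200}
    lag = consumer_lag(end_offsets, committed)
    print(lag)                   # {0: 10, 1: 0, 2: 1000}
    print(lag_alerts(lag, 500))  # [2]
```

Here only partition 2 would trigger an alert, pointing the investigation at the consumer responsible for that partition rather than at the whole topic.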

Conduct investigations without workflow disruption

The DSM integration in APM helps you avoid context switching as you troubleshoot and resolve pipeline inefficiencies. When an issue occurs, you can correlate APM data, such as application traces and faulty deployments, with pipeline information and investigate without interrupting your workflow, so you can quickly and accurately pinpoint the issue's root cause.

For example, let's say APM alerts you that a service is experiencing consumer lag. You can start the investigation by reviewing the DSM map on the same Service Page. Selecting the service node on the DSM map opens a side panel containing the telemetry you need to troubleshoot, such as traces, logs, throughput, and infrastructure metrics for your service. You can also see which individuals or teams are on call for affected service dependencies, so you can reach out and collaborate with them for fast resolution.

Side panel telemetry of an example authenticator service that displays latency metrics

The DSM integration in APM also visualizes alerts for pipeline components that experience high consumer lag and routes you to the DSM page if you need more details for your investigation.

Monitor your streaming data pipelines in APM today

The Data Streams Monitoring integration in APM lets you view your streaming data pipelines alongside your entire application architecture to understand how your application's services interact through queues. With the DSM integration in APM, you can investigate without disrupting your workflow, remediate bottlenecks quickly, pinpoint root causes, and improve application performance.

You can quickly identify which services are eligible to use DSM by navigating to the Service Page of any service with upstream producers or downstream consumers. To set up your service with DSM, see our setup guide. To learn more about DSM and APM, see our documentation.

If you're new to Datadog, get started now with a free trial.