Accelerate OTel gateway resolutions with Datadog Fleet Automation

Jessica Yang

Associate Product Manager

Candace Shamieh

Technical Writer

The OpenTelemetry (OTel) gateway deployment pattern helps platform teams scale telemetry data collection by aggregating telemetry data and centralizing processing tasks before routing the data to observability backends. While gateway deployments provide teams with flexibility, they can also complicate troubleshooting because telemetry data flows through multiple intermediary systems. When data volume drops or spikes, or when data hits a bottleneck, engineers must consult multiple tools to understand the full pipeline and pinpoint root causes. The lack of complete visibility into OTel gateway architectures leads to increased operational overhead, slower incident triage, and ultimately a longer mean time to resolution (MTTR).

Datadog Fleet Automation now addresses this fragmented troubleshooting with an end-to-end view of your OTel gateway architecture at the cluster level for the Datadog Distribution of OpenTelemetry Collector (DDOT) and upstream-compatible OTel Collectors. By unifying visibility of your gateway architecture, traffic patterns, Collector configurations, and active monitor signals, Fleet Automation enables you to minimize context switching between tools, isolate problematic Collectors or components, and remediate issues faster.

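In this post, we’ll show how Fleet Automation helps you visualize your OTel gateway architecture, detect traffic anomalies across OTel gateway deployments, and investigate OTel Collector issues with active monitors and configuration context.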

Visualize your OTel gateway architecture

OTel gateway deployments often involve complex telemetry data routing: Kubernetes DaemonSet Collectors forward node-level data to load balancers or gateway services, which then route to one or more layers of gateway Collectors before the data reaches an observability backend. Teams must switch between multiple Collector YAMLs, deployment manifests, and architecture diagrams to piece together the full telemetry data pipeline, making it difficult for them to validate whether the data is following the expected paths.

Topology View in Fleet Automation gives platform teams a cluster-level view of their end-to-end OTel gateway architecture. The visualization represents each layer of the gateway deployment as connected nodes, making it easier to validate telemetry data routing from sources to destinations across the full deployment.

Topology View maps the full OTel gateway architecture as connected DaemonSet and gateway Collector nodes, validating telemetry data routing to backends.

With this broader view, teams can easily validate routing after adding a new backend, modifying routing rules, or introducing additional gateway layers for scale, all without context switching.
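To make the idea of validating routing concrete, here is a minimal Python sketch that models a gateway deployment as a directed graph and checks that every source Collector can reach every expected backend. It is an illustration of the concept only, not how Fleet Automation or Topology View works internally. The node names, the `TOPOLOGY` mapping, and both helper functions are hypothetical.

```python
from collections import deque

# Hypothetical topology: each key forwards telemetry to the listed nodes.
# Node names are illustrative and do not come from any real deployment.
TOPOLOGY = {
    "daemonset-collector-a": ["gateway-lb"],
    "daemonset-collector-b": ["gateway-lb"],
    "gateway-lb": ["gateway-collector-1", "gateway-collector-2"],
    "gateway-collector-1": ["backend-primary"],
    "gateway-collector-2": ["backend-primary", "backend-new"],
}
SOURCES = ["daemonset-collector-a", "daemonset-collector-b"]
BACKENDS = {"backend-primary", "backend-new"}


def reachable_backends(topology, source, backends):
    """Return the backends reachable from `source` using breadth-first search."""
    seen = {source}
    queue = deque([source])
    found = set()
    while queue:
        node = queue.popleft()
        if node in backends:
            found.add(node)
        for nxt in topology.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return found


def unrouted_sources(topology, sources, backends):
    """List the sources that cannot reach every expected backend."""
    return sorted(
        s for s in sources
        if reachable_backends(topology, s, backends) != set(backends)
    )


if __name__ == "__main__":
    # An empty list means every source has a path to every backend.
    print(unrouted_sources(TOPOLOGY, SOURCES, BACKENDS))
```

A check like this only confirms that a path exists in the declared topology. It says nothing about whether data is actually flowing along that path, which is where traffic context comes in.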

Detect traffic anomalies across OTel gateway deployments 

Even with a clear view of your gateway architecture, it can still be difficult to understand where telemetry data behavior diverges from expectations. Missing signals or unexpected traffic volume may originate from a single telemetry data pipeline, backend route, or component within a Collector. Without traffic context, teams may know that data is missing or delayed, but not where the problem occurs.

Topology View helps you narrow down to the pipeline of a specific telemetry data type and provides traffic insights to pinpoint abnormal traffic flow patterns across your gateway deployments. For example, if a backend destination is slow to accept trace data, backpressure can build up in the gateway layer and cause traces to queue or drop before they are exported. In Topology View, you may see traffic entering the gateway Collectors as expected, but reduced or delayed trace traffic leaving toward the backend.

With traffic enabled, Topology View traces span rates across the OTel gateway, isolating reduced trace throughput on the path to the backend.

By filtering to traces and comparing traffic across the affected route, teams can quickly isolate the issue to the gateway-to-backend path and begin troubleshooting the relevant exporter, queue, or backend destination instead of reviewing the entire deployment. 
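The comparison described above can be sketched in a few lines of Python: given the rate of spans each node receives and sends, flag the nodes where outbound traffic falls well below inbound traffic. This is a simplified illustration under assumed inputs, not Datadog's detection logic. The `NodeTraffic` type, the `find_bottlenecks` function, the 5% threshold, and all rates are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeTraffic:
    name: str
    received_per_sec: float  # spans/second entering the node
    sent_per_sec: float      # spans/second leaving the node


def find_bottlenecks(nodes, max_loss_ratio=0.05):
    """Return (name, loss_ratio) for nodes that send noticeably less than they receive."""
    flagged = []
    for node in nodes:
        if node.received_per_sec <= 0:
            continue  # no inbound traffic, so there is nothing to compare
        loss = 1 - node.sent_per_sec / node.received_per_sec
        if loss > max_loss_ratio:
            flagged.append((node.name, round(loss, 3)))
    return flagged


# Made-up rates: the second gateway Collector receives traces as expected
# but exports far fewer, as it might when a backend is slow to accept data.
TRAFFIC = [
    NodeTraffic("daemonset-collector-a", 1000.0, 1000.0),
    NodeTraffic("gateway-collector-1", 500.0, 495.0),
    NodeTraffic("gateway-collector-2", 500.0, 200.0),
]

if __name__ == "__main__":
    for name, loss in find_bottlenecks(TRAFFIC):
        print(f"{name}: {loss:.0%} of inbound spans not forwarded")
```

A gap between inbound and outbound rates is not always a fault. Sampling or filtering processors drop data by design, so any threshold like the one assumed here would need to account for the expected behavior of each pipeline.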

Investigate OTel Collector issues with active monitors and configuration context

After teams identify a problematic route or Collector, they still rely on manual operations to cross-reference dashboards, monitors, and configuration YAMLs to pinpoint the exact root causes and begin remediation workflows. This can lead to longer MTTR and negative business outcomes. 

Fleet Automation overlays Topology View with monitor alert context, so you can see at a glance which OTel Collectors have active issues that require attention. To investigate further, you can quickly drill down to a single OTel Collector by using Pipeline View, which shows how data flows through configured OTel receivers, processors, connectors, and exporters.

Pipeline View overlays a triggered monitor alert and configuration YAML on a single OTel Collector to start remediation in one place.

You can then use component-level monitor alerts and configuration YAML snippets to start troubleshooting workflows in one place, reducing the time spent correlating signals across tools and accelerating remediation.
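As a rough illustration of the configuration context involved, the Python sketch below walks a Collector configuration and reports pipeline components that are referenced but never defined, one common class of misconfiguration. The config is written as a Python dict whose layout (top-level `receivers`, `processors`, `exporters`, and `connectors` sections plus `service.pipelines`) follows the general shape of an OTel Collector config. The specific components, the `CONFIG` contents, and the `undefined_components` helper are assumptions made for this example, not part of Fleet Automation.

```python
# Hypothetical Collector configuration expressed as a Python dict.
# The traces pipeline references "memory_limiter", which is never defined.
CONFIG = {
    "receivers": {"otlp": {}},
    "processors": {"batch": {}},
    "exporters": {"otlp/backend": {}},
    "connectors": {},
    "service": {
        "pipelines": {
            "traces": {
                "receivers": ["otlp"],
                "processors": ["batch", "memory_limiter"],
                "exporters": ["otlp/backend"],
            }
        }
    },
}


def undefined_components(config):
    """Return (pipeline, kind, component) for each reference with no definition."""
    problems = []
    connectors = set(config.get("connectors") or {})
    pipelines = config.get("service", {}).get("pipelines", {})
    for pipeline_name, pipeline in pipelines.items():
        for kind in ("receivers", "processors", "exporters"):
            defined = set(config.get(kind) or {})
            # Connectors can sit at either end of a pipeline, so they are
            # valid references in the receivers and exporters lists.
            if kind in ("receivers", "exporters"):
                defined |= connectors
            for component in pipeline.get(kind, []):
                if component not in defined:
                    problems.append((pipeline_name, kind, component))
    return problems


if __name__ == "__main__":
    for pipeline, kind, component in undefined_components(CONFIG):
        print(f"pipeline '{pipeline}' references undefined {kind[:-1]} '{component}'")
```

In practice you would load the dict from the Collector's YAML file rather than hardcoding it. The point of the sketch is the receiver, processor, and exporter chain per pipeline, which is the structure a per-Collector pipeline view lays out visually.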

Start troubleshooting OTel gateways in Datadog

Fleet Automation brings OTel gateway topology, traffic insights, monitor alerts, and configuration context into a single troubleshooting workflow, helping you better understand your OTel gateway architectures, isolate problematic Collectors or routes, and resolve issues faster. With this unified troubleshooting experience, platform teams can now access the scalability and flexibility benefits of OTel gateway deployments with greater confidence, while reducing the operational complexity of running them in production.

To get started, visit the Fleet Automation and DDOT gateway setup documentation. If you’re new to Datadog, you can .
