To get visibility into highly distributed applications, organizations often use various tracing tools that are best suited to each individual service owner’s specifications. However, when a request travels between services that have been instrumented with different tools, the trace data may be formatted differently, resulting in broken traces.
W3C Trace Context aims to address this problem by defining a standardized format for unifying trace data from distributed tracing solutions. Datadog APM supports W3C Trace Context, allowing teams to capture complete traces from services that have been instrumented with any system that follows this standard, including OpenTelemetry (OTel) libraries, Datadog’s tracing libraries, Jaeger, and other vendors. In this post, we’ll walk through the challenges of propagating traces across distributed systems and how W3C Trace Context can help improve the observability of your applications.
To understand how vendor-specific trace formats can lead to broken traces, we need to take a closer look at how distributed tracing works. When a service that has been instrumented for tracing processes a request, its tracer will record how the service interacts with the request and encode this contextual data into an HTTP header. This header is then passed along to subsequent services and platforms as the request travels downstream. Each tracing tool may use a different header format for encoding trace data (e.g., Datadog employs its own proprietary format, while Zipkin uses the B3 format), as shown in the diagram below.
As the request travels downstream from Service A to B to C, each tracing tool needs to incorporate its own contextual trace data with the incoming data and then forward this combined data to the next service. But if these services are instrumented with tracing tools that use incompatible headers, the trace data cannot properly propagate across services. This results in separate traces with missing spans, rather than a single trace that visualizes the complete request with spans from each service.
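To make the mismatch concrete, here is a minimal sketch of what happens when one service injects Datadog-style headers and the next one only looks for B3 headers. The header names are the ones these two formats use; the `extract_context` helper, the ID values, and the two services are hypothetical and only stand in for what a real tracer does internally.

```python
# Header names used by two vendor formats (lowercased for comparison).
DATADOG_HEADERS = ("x-datadog-trace-id", "x-datadog-parent-id")
B3_HEADERS = ("x-b3-traceid", "x-b3-spanid")


def extract_context(headers, expected):
    """Return (trace_id, parent_id) if every expected header is present, else None.

    Hypothetical helper: a real tracer does far more validation than this.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    if all(name in lowered for name in expected):
        return tuple(lowered[name] for name in expected)
    return None


# Service A (hypothetical) injects Datadog-style headers into the outgoing request.
outgoing = {
    "x-datadog-trace-id": "1234567890",
    "x-datadog-parent-id": "987654321",
}

# Service B (hypothetical) only understands B3, so it finds no incoming context
# and would start a brand-new trace instead of continuing Service A's.
print(extract_context(outgoing, B3_HEADERS))       # None
print(extract_context(outgoing, DATADOG_HEADERS))  # ('1234567890', '987654321')
```

Because Service B never sees a trace ID it recognizes, its spans end up in a separate trace, which is the broken-trace scenario described above.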
W3C Trace Context enables teams to gain full visibility into their services when instrumenting them with multiple tools. The Trace Context specification splits trace context data into two headers:
traceparent contains all the necessary fields for propagating trace context in a common format to support interoperability between tracing tools. This includes a unique trace ID for the distributed trace and the ID of the parent span. Using these identifiers, traceparent is able to position the given trace in relation to the incoming request and then propagate this data downstream to the next service.
tracestate is an optional header that can be used to propagate vendor-specific information. When a trace propagates between two services that have been instrumented with different vendors, each service’s vendor-specific data will be added to the existing tracestate, as shown in the diagram below.
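As a rough illustration of the two headers, the sketch below parses a traceparent value into its hyphen-delimited fields (version, trace ID, parent ID, and trace flags) and splits a tracestate value into its vendor key/value entries. The sample values are the examples used in the W3C specification. The function and class names are our own for illustration, and the parser only covers the four-field layout defined for version 00, so treat it as a simplified sketch and not a complete validator.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# version (2 hex) - trace ID (32 hex) - parent ID (16 hex) - trace flags (2 hex)
TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    flags: str


def parse_traceparent(value: str) -> Optional[TraceParent]:
    """Parse a traceparent header value, returning None if it is malformed."""
    match = TRACEPARENT_RE.match(value.strip())
    if match is None:
        return None
    parsed = TraceParent(*match.groups())
    # Version "ff" and all-zero trace or parent IDs are invalid.
    if (
        parsed.version == "ff"
        or set(parsed.trace_id) == {"0"}
        or set(parsed.parent_id) == {"0"}
    ):
        return None
    return parsed


def parse_tracestate(value: str) -> List[Tuple[str, str]]:
    """Split a tracestate header value into ordered (vendor_key, value) pairs."""
    entries = []
    for member in value.split(","):
        key, sep, val = member.strip().partition("=")
        if sep and key:
            entries.append((key, val))
    return entries


parent = parse_traceparent("00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01")
state = parse_tracestate("rojo=00f067aa0ba902b7,congo=t61rcWkgMzE")
print(parent.trace_id, parent.parent_id)
print(state)
```

In practice, your tracing library handles this parsing for you; the sketch is only meant to show which pieces of information each header carries.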
With the introduction of W3C Trace Context support, Service B is now able to successfully receive the incoming trace header from Service A. It constructs a new tracestate header by adding its own vendor-specific ID to the previous header. It also constructs a new traceparent header with the same hyphenated trace ID value (shown as f685 in the diagram) and a different parent ID.
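The header handling that Service B performs can be sketched roughly as follows. This is a simplified illustration under a few assumptions: the `propagate` function and the `vendora`/`vendorb` keys are hypothetical, and we use the new span ID as the vendor-specific value purely for demonstration (real tracers encode their own data there). The one detail taken from the W3C specification is that a vendor places its entry at the front of the tracestate list.

```python
import secrets
from typing import Dict


def propagate(traceparent: str, tracestate: str, vendor_key: str) -> Dict[str, str]:
    """Build the outgoing headers for the next hop (hypothetical helper)."""
    version, trace_id, _incoming_parent_id, flags = traceparent.split("-")

    # The trace ID is carried over unchanged; the parent ID becomes the ID of
    # the span created by the current service (16 hex characters).
    new_parent_id = secrets.token_hex(8)

    # Drop any stale entry for this vendor, then put the fresh entry first.
    members = [m.strip() for m in tracestate.split(",") if m.strip()]
    members = [m for m in members if m.split("=", 1)[0] != vendor_key]
    members.insert(0, f"{vendor_key}={new_parent_id}")

    return {
        "traceparent": f"{version}-{trace_id}-{new_parent_id}-{flags}",
        "tracestate": ",".join(members),
    }


incoming_traceparent = "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01"
incoming_tracestate = "vendora=b7ad6b7169203331"

outgoing = propagate(incoming_traceparent, incoming_tracestate, "vendorb")
print(outgoing["traceparent"])  # same trace ID, new parent ID
print(outgoing["tracestate"])   # vendorb entry first, vendora entry preserved
```

Because the trace ID never changes from hop to hop, any backend that understands the W3C format can stitch the spans from every service into a single trace.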
When inspecting the distributed trace in Datadog APM, the developer can now track the complete path of the request, which provides invaluable context for troubleshooting. In the trace below, we’re able to visualize the complete distributed trace consisting of spans from both the calendar-java service (which has been instrumented with the OTel Java SDK) and the calendar-py service (which has been instrumented with the Datadog Python Tracing Library). In the Info tab, we can see that the selected span’s traceparent header is in the W3C format.
The following Datadog APM tracing libraries support W3C Trace Context and complement our existing support for B3 trace headers:
To learn more about improving visibility across your applications with trace context, you can view our documentation. Our RUM browser and mobile SDKs also support W3C trace headers, enabling you to correlate RUM events with OTel-instrumented traces to troubleshoot user-facing issues. You can learn more about this and get other updates on our OpenTelemetry work in our blog post and our OTel docs.
If you don’t already have a Datadog account, sign up for a free 14-day trial today.