Performance Monitoring With OpenTracing, OpenCensus, and OpenMetrics | Datadog

Author Mallory Mooney

Published: February 6, 2019

If you are familiar with instrumenting applications, you may have heard of OpenMetrics, OpenTracing, and OpenCensus. These projects aim to create standards for application performance monitoring and collecting metric data. Although the projects do overlap in terms of their goals, they each take a different approach to observability and instrumentation. In this post, we’ll provide an introduction to all three projects, along with some key differentiators of each, and how they best support application monitoring.

Key project differences

OpenMetrics aims to create a standard format for exposing metric data, while OpenTracing and OpenCensus focus on creating a standard for distributed tracing. Because the OpenCensus and OpenTracing projects share similar goals, there is a lot of overlap between their tracing APIs. Both employ a standard for tracking requests across process boundaries so you can visualize all the operations (e.g., database calls, caching) that go into fulfilling individual requests. This enables you to monitor application performance with one of the several backends (e.g., Datadog, Zipkin) that OpenTracing or OpenCensus supports.

OpenCensus is a part of the Google Open Source community, and OpenTracing and OpenMetrics are Cloud Native Computing Foundation (CNCF) projects. The OpenCensus and OpenTracing projects use similar mechanisms, though they refer to them in different terms:

OpenTracing          OpenCensus
Tracer               Exporter
Metrics              Stats
Tracing interface    Generic context propagation format

OpenTracing is a standardized API for tracing and provides a specification that developers can use to instrument their own services or libraries for distributed tracing. OpenTracing also provides a way for developers to collect metrics, though it’s not an out-of-the-box implementation. OpenCensus is a collection of language-specific libraries for instrumenting an application, collecting stats (metrics), and exporting data to a supported backend.

Both projects are vendor-neutral, though there are some caveats. Because OpenTracing is an API, you can easily change which backend you use to store, visualize, and explore your traces, with minimal configuration. However, OpenTracing relies on the backend projects and vendors to implement their own tracers, meaning you have to ensure that your backend of choice provides a tracer that is compatible with the OpenTracing specification. OpenCensus, on the other hand, includes built-in exporters for multiple backends, but support for collecting different data types (distributed traces and metrics) varies by backend and by programming language.

Distributed tracing and instrumentation

Before we dive deeper into the specifics of each project, we need to take a look at the problem they try to address. An important aspect of application monitoring is instrumentation (e.g., for distributed tracing or custom metric collection), which enables you to get detailed insights into how your application is performing.

Distributed tracing, in particular, is critical for understanding how a request moves across multiple services, packages, and infrastructure components. Distributed tracing follows requests through traces and spans. Traces are records of the entire lifecycle of a request, including each service it interacts with (e.g., MongoDB, Redis, NGINX). Each trace is made up of one or more spans: operations executed by a service for a request (e.g., GET, POST, PUT). Spans can include information such as the status of an HTTP request, the name of the service executing the operation, timestamps, and links to other spans.

Example trace

In the example trace above, a single request is broken down into separate spans across multiple services. These visualizations enable you to troubleshoot performance issues such as long execution times for a database query.

Tracing is especially useful for applications that are built with microservices. Because this type of architecture uses loosely coupled components or services, distributed tracing that captures the full lifecycle of a request becomes integral to understanding how well each service is performing for users.

The issue comes with tracking a request across all necessary services and application code to ensure that you can monitor its full path. Because of the abundance of services, packages, and frameworks that can make up an application, it can be difficult to connect data coming from the individual services. This fragmentation grows as APM vendors and frameworks provide their own tracers and instrumentation, increasing the number of potentially incompatible tools.

Projects like OpenTracing, OpenCensus, and OpenMetrics try to address this by providing standards for instrumentation and data collection. This gives APM vendors and developers the ability to build portable tracers and instrumentation that track a request as it travels through each service within your application, and to gather information about the metrics you deem most important.

What is OpenTracing?

OpenTracing implements a distributed tracing standard for software through a general-purpose API. The goal of this API is to incorporate distributed tracing at both the service level and application level, allowing developers to easily track requests across every service that makes up an application, all using the same tracing standard. This enables library developers to ship instrumented code so users can monitor their applications with the supported tracer of their choice out of the box. By using the same standard at the service level and application level, developers can easily switch tracers without having to change any instrumented code.

Span management and cross-process propagation

OpenTracing provides a specification for span management that can be used with any of its supported implementations. The API uses a tracer interface to provide the methods needed for creating new spans and for moving span data across process boundaries (cross-process propagation) by injecting spanContext into carriers or extracting it from them. Carriers store span metadata in key:value maps or binary data, and OpenTracing requires backends to support three carrier formats, including HTTP request headers.

At the service level, developers can instrument their services using OpenTracing’s default no-op tracer. This tracer serves as the scaffolding that implements the tracer interface, so developers can create spans and inject or extract spanContext using the same specification, without needing to choose a tracing backend in advance.
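The idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the OpenTracing library's API: `Tracer`, `NoopTracer`, `RecordingTracer`, and `handle_request` are hypothetical names. The point is that the service code depends only on the tracer interface, so a no-op tracer can later be swapped for a backend-specific one without touching the instrumented function.

```python
from abc import ABC, abstractmethod
from contextlib import contextmanager
import time


class Tracer(ABC):
    """Hypothetical tracer interface; instrumented code only sees this."""

    @abstractmethod
    def start_span(self, operation_name):
        """Return a context manager that wraps one operation."""


class NoopTracer(Tracer):
    """Accepts every call and records nothing."""

    @contextmanager
    def start_span(self, operation_name):
        yield None


class RecordingTracer(Tracer):
    """Stand-in for a backend-specific tracer: keeps finished spans in memory."""

    def __init__(self):
        self.finished = []

    @contextmanager
    def start_span(self, operation_name):
        start = time.time()
        try:
            yield None
        finally:
            # Record (operation name, start timestamp, finish timestamp).
            self.finished.append((operation_name, start, time.time()))


def handle_request(tracer, customer_id):
    """Instrumented service code; identical no matter which tracer is passed in."""
    with tracer.start_span("HTTP GET /customer"):
        return {"id": customer_id}
```

Calling `handle_request(NoopTracer(), 7)` and `handle_request(RecordingTracer(), 7)` returns the same result; only the second call leaves a recorded span behind.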

For applications with services that already use the OpenTracing API, developers can begin sending traces to a backend with a supported tracer by simply changing a few lines of code. They can also incorporate the API directly with operation or function calls needed to handle a client request (e.g., retrieving a web page, querying a database). According to the OpenTracing specification, every span should include the following information:

  • An operation name: typically includes the action executed for that span such as HTTP GET /customer
  • Start and finish timestamps
  • Tags: categorize the entire span and can include information such as the db.user, http.status_code, and http.method for the request associated with that span
  • Logs: track specific events within a span, such as a stack trace or log message
  • SpanContext: distributes information about the span across processes, such as the traceID, spanID, and baggage items
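To make the list above concrete, here is one way those elements could be modeled as plain data. This is only a sketch: the `Span` and `SpanContext` classes, their field names, and the sample values are assumptions for illustration, not the types defined by any OpenTracing library.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple
import time


@dataclass
class SpanContext:
    """The part of a span that crosses process boundaries."""
    trace_id: str
    span_id: str
    baggage: Dict[str, str] = field(default_factory=dict)


@dataclass
class Span:
    operation_name: str
    context: SpanContext
    start_time: float = field(default_factory=time.time)
    finish_time: Optional[float] = None
    tags: Dict[str, Any] = field(default_factory=dict)
    logs: List[Tuple[float, Dict[str, Any]]] = field(default_factory=list)

    def set_tag(self, key, value):
        """Tags describe the span as a whole."""
        self.tags[key] = value

    def log(self, **fields):
        """Logs are timestamped events that happen during the span."""
        self.logs.append((time.time(), fields))

    def finish(self):
        self.finish_time = time.time()


# Illustrative identifiers and values only.
span = Span("HTTP GET /customer", SpanContext(trace_id="trace-1", span_id="span-1"))
span.set_tag("http.method", "GET")
span.set_tag("http.status_code", 200)
span.log(event="error", message="customer lookup was retried")
span.finish()
```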

For a client (or internal) request, the tracer creates the first (or parent) span with these elements, then injects the spanContext into a carrier so that the span’s metadata can move across processes. When the request triggers an operation in another service (e.g., retrieving a record from a database), the tracer on that side extracts the spanContext from the carrier and creates a new span for the operation that references it. This links the new span to the original one with a ChildOf relationship. Along with span and trace identifiers, spanContext may also include baggage items, which are key:value pairs that provide additional metadata for an operation (e.g., http.user_agent:, special_id:).
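The inject and extract steps can be sketched with an ordinary dictionary standing in for HTTP headers. Everything here is an assumption made for illustration: the header names, the `inject`, `extract`, and `start_child` helpers, and the baggage value are hypothetical, and real tracers define their own wire format.

```python
import uuid

# Hypothetical header names for this sketch; real tracers define their own.
TRACE_HEADER = "x-example-trace-id"
SPAN_HEADER = "x-example-span-id"
BAGGAGE_PREFIX = "x-example-baggage-"


def new_id():
    return uuid.uuid4().hex[:16]


def inject(span_context, carrier):
    """Copy the span context into a key:value carrier such as HTTP headers."""
    carrier[TRACE_HEADER] = span_context["trace_id"]
    carrier[SPAN_HEADER] = span_context["span_id"]
    for key, value in span_context["baggage"].items():
        carrier[BAGGAGE_PREFIX + key] = value


def extract(carrier):
    """Rebuild the caller's span context from the carrier."""
    baggage = {
        key[len(BAGGAGE_PREFIX):]: value
        for key, value in carrier.items()
        if key.startswith(BAGGAGE_PREFIX)
    }
    return {
        "trace_id": carrier[TRACE_HEADER],
        "span_id": carrier[SPAN_HEADER],
        "baggage": baggage,
    }


def start_child(parent_context, operation_name):
    """Create a span that is a ChildOf the extracted context."""
    return {
        "operation_name": operation_name,
        "trace_id": parent_context["trace_id"],     # same trace
        "span_id": new_id(),                        # new span
        "child_of": parent_context["span_id"],      # link to the parent
        "baggage": dict(parent_context["baggage"]), # baggage travels along
    }


# Client side: the parent span's context goes into the outgoing headers.
# The baggage value "42" is made up for the example.
parent = {"trace_id": new_id(), "span_id": new_id(), "baggage": {"special_id": "42"}}
headers = {}
inject(parent, headers)

# Server side: the context comes back out and parents the new span.
child = start_child(extract(headers), "db.query")
```

The child shares the parent's trace ID and carries the parent's span ID as its reference, which is the information a backend needs to stitch the two spans together.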

This process continues as the request crosses over each service boundary, until it reaches the end of the application workflow. The result is a tree of spans representing each microservice that interacted with the request, with causal ordering so that the entire request path can be reconstructed. You can then use one of the several supported tracing backends to track, view, and explore traces.
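As a rough sketch of how a backend might rebuild that tree, the snippet below groups finished spans by their parent reference and orders siblings by start time. The tuple layout, the `build_tree` and `render` helpers, and the sample spans are all hypothetical; real backends store considerably more data per span.

```python
from collections import defaultdict

# Finished spans as (span_id, parent_id, operation_name, start_time) tuples.
# They arrive out of order, as they might from different services.
spans = [
    ("c", "b", "redis.get", 3.0),
    ("a", None, "HTTP GET /customer", 1.0),
    ("d", "a", "render", 4.0),
    ("b", "a", "db.query", 2.0),
]


def build_tree(spans):
    """Group spans by parent and return (root, children_by_parent)."""
    children = defaultdict(list)
    root = None
    for span in spans:
        span_id, parent_id, _name, _start = span
        if parent_id is None:
            root = span
        else:
            children[parent_id].append(span)
    for siblings in children.values():
        siblings.sort(key=lambda s: s[3])  # order siblings by start time
    return root, children


def render(span, children, depth=0):
    """Return the request path as indented operation names."""
    lines = ["  " * depth + span[2]]
    for child in children.get(span[0], []):
        lines.extend(render(child, children, depth + 1))
    return lines


root, children = build_tree(spans)
tree_lines = render(root, children)
# tree_lines == ["HTTP GET /customer", "  db.query", "    redis.get", "  render"]
```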

Though OpenTracing provides the interface for creating and collecting traces, how those traces are sampled and retained depends on the tracing backend. For example, Datadog, one of OpenTracing’s supported backends, employs multiple sampling techniques for collecting and storing traces.

Metric collection

OpenTracing does not provide an API for metric collection out of the box, though there is a project that utilizes the API’s specification to collect application metrics (e.g., duration) within the context of a span. This project is currently written for Java applications and submits metrics to different backends.

Instrumentation

OpenTracing provides a common interface between traces by standardizing span creation and context. Its API is comprehensive, so developers can ship out-of-the-box distributed tracing with their services. This enables users to easily incorporate an OpenTracing-compliant backend such as Datadog and profile their applications that are using these services.

What is OpenCensus?

OpenCensus is a platform for metric collection and tracing. It’s larger in scope than OpenTracing, and offers a collection of language-specific libraries with APIs for sampling and metric collection, providing more structure around tracing and metric collection. These libraries are also packaged with tests to ensure the APIs work end-to-end. Each library can collect and export metrics and traces from your application to multiple backends, such as Zipkin and Datadog, for analysis.

A benefit of the OpenCensus architecture is that its libraries live within a single project repository: the OpenCensus instrumentation project. This allows the project to maintain consistency across each supported language and ensure API stability through end-to-end tests for each language-specific library. Plus, you can view all available exporters, integrations, and supported propagation formats within each library, making it easier to understand how each piece fits together in the OpenCensus ecosystem. This is in contrast to the OpenTracing project, whose libraries are managed across several third-party project repositories.

Span management and cross-process propagation

OpenCensus includes support for popular programming languages, including Java, Ruby, and Node, and provides an API for instrumenting applications. The project refers to tracers as exporters and, like OpenTracing, uses them to export traces to a backend. For moving span context across process boundaries, OpenCensus requires that all implementations (e.g., Node, Java, Go) provide a generic context propagation interface and support multiple propagation formats.

For the OpenCensus project, spans should include the following fields:

  • Name: a description of what the span does, such as HTTP GET /dispatch
  • SpanID: the unique identifier of the span
  • TraceID: the unique identifier of the trace
  • ParentSpanID: the identifier of the span’s parent span (empty for a root span)
  • StartTime/EndTime: the timestamps for the span
  • Status: a code that represents the state of the span (e.g., OK, ABORTED, UNAVAILABLE)
  • Time events: an event (e.g., function call, database query) that occurred within the timeframe of the span
  • Link: identifiers that link related spans (e.g., TraceID, SpanID)
  • SpanKind: represents the relationships between spans (e.g., server, client)
  • TraceOptions: determines if the trace is sampled or not
  • Tracestate: provides additional information for backends as a list of key:value pairs
  • Tags: provide contextual information for a span and link metrics to a specific span

OpenCensus provides greater control over how its exporters manage and organize traces and stats. The project includes trace sampling with four sampling types: Always, Never, Probabilistic, and RateLimiting. These types control how often an exporter processes and exports a sampled trace to a backend.
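To illustrate how the four sampling types differ, here is a simplified sketch. The class names, the `should_sample` method, and the fixed-window rate limiter are assumptions chosen for brevity, not the OpenCensus implementation.

```python
import time


class AlwaysSampler:
    def should_sample(self, trace_id):
        return True


class NeverSampler:
    def should_sample(self, trace_id):
        return False


class ProbabilitySampler:
    """Samples a fixed fraction of traces. The decision is derived from the
    trace ID, so every service makes the same choice for the same trace."""

    def __init__(self, rate):
        self.bound = int(rate * (1 << 64))

    def should_sample(self, trace_id):
        # trace_id is a hex string; treat its low 64 bits as a uniform number.
        return (int(trace_id, 16) & ((1 << 64) - 1)) < self.bound


class RateLimitingSampler:
    """Samples at most `per_second` traces per fixed one-second window."""

    def __init__(self, per_second, clock=time.monotonic):
        self.per_second = per_second
        self.clock = clock
        self.window_start = clock()
        self.count = 0

    def should_sample(self, trace_id):
        now = self.clock()
        if now - self.window_start >= 1.0:
            # A new window begins: reset the budget.
            self.window_start, self.count = now, 0
        if self.count < self.per_second:
            self.count += 1
            return True
        return False
```

The `clock` parameter exists only so the rate limiter can be exercised with a fake clock in tests.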

Metric and log collection

OpenCensus also includes an API for collecting stats (i.e., metrics) from your applications. Metric collection is made up of measures and views and can be grouped by optional tags. Measures are different types of metrics that produce measurements: the data value recorded by the measure. For example, latency in milliseconds is a measure, while its value at a particular moment (e.g., 250 ms) is the measurement.

Recorded measurements are summarized using one of the following aggregations (measure calculations):

  • Count: the total number of measurements
  • Distribution: a histogram distribution of measurements
  • Sum: the sum of measurements over a timeframe
  • LastValue: the last recorded measurement value

OpenCensus enables users to create views, groupings of specific metrics, to collect aggregations, measurements, and tags and export them to a backend that supports stats collection. Unlike traces, stats are not sampled and are always recorded.
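The relationship between measures, measurements, tags, aggregations, and views can be sketched as follows. The function names, the bucket boundaries, and the sample measurements are hypothetical and only loosely mirror the concepts described above; they are not the OpenCensus stats API.

```python
from collections import defaultdict

# Each measurement: (measure name, value, tags). Values are made up.
measurements = [
    ("latency_ms", 250.0, {"method": "GET"}),
    ("latency_ms", 40.0, {"method": "GET"}),
    ("latency_ms", 900.0, {"method": "POST"}),
    ("latency_ms", 120.0, {"method": "GET"}),
]


def count(values):
    return len(values)


def total(values):
    return sum(values)


def last_value(values):
    return values[-1]


def distribution(bounds):
    """Return an aggregation that counts values per histogram bucket."""
    def aggregate(values):
        buckets = [0] * (len(bounds) + 1)
        for value in values:
            # The bucket index is the number of boundaries at or below the value.
            index = sum(1 for bound in bounds if value >= bound)
            buckets[index] += 1
        return buckets
    return aggregate


def view(measure, tag_key, aggregation, data):
    """Apply one aggregation to one measure, grouped by one tag key."""
    grouped = defaultdict(list)
    for name, value, tags in data:
        if name == measure:
            grouped[tags.get(tag_key)].append(value)
    return {tag: aggregation(values) for tag, values in grouped.items()}


latency_count = view("latency_ms", "method", count, measurements)
latency_sum = view("latency_ms", "method", total, measurements)
latency_last = view("latency_ms", "method", last_value, measurements)
latency_hist = view("latency_ms", "method", distribution([100, 500]), measurements)
```

With the sample data, `latency_count` is `{"GET": 3, "POST": 1}` and `latency_hist` places the three GET measurements into the buckets below 100 ms, 100 to 500 ms, and 500 ms and above as `[1, 2, 0]`.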

Metric and trace APIs in OpenCensus are decoupled; you can use one without the other. If you decide to use both, the generic context propagation interface ensures consistency in data collection so you can easily correlate traces with metrics. For example, as part of its specification, OpenCensus requires that all libraries support at least the B3 propagation format. You can read more about the specification in the project repository.
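As an illustration of what a propagation format looks like on the wire, the sketch below encodes and decodes a span context using B3's multi-header style. The header names are the ones the B3 format defines; the helper functions and sample IDs are hypothetical, and the optional X-B3-ParentSpanId and X-B3-Flags headers are left out for brevity.

```python
B3_TRACE_ID = "X-B3-TraceId"
B3_SPAN_ID = "X-B3-SpanId"
B3_SAMPLED = "X-B3-Sampled"


def to_b3_headers(trace_id, span_id, sampled):
    """Encode a span context as B3 headers (lowercase hex strings)."""
    return {
        B3_TRACE_ID: format(trace_id, "032x"),  # 128-bit trace ID
        B3_SPAN_ID: format(span_id, "016x"),    # 64-bit span ID
        B3_SAMPLED: "1" if sampled else "0",
    }


def from_b3_headers(headers):
    """Decode B3 headers; HTTP header names are case-insensitive."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return (
        int(lowered[B3_TRACE_ID.lower()], 16),
        int(lowered[B3_SPAN_ID.lower()], 16),
        lowered.get(B3_SAMPLED.lower()) == "1",
    )


# Illustrative IDs only.
b3_headers = to_b3_headers(trace_id=0x1F, span_id=0x2A, sampled=True)
decoded = from_b3_headers(b3_headers)
```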

Log collection is still an open topic of discussion for OpenCensus, though there are projects in the early stages of development that aim to correlate logs with traces.

Instrumentation

In addition to manual instrumentation, OpenCensus libraries such as opencensus-node include packages for auto-instrumenting your applications. And, if you do not want to export data to a separate backend, OpenCensus includes a web interface for quickly displaying spans and traces called zPages.

OpenCensus also builds upon its APIs by developing out-of-the-box integrations with popular tools, including Redis, MongoDB, and Google Cloud Platform. If you are already using one of these products, you can begin collecting traces and stats with OpenCensus without additional instrumentation.

What is OpenMetrics?

Unlike OpenCensus and OpenTracing, which enable instrumentation for distributed tracing, OpenMetrics aims to create a standard specifically for exposing metric data. The OpenMetrics project is now a part of the CNCF sandbox, and the team behind it is currently working on incorporating its exposition format with OpenCensus.

OpenMetrics uses the Prometheus exposition format as the starting point for its standard. Prometheus displays metrics line-by-line in a text-based format and supports the histogram, gauge, counter, and summary metric types:

# HELP http_requests_total The total number of HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="post",code="200"} 1027 1395066363000

This format is easy to read, though it can be verbose if you are extracting a large volume of metrics. OpenMetrics aims to include new enhancements and improvements to the Prometheus format, including:

  • Unifying the escape rules for comments and labels
  • Providing Boolean support
  • Cleaning up whitespace rules
  • Expanding language support
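For a sense of how little machinery the Prometheus-style text format shown earlier needs, here is a small sketch that renders one metric family. The `render_metric` and `escape_label_value` helpers are hypothetical, and the sketch covers only the Prometheus text format, not the additions OpenMetrics proposes. Label values are escaped for backslashes, double quotes, and newlines; escaping of the HELP text is omitted.

```python
def escape_label_value(value):
    """Escape backslash, double quote, and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metric(name, metric_type, help_text, samples):
    """Render one metric family; samples are (labels, value, timestamp_ms)."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value, timestamp_ms in samples:
        label_text = ",".join(
            f'{key}="{escape_label_value(val)}"' for key, val in labels.items()
        )
        line = f"{name}{{{label_text}}} {value}" if labels else f"{name} {value}"
        if timestamp_ms is not None:
            line += f" {timestamp_ms}"  # the timestamp is optional
        lines.append(line)
    return "\n".join(lines) + "\n"


text = render_metric(
    "http_requests_total",
    "counter",
    "The total number of HTTP requests.",
    [({"method": "post", "code": "200"}, 1027, 1395066363000)],
)
```

Here `text` reproduces the three-line `http_requests_total` example from earlier in this section.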

OpenMetrics also supports protocol buffers, though there is some discussion on focusing on the text-based format only, as newer versions of Prometheus dropped protocol buffer support. As a part of the CNCF sandbox, OpenMetrics is still in the early stages of development, so there are still many open discussion points around how the exposition format should work. You can stay up-to-date on these discussions by joining the OpenMetrics forum.

Instrument your applications and services

The OpenTracing, OpenCensus, and OpenMetrics projects aim to simplify and standardize the process of instrumenting your applications for monitoring and observability. Each project has a large contributing community and a growing list of supported tracers and frameworks. Because there is some overlap between OpenTracing and OpenCensus, there is a plan to merge the two projects, though it’s still in the early stages of development.

As a member of the OpenTracing Specification Council and a supported vendor, Datadog provides OpenTracing-compliant tracers for Java, Node.js, and Go applications, which support multiple frameworks, libraries, and data stores out of the box. Datadog also provides Go and Java exporters for stats and trace collection with OpenCensus, as well as an OpenMetrics integration, enabling you to immediately begin forwarding data to Datadog with minimal configuration.

By using a common interface for traces and exporters, you can quickly instrument your applications and libraries. Check out each project to learn more about getting started with distributed tracing for your applications and libraries, or start instrumenting your applications with one of Datadog’s compliant tracers.