Monitor code deployments with Deployment Tracking in Datadog APM

Danny Park

Jonathan Epstein

Continuous integration and continuous delivery (CI/CD) pipelines have become fundamental to modern software development and code deployment. Implementing CI/CD practices can let teams deploy code more quickly and efficiently. But with these methods come a number of new challenges: bad code deploys are a major source of downtime and can lead to a loss of revenue and customer trust. Code deployments are complicated, multi-faceted operations, and by the time a problem is detected, it might already be too late to revert to a previous version without customer-facing impact.

To help teams meet this challenge, Datadog distributed tracing and APM include Deployment Tracking. Deployment Tracking uses Datadog’s reserved version tag, a unified tag that lets Datadog automatically aggregate performance data by code version across infrastructure assets, traces, trace metrics, profiles, and logs. This makes it easy for developers to compare the performance of a new deployment against their existing live code, verify that the new code is performing properly, and confirm that no new errors have surfaced between versions. Deployment Tracking in Datadog APM enables developers to adopt modern deployment strategies with peace of mind by letting them quickly roll back their release candidates as soon as they spot an issue so they can avoid service outages.

Deployment tracking—simplified

Datadog’s version tag accepts any string value, letting you easily implement Deployment Tracking with whatever type of versioning your teams use, such as semver or Git SHA values. Deployment Tracking for Datadog APM tracks all versions deployed over the last 30 days, giving you a wide window for continuous deployment analysis. Datadog automatically provides out-of-the-box graphs that visualize RED (requests, errors, and duration) metrics across versions, making it easy to spot problems in your services and endpoints before they turn into serious issues. And, with Automatic Faulty Deployment Detection, Datadog uses machine learning to automatically flag bad deployments so you can investigate and determine if you need to roll back to a previous version.
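As a rough illustration of how a version string might be attached at deploy time, the sketch below classifies a version as semver or Git SHA and places it in the process environment. It assumes your tracer reads the version tag from the `DD_VERSION` environment variable, as described in Datadog's unified service tagging documentation. The helper names `classify_version` and `deployment_env` and both regular expressions are hypothetical and exist only for this example.

```python
import os
import re

# Loose, illustrative patterns for two common versioning schemes.
# The version tag itself accepts any string, so neither is required.
SEMVER = re.compile(r"^v?\d+\.\d+\.\d+(?:[-+][0-9A-Za-z.-]+)?$")
GIT_SHA = re.compile(r"^[0-9a-f]{7,40}$")


def classify_version(version: str) -> str:
    """Return 'semver', 'git-sha', or 'other' for a version string."""
    if SEMVER.match(version):
        return "semver"
    if GIT_SHA.match(version):
        return "git-sha"
    return "other"


def deployment_env(version: str, base_env=None) -> dict:
    """Return a copy of the environment with DD_VERSION set.

    Hypothetical helper: it assumes the tracer picks up the version tag
    from the DD_VERSION environment variable.
    """
    cleaned = version.strip()
    if not cleaned:
        raise ValueError("version must be a non-empty string")
    env = dict(os.environ if base_env is None else base_env)
    env["DD_VERSION"] = cleaned
    return env
```

The returned dictionary could be passed as the `env` argument when launching the service process, so that every trace it emits carries the same version value.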

Datadog automatically flags faulty deployments.

Deployment Tracking is fully integrated with the rest of Datadog, meaning that you can seamlessly pivot from your deployment tracking metrics to any associated monitoring data, such as the relevant logs, traces, metrics, and profiles that provide more context about what’s happening within your applications and underlying infrastructure.

Taken together, these features enable you to:

Identify when a deployment introduces a new type of error according to request and error rates
Use aggregate metric comparisons to verify that an endpoint hotfix actually fixes the problem
Validate that an out-of-date endpoint has been successfully deprecated

In the following section, we’ll look at how you can use Datadog APM to monitor code deployments across multiple modern deployment strategies, including:

Canary deployments
Blue/green deployments
Shadow deployments

Canary deployments

Canary deployments consist of deploying new code to a subset of users or hosts as a precautionary test so that, if a problem arises, there is minimal impact to your application as a whole. If the canary deployments function as expected, you can then deploy the new version to the next subset of your environment.

In the following screenshot, we can see two graphs tracking the total requests and percent change of service errors over a period of six hours, broken down by version. When the new version is rolled out, a first canary is deployed to a subset of hosts, which causes a spike in service errors. Because of this, the new version is rolled back. Later, a newer version is released with an error rate that matches the baseline error rate, indicating it’s safe for full deployment.

A new version is deployed as a single canary, causing a spike in errors. The canary is rolled back and, after the problem is addressed, a newer version canary is deployed. With no spike in errors, the final version is deemed safe and fully deployed.
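The canary workflow above boils down to comparing the canary's error rate with the baseline's and deciding whether to roll back. The following is a minimal sketch of that decision, not Datadog's detection logic: the `VersionStats` type, the `canary_decision` function, and the `tolerance` and `min_requests` thresholds are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionStats:
    """Request and error counts for one version (hypothetical shape)."""
    version: str
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        # Avoid dividing by zero when a version has served no traffic yet.
        return self.errors / self.requests if self.requests else 0.0


def canary_decision(baseline, canary, tolerance=0.01, min_requests=100):
    """Return 'wait', 'rollback', or 'promote' for a canary version.

    tolerance is the absolute error-rate increase we accept over baseline;
    min_requests guards against deciding on too little traffic.
    Both defaults are arbitrary illustrative values.
    """
    if canary.requests < min_requests:
        return "wait"
    if canary.error_rate > baseline.error_rate + tolerance:
        return "rollback"
    return "promote"
```

With made-up counts, a baseline of 50 errors in 10,000 requests and a canary of 40 errors in 500 requests yields `"rollback"`, while a canary of 3 errors in 500 requests yields `"promote"`.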

Blue/green deployments

Blue/green deployments are performed by running two (or more) nearly identical clusters of services simultaneously, with the only difference between them being the new code or feature addition. Traffic is routed to both hostgroups while the hostgroup with the new code (the “green” group) is on standby; once all testing is complete and the new version is determined to be safe, the existing hostgroup (the “blue” group) can be placed on standby while the green group is used as the production version. This way, you can quickly roll back to the blue (stable) group if any problems occur, ensuring that no downtime occurs in between.

The following table shows that new errors appeared across versions during a blue/green deployment. The team pushing the code used this information to track and fix the errors before full deployment. Even if the new errors are relatively rare, running a blue/green deployment on a full production load provided them with enough data to perform troubleshooting.

With equal traffic being routed to both versions, we can compare the performance of the new code and detect new errors. Here, we see the change in error types detected across versions during a blue/green deployment.
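The comparison of error types across versions can be thought of as a set difference: which error types appear in the green version but never in the blue one. Here is a small sketch of that idea. The `new_error_types` function and the list-of-strings input are assumptions for illustration and do not reflect how Datadog computes the table.

```python
from collections import Counter


def new_error_types(blue_errors, green_errors):
    """Return (error_type, count) pairs seen in green but never in blue.

    Both arguments are iterables of error type names, one entry per
    observed error. Results are ordered from most to least frequent.
    """
    blue_seen = set(blue_errors)
    counts = Counter(e for e in green_errors if e not in blue_seen)
    return counts.most_common()
```

Because the result keeps counts, rare new errors still show up as long as the green version receives enough traffic to trigger them at least once.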

Shadow deployments

Shadow deployments are performed by asynchronously copying real-user traffic from a production-ready environment into the new version environment, allowing you to test the new environment without users personally interacting with it. Because they simulate real user behavior without the potential for live errors, shadow deployments are particularly helpful when you are rolling out a critical update and cannot risk any downtime or performance issues. Shadow deployments are similar to blue/green deployments in that you are routing traffic to two different environments, except in this case only one of the environments is live.
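To make the "asynchronously copying" part concrete, the sketch below mirrors each request to a shadow handler in the background while returning only the live handler's response. It is a simplified, in-process illustration using `asyncio`. In practice, mirroring usually happens at a proxy or load balancer, and the names `handle_with_shadow`, `live_handler`, `shadow_handler`, and `shadow_log` are hypothetical.

```python
import asyncio


async def handle_with_shadow(request, live_handler, shadow_handler, shadow_log):
    """Serve a request from the live handler and mirror it to the shadow.

    Returns the live response plus the background shadow task, so callers
    can await the task during tests or graceful shutdown.
    """

    async def mirror():
        try:
            result = await shadow_handler(request)
            shadow_log.append(("ok", result))
        except Exception as exc:
            # Shadow failures are recorded but must never reach the user.
            shadow_log.append(("error", repr(exc)))

    # Start the shadow call without waiting for it to finish.
    shadow_task = asyncio.create_task(mirror())
    # Only the live handler's response is returned to the caller.
    response = await live_handler(request)
    return response, shadow_task
```

The key design choice is that the shadow call is isolated in its own task with its own exception handling, so a failing or slow new version cannot affect the response served to the user.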

Set up and use Deployment Tracking

To pull in application performance data from your environment, you first need to install the Datadog Agent on your host instances and containers and set up Datadog APM and Tracing. See our documentation for further instructions on instrumenting your applications. Once everything is properly configured, Datadog will begin collecting traces from your services and visualizing RED metrics in out-of-the-box graphs in the Services view. Now, when you deploy new code, the version tag lets you quickly break down RED metrics by version and drill down into their relative performance with custom comparison graphs, such as Error rate by version and Total requests by version.
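Conceptually, breaking down RED metrics by version means grouping request records by their version tag and computing requests, errors, and duration for each group. The sketch below shows that aggregation on plain tuples. It is not how Datadog computes trace metrics, and the `red_by_version` function and the `(version, duration_ms, is_error)` record shape are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean


def red_by_version(spans):
    """Summarize requests, errors, and duration per version.

    spans is an iterable of (version, duration_ms, is_error) tuples,
    a hypothetical stand-in for traced requests.
    """
    grouped = defaultdict(list)
    for version, duration_ms, is_error in spans:
        grouped[version].append((duration_ms, is_error))

    summary = {}
    for version, rows in grouped.items():
        errors = sum(1 for _, is_error in rows if is_error)
        summary[version] = {
            "requests": len(rows),
            "errors": errors,
            "error_rate": errors / len(rows),
            "avg_duration_ms": mean(d for d, _ in rows),
        }
    return summary
```

A summary shaped like this maps directly onto comparison graphs such as Error rate by version and Total requests by version.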

Deploy with peace of mind

Deployment Tracking extends Datadog’s APM capabilities and makes it easier than ever to track and control your code deployments down to the smallest details. Combined with integrations for monitoring the status of your build pipelines, Datadog gives you visibility into your entire application development lifecycle. If you’re already using Datadog, you can start using Deployment Tracking with your services now. Otherwise, get started with a 14-day free trial.
