Track the status of your SLOs with the new monitor uptime widget

Last updated: December 13, 2018

The beta program for the monitor uptime widget is currently closed to new requests, but you can sign up for the waitlist here.

Service level objectives are an important tool for maintaining application performance, ensuring a consistent customer experience, and setting expectations about service performance for both internal and external users. We are very pleased to announce the availability of a new monitor uptime widget that makes it simple to monitor the status of your SLOs and communicate that status to your teams, executives, or external customers.

With monitor uptime widgets (left), you can quickly see how your services are performing against the service level objectives (SLOs) that you've established.

SLOs and SLIs

Best practices around SLOs have been pioneered by Google’s Site Reliability Engineering team—the Google SRE book and this talk from this year’s Dash conference both provide excellent introductions to service level objectives and service level indicators (SLIs). In short, SLOs set precise targets for your SLIs, which are the metrics that reflect the health and performance of a service. For instance, if you want to ensure that typical user requests are serviced quickly, you might use your service’s median latency as an SLI. You could then define an SLO such as, “the median latency of all user requests (as computed every minute) will be less than 250 milliseconds 99 percent of the time in any calendar month.”

To accurately track how actual performance compares to the objectives you’ve set, you need a way to not only monitor real-time performance (e.g., computing the median latency every 60 seconds and comparing it against the 250-ms threshold) but also to measure how often that threshold has been breached over longer timespans (to ensure that the 99 percent objective is met for every calendar month).
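To make the two levels of measurement concrete, here is a minimal Python sketch of the example SLO. It is an illustration only, not how Datadog computes SLO status: the function names, the bucketed input format, and the sample data are all assumptions made for this example.

```python
from statistics import median

THRESHOLD_MS = 250.0  # SLI threshold from the example SLO
TARGET = 0.99         # fraction of minutes that must meet the threshold


def minute_medians(latencies_by_minute):
    """Return the median latency (ms) of each non-empty one-minute bucket."""
    return [median(samples) for samples in latencies_by_minute if samples]


def slo_compliance(medians, threshold_ms=THRESHOLD_MS):
    """Return the fraction of minutes whose median latency was under the threshold."""
    if not medians:
        raise ValueError("no data points to evaluate")
    good = sum(1 for m in medians if m < threshold_ms)
    return good / len(medians)


def slo_met(medians, threshold_ms=THRESHOLD_MS, target=TARGET):
    """Return True if the threshold was met in at least `target` of the minutes."""
    return slo_compliance(medians, threshold_ms) >= target


# Hypothetical data: four one-minute buckets of request latencies (ms).
buckets = [[120, 180, 240], [200, 260, 300], [90, 100, 110], [400, 500, 600]]
medians = minute_medians(buckets)
print(medians, slo_compliance(medians), slo_met(medians))
```

The first function corresponds to the real-time check (one median per minute), and the other two correspond to the longer-term question of how often that check passed over the evaluation window.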

Visualize SLO status on your dashboards

A monitor uptime widget displays how request latency compares to a service level target in the month to date.

The new monitor uptime widget enables you to visualize SLOs on your Datadog dashboards, which you can share internally or externally to communicate the real-time status of your SLOs to anyone who depends on your service. Because the widget builds on Datadog’s alerting engine, you can create a Datadog monitor for any service level indicator. In the latency example above, that monitor would check that median latency remains below 250 ms.

The new monitor uptime widget allows you to visualize how often that threshold has been breached, over common SLO baselines such as the previous week, month, year, or the month to date. You can then set conditional formatting rules to, for instance, display the status in green if the threshold has been met 99 percent of the time over the month to date, and change the status to red if the threshold has been met less than 99 percent of the time.
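The month-to-date calculation and the green/red rule described above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the widget's implementation: the helper names are hypothetical, and alert periods are assumed to arrive as non-overlapping (start, end) pairs.

```python
from datetime import datetime


def month_to_date_start(now):
    """Return midnight on the first day of the month containing `now`."""
    return now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


def uptime_percent(alert_intervals, window_start, window_end):
    """Percent of the window during which the monitor was not alerting.

    alert_intervals: list of (start, end) datetimes, assumed non-overlapping.
    Intervals are clipped to the window before being counted.
    """
    window = (window_end - window_start).total_seconds()
    if window <= 0:
        raise ValueError("window_end must be after window_start")
    alerting = 0.0
    for start, end in alert_intervals:
        clipped_start = max(start, window_start)
        clipped_end = min(end, window_end)
        if clipped_end > clipped_start:
            alerting += (clipped_end - clipped_start).total_seconds()
    return 100.0 * (1 - alerting / window)


def status_color(uptime_pct, target_pct=99.0):
    """Green if the threshold was met at least target_pct of the time, else red."""
    return "green" if uptime_pct >= target_pct else "red"


# Hypothetical example: one two-hour alert in the first ten days of the month.
now = datetime(2018, 12, 11)
alerts = [(datetime(2018, 12, 3, 9, 0), datetime(2018, 12, 3, 11, 0))]
pct = uptime_percent(alerts, month_to_date_start(now), now)
print(round(pct, 2), status_color(pct))
```

Swapping the window start for "seven days ago" or "one year ago" would give the other baselines mentioned above with the same arithmetic.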

Break down SLO status using tags

Visualizing the availability of a Consul cluster, on a per-node basis and as compared to the overall SLO.

The monitor uptime widget allows you to visualize the overall status of your SLOs, but it also shows you at a glance how different segments of your infrastructure are contributing to performance. For instance, you can see the status of your uptime SLO for a service, and break down the uptime by host or data center to easily isolate localized issues. In the example above, we’re monitoring the availability of our Consul cluster, along with the availability of the individual nodes in the cluster, so we can quickly zero in on any issues that arise.
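As a rough sketch of what a per-tag breakdown involves, the Python below groups check results by a tag value (here, a host name) and reports availability for each group alongside the overall figure. The input format, the host names, and the choice to pool all check runs for the overall number are assumptions for illustration; the widget may aggregate differently.

```python
from collections import defaultdict


def uptime_by_tag(results):
    """Compute availability per tag value and overall.

    results: iterable of (tag_value, ok) pairs, one per check run.
    Returns ({tag_value: percent_ok}, overall_percent_ok).
    """
    counts = defaultdict(lambda: [0, 0])  # tag value -> [ok runs, total runs]
    for tag_value, ok in results:
        counts[tag_value][1] += 1
        if ok:
            counts[tag_value][0] += 1
    total_runs = sum(total for _, total in counts.values())
    if total_runs == 0:
        raise ValueError("no check results to evaluate")
    per_tag = {tag: 100.0 * ok / total for tag, (ok, total) in counts.items()}
    overall = 100.0 * sum(ok for ok, _ in counts.values()) / total_runs
    return per_tag, overall


# Hypothetical check results for two nodes of a cluster.
results = (
    [("consul-1", True)] * 4
    + [("consul-2", True)] * 3
    + [("consul-2", False)]
)
per_host, overall = uptime_by_tag(results)
print(per_host, overall)
```

Seeing one node well below the overall figure is the kind of localized issue the breakdown is meant to surface.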

Display and share service status

The monitor uptime widget provides a new level of functionality for monitoring and enforcing your SLOs, as well as providing transparency to any stakeholders or users who depend on those SLOs being met. If you’re already a Datadog customer, you can sign up here to join the waitlist for beta access. And if you aren’t yet using Datadog to monitor the health and performance of your services, you can sign up for a .