Datadog governance 101: From chaos to consistency
David Iparraguirre

Aaron Herrmann

Addie Beach

As your organization scales, managing observability resources and usage becomes increasingly important. More users and teams mean more dashboards, tags, API keys, and costs to manage, and keeping track of these resources and ensuring that they’re compliant can quickly grow complex. An effective observability governance strategy helps you handle these challenges at scale as you onboard new services, determine who owns dashboards and monitors, control visibility into sensitive information, and manage usage of key features.

In this post, we’ll explore the basics of observability governance, a few key best practices, and how you can use Datadog to put your governance strategy into practice.

What is observability governance?

A key feature of many observability platforms is that they enable anyone in your organization to access monitoring data, which is essential for breaking down silos across teams. However, this openness and flexibility can present significant compliance and security risks. As the number of dashboards, monitors, and pipelines in your monitoring platform grows, it can become challenging to manage who owns what and who should have access to it. Additionally, usage anomalies can easily go unnoticed until costs spiral out of control.

Observability governance addresses these challenges by providing you with a framework and tooling for handling the growth of your monitoring efforts safely and consistently. It helps you manage data access, attribute costs to specific teams and services, enforce organization-wide policies, and ensure effective product usage.

How observability governance is structured can vary across organizations. At its simplest, it can be a set of guidelines and best practices that teams follow. In larger organizations or in industries dealing with sensitive data, governance may be managed by a centralized team responsible for ensuring those practices are followed and applied consistently.

Regardless of whether governance management is left to individual teams or a single centralized group, the core best practices remain largely the same. We’ll explore what these practices look like and how you can implement them below.

Observability governance best practices

A good observability governance framework covers every aspect of how your organization interacts with monitoring data. It should include tools and processes that are simple enough for teams to easily implement, while remaining flexible enough to accommodate their individual needs.

There are four key pillars to an effective observability governance strategy:

Tagging

Consistent tagging should be the foundation of your governance strategy, as it helps you establish a single source of truth across your data. Generally speaking, you should make sure that your metrics, logs, traces, and event data are annotated with four basic tags: service, env (environment), team, and version. While the exact tags you use may depend on the needs of your organization, ensuring that these four are added to every piece of data gives you the reliable, organization-wide context you need to draw relationships between services, infrastructure, and teams.
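
As a rough illustration, the following Python sketch checks a single resource against this four-tag baseline. It assumes tags are `key:value` strings, which matches Datadog’s tag format; `REQUIRED_TAG_KEYS`, `parse_tags`, and `missing_tag_keys` are hypothetical names for this example, not part of any Datadog library.

```python
# Baseline tag keys discussed above (hypothetical constant; adjust to your own policy).
REQUIRED_TAG_KEYS = {"service", "env", "team", "version"}


def parse_tags(tags: list[str]) -> dict[str, str]:
    """Split "key:value" strings into a dict; a tag without a colon gets an empty value."""
    parsed: dict[str, str] = {}
    for tag in tags:
        key, _, value = tag.partition(":")
        parsed[key.strip().lower()] = value.strip()
    return parsed


def missing_tag_keys(tags: list[str]) -> set[str]:
    """Return the required keys that are absent or have an empty value."""
    parsed = parse_tags(tags)
    return {key for key in REQUIRED_TAG_KEYS if not parsed.get(key)}


example = ["service:checkout", "env:prod", "team:payments"]
print(sorted(missing_tag_keys(example)))  # ['version']
```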

You can use these relationships to establish clear lines of ownership and dependency. With tags, you can quickly identify which teams are responsible for which services, and which ones they don’t interact with at all. When combined with insights into usage activity, this knowledge makes it easier to determine where access to certain tools or data should be restricted. In Datadog, you can also use tags to define access boundaries, enabling you to control who can view sensitive data.
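
To show how tags translate into ownership, here is a hypothetical sketch that derives a team-to-service map from `team` and `service` tags and surfaces services that have no owner. The inventory and helper names are illustrative assumptions; in Datadog itself, this relationship comes from the tags on your data rather than from a script like this.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical inventory: each record is one tagged resource (host, monitor, dashboard, ...).
resources = [
    {"name": "checkout-api", "tags": ["service:checkout", "team:payments"]},
    {"name": "checkout-worker", "tags": ["service:checkout", "team:payments"]},
    {"name": "search-api", "tags": ["service:search", "team:discovery"]},
    {"name": "legacy-cron", "tags": ["service:billing-cron"]},  # no team tag
]


def tag_value(tags: list[str], key: str) -> Optional[str]:
    """Return the value of the first tag with the given key, or None."""
    for tag in tags:
        k, sep, v = tag.partition(":")
        if sep and k == key:
            return v
    return None


def ownership_map(records: list[dict]) -> tuple[dict[str, set[str]], set[str]]:
    """Map each team to the services it owns, and collect services with no owner."""
    owners: dict[str, set[str]] = defaultdict(set)
    unowned: set[str] = set()
    for record in records:
        service = tag_value(record["tags"], "service")
        team = tag_value(record["tags"], "team")
        if service is None:
            continue  # not attributable to a service at all
        if team is None:
            unowned.add(service)
        else:
            owners[team].add(service)
    return dict(owners), unowned


owners, unowned = ownership_map(resources)
print(owners)   # {'payments': {'checkout'}, 'discovery': {'search'}}
print(unowned)  # {'billing-cron'}
```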

For smoother adoption of tagging policies, ensure that teams across your organization can easily tag their resources without disrupting their workflows. To help with this, Datadog provides several ways to assign tags, including directly within the platform UI or via the Datadog API.
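
For the API route, the sketch below builds (without sending) a request to Datadog’s v1 endpoint for adding tags to a host, `POST /api/v1/tags/hosts/{host_name}`. It assumes the US1 site (`api.datadoghq.com`) and that your keys live in `DD_API_KEY` and `DD_APP_KEY` environment variables; the helper name is hypothetical, and you should confirm the endpoint and base URL for your site in the Datadog API reference before relying on it.

```python
import json
import os
import urllib.parse
import urllib.request

# Assumption: your organization is on the US1 site; other sites use a different base URL.
DD_SITE = "https://api.datadoghq.com"


def build_add_host_tags_request(host_name: str, tags: list[str]) -> urllib.request.Request:
    """Build (but do not send) a request that adds tags to a host via the v1 Tags API."""
    url = f"{DD_SITE}/api/v1/tags/hosts/{urllib.parse.quote(host_name, safe='')}"
    body = json.dumps({"host": host_name, "tags": tags}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Keys are read from the environment (assumed variable names) rather than hard-coded.
        "DD-API-KEY": os.environ.get("DD_API_KEY", ""),
        "DD-APPLICATION-KEY": os.environ.get("DD_APP_KEY", ""),
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


req = build_add_host_tags_request("web-01", ["team:payments", "env:prod"])
print(req.get_method(), req.full_url)  # POST https://api.datadoghq.com/api/v1/tags/hosts/web-01
# To actually send it: urllib.request.urlopen(req)
```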

Access and key management

One of the most critical functions of governance is managing data permissions. Tagging can help you control access on a granular level. To handle permissions more sustainably at scale, you’ll also need to define user roles. Many monitoring platforms include default roles that let you quickly apply common, system-wide permission sets—such as admin or read-only—to users across multiple teams. Additionally, Datadog supports custom roles to more precisely manage access so that you can tailor permissions for specific personas like auditors or support engineers.
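
The following sketch models how default and custom roles combine. It assumes, as in many RBAC systems, that a user’s effective permissions are the union of the permissions granted by each of their roles. The permission strings and the `auditor` role are hypothetical placeholders, not Datadog’s actual permission identifiers.

```python
# Hypothetical role definitions; real Datadog permission identifiers differ.
ROLES: dict[str, set[str]] = {
    "admin": {"dashboards_read", "dashboards_write", "monitors_write", "logs_read", "user_manage"},
    "standard": {"dashboards_read", "dashboards_write", "monitors_write", "logs_read"},
    "read_only": {"dashboards_read", "logs_read"},
    # A custom role tailored to a persona: auditors can read, but not change anything.
    "auditor": {"dashboards_read", "logs_read", "audit_trail_read"},
}


def effective_permissions(user_roles: list[str]) -> set[str]:
    """Union the permissions of every role the user holds (unknown roles grant nothing)."""
    perms: set[str] = set()
    for role in user_roles:
        perms |= ROLES.get(role, set())
    return perms


def can(user_roles: list[str], permission: str) -> bool:
    """Check whether any of the user's roles grants the permission."""
    return permission in effective_permissions(user_roles)


print(can(["auditor"], "audit_trail_read"))  # True
print(can(["auditor"], "dashboards_write"))  # False
```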

A list of roles for an organization.

Beyond controlling what data users can view and edit, you’ll want to manage how they access it. Datadog uses two kinds of keys for programmatic access: API keys, which identify your organization when data is submitted, and application keys, which are used together with an API key to access the Datadog API with a specific user’s permissions. To strengthen system security, you should reduce the risk that a compromised key can be used to access your data, and Datadog offers several ways to do this. First, you can scope application keys so that each one carries only the privileges necessary for its job, reducing the impact of a potential attack. Second, you can use multiple API keys across your organization, which makes it easier to rotate and revoke keys regularly.
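
A periodic key audit can be expressed as a simple check. This hypothetical sketch flags keys older than an assumed 90-day rotation window and application keys that have no scopes. The `KeyRecord` inventory and scope names are invented for illustration; in practice you would populate them from your own key management process.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

# Hypothetical rotation window; set this from your own security policy.
MAX_KEY_AGE = timedelta(days=90)


@dataclass
class KeyRecord:
    name: str
    kind: str  # "api" or "application"
    created_at: datetime
    scopes: set[str] = field(default_factory=set)  # empty set means unscoped


def keys_needing_attention(keys: list[KeyRecord], now: datetime) -> dict[str, list[str]]:
    """Flag keys past the rotation window and application keys with no scopes."""
    report: dict[str, list[str]] = {"rotate": [], "unscoped": []}
    for key in keys:
        if now - key.created_at > MAX_KEY_AGE:
            report["rotate"].append(key.name)
        if key.kind == "application" and not key.scopes:
            report["unscoped"].append(key.name)
    return report


now = datetime(2025, 6, 1, tzinfo=timezone.utc)
inventory = [
    KeyRecord("ci-pipeline", "api", datetime(2025, 1, 10, tzinfo=timezone.utc)),
    KeyRecord("terraform", "application", datetime(2025, 5, 1, tzinfo=timezone.utc)),
    KeyRecord("dash-export", "application", datetime(2025, 5, 20, tzinfo=timezone.utc),
              {"dashboards_read"}),
]
print(keys_needing_attention(inventory, now))
# {'rotate': ['ci-pipeline'], 'unscoped': ['terraform']}
```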

As more services, tools, and teams are added to your organization, you should periodically revisit your access policies to ensure that they continue to block unauthorized access without becoming overly restrictive. You’ll also want to configure alerts on system events that may indicate malicious activity. For example, a sudden spike in API key creation or usage could point to a compromised integration or an abused credential. Datadog Audit Trail can help you identify which events to monitor and correlate anomalous behavior with historical activity. If you do notice a suspicious event, Audit Trail enables you to investigate deeper with a record of recent user actions. This can help you determine whether sensitive data was accessed or critical configurations were changed.
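
To make the “sudden spike” idea concrete, this sketch buckets key-creation events by hour and flags an hour that exceeds a multiple of the recent baseline. The event list, the 3× factor, and the five-event floor are all assumptions for illustration; it does not use Audit Trail’s event schema, and in Datadog you would express this logic as a monitor rather than a script.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(timestamps: list[datetime]) -> Counter:
    """Bucket event timestamps by the hour they fall in."""
    return Counter(ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps)


def is_spike(counts: Counter, hour: datetime, baseline_hours: list[datetime],
             factor: float = 3.0, min_events: int = 5) -> bool:
    """Flag `hour` if it has at least `min_events` events and exceeds `factor` times the baseline mean."""
    current = counts.get(hour, 0)
    baseline = sum(counts.get(h, 0) for h in baseline_hours) / max(len(baseline_hours), 1)
    return current >= min_events and current > factor * baseline


# Hypothetical key-creation events: one per hour, then a burst of eight at 12:00.
events = (
    [datetime(2025, 6, 1, h, 15) for h in (9, 10, 11)]
    + [datetime(2025, 6, 1, 12, m) for m in range(0, 40, 5)]
)
counts = hourly_counts(events)
baseline = [datetime(2025, 6, 1, h) for h in (9, 10, 11)]
print(is_spike(counts, datetime(2025, 6, 1, 12), baseline))  # True
```

The minimum-event floor keeps a jump from zero to one event from registering as a spike, which would otherwise make the alert noisy in quiet periods.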

Cost control

Governance also plays a key role in managing observability costs. By enriching your data with ownership information, governance helps you attribute usage to the specific teams, services, and resources contributing most to your overall expenses.

Datadog provides direct visibility into team-level and resource-level spend through features like Usage Attribution. Usage Attribution uses your tags to build detailed reports that break usage data down into customizable facets like team, app, and service.
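
Conceptually, usage attribution is a group-by over tagged usage records. The hypothetical sketch below sums usage per product and team and puts records with no `team` tag into an `<untagged>` bucket, which is a quick way to see how tag gaps leave spend unattributed. The record format is an assumption and does not reflect the format of Datadog’s Usage Attribution reports.

```python
from collections import defaultdict

# Hypothetical usage records: one row per (resource, product) with a usage amount.
usage = [
    {"product": "apm_hosts", "amount": 12, "tags": ["team:payments", "service:checkout"]},
    {"product": "apm_hosts", "amount": 5, "tags": ["team:discovery", "service:search"]},
    {"product": "indexed_logs", "amount": 300, "tags": ["team:payments"]},
    {"product": "indexed_logs", "amount": 120, "tags": []},
]


def attribute(records: list[dict], facet: str) -> dict[tuple[str, str], float]:
    """Sum usage per (product, facet value); records missing the facet go to '<untagged>'."""
    totals: dict[tuple[str, str], float] = defaultdict(float)
    for rec in records:
        value = next(
            (t.split(":", 1)[1] for t in rec["tags"] if t.startswith(facet + ":")),
            "<untagged>",
        )
        totals[(rec["product"], value)] += rec["amount"]
    return dict(totals)


for (product, team), amount in sorted(attribute(usage, "team").items()):
    print(f"{product:<13} {team:<12} {amount}")
```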

Datadog usage for products such as APM and USM broken down by team.

Tagging can also make your cost insights more robust. For example, Datadog FinOps analysts found that breaking down our cloud spend by team helped them collaborate more deeply with engineers on storage optimization projects, with the resulting changes saving us roughly $1.5 million annually.

Once you’ve identified areas where you can optimize observability spend, you can fine-tune data sampling rates for services and endpoints across your system. This lets you limit the amount of data you ingest from high-throughput resources, which can easily drive up costs and create unnecessary noise, while still capturing enough from lower-volume sources to effectively troubleshoot issues. Within Datadog, you can easily manage sampling for individual resources via log indexing, trace ingestion controls, and user session sampling. You can also adjust retention periods so that you only store representative samples of critical data.
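
One common way to sample consistently is to hash a stable identifier and compare it against a per-service keep rate. The sketch below applies that generic technique to trace IDs; the rates and service names are hypothetical, and this is not Datadog’s own sampling algorithm.

```python
import hashlib

# Hypothetical per-service keep rates: sample high-throughput services more aggressively.
SAMPLE_RATES = {"checkout": 0.10, "search": 0.05}
DEFAULT_RATE = 1.0  # keep everything from low-volume services


def keep_trace(service: str, trace_id: int) -> bool:
    """Deterministically keep or drop a trace based on a hash of its ID.

    Hashing the trace ID (rather than calling random()) means the same trace
    always gets the same keep/drop decision, wherever it is evaluated.
    """
    rate = SAMPLE_RATES.get(service, DEFAULT_RATE)
    digest = hashlib.sha256(str(trace_id).encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # roughly uniform in [0, 1)
    return bucket < rate


kept = sum(keep_trace("checkout", tid) for tid in range(100_000))
print(kept)  # close to 10% of 100,000
```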

Additionally, you can configure cost alerts to catch unexpected spend increases early. Datadog Cloud Cost Monitoring lets you base these alerts on several methods, including comparisons with historical activity, exact thresholds, anomaly detection, and forecasts. Unlike ingestion controls, these alerts are based on actual spend, which makes them more resilient to changes in data volume or pricing.
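
The following sketch shows the arithmetic behind three of these alert styles (a fixed threshold, a week-over-week comparison, and a naive linear forecast) on a list of daily spend values. The parameters and the 30-day month are simplifying assumptions, and anomaly detection is omitted; Cloud Cost Monitoring’s own methods are more sophisticated than this.

```python
from statistics import mean


def check_cost_alerts(
    daily_spend: list[float],
    daily_threshold: float,
    monthly_budget: float,
    max_wow_increase: float = 0.25,
    days_in_month: int = 30,
) -> list[str]:
    """Evaluate simple cost-alert styles on daily spend values, oldest first."""
    alerts: list[str] = []

    # 1. Exact threshold: did the most recent day exceed a fixed amount?
    if daily_spend[-1] > daily_threshold:
        alerts.append("threshold")

    # 2. Historical comparison: last 7 days vs. the 7 days before that.
    if len(daily_spend) >= 14:
        this_week = sum(daily_spend[-7:])
        last_week = sum(daily_spend[-14:-7])
        if last_week > 0 and (this_week - last_week) / last_week > max_wow_increase:
            alerts.append("week_over_week")

    # 3. Naive forecast: project a month of spend from the average daily spend so far.
    if mean(daily_spend) * days_in_month > monthly_budget:
        alerts.append("forecast")

    return alerts


spend = [100.0] * 7 + [140.0] * 7
print(check_cost_alerts(spend, daily_threshold=150, monthly_budget=3000))
# ['week_over_week', 'forecast']
```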

A list of cost monitors, a few showing triggered alerts.

Onboarding and organizational alignment

Creating governance policies is the first step, but enforcing them can present further challenges. Teams may adopt best practices unevenly, prioritizing those that are closest to their existing processes. Some teams may also struggle to understand which policies and procedures even apply to them in the first place.

To align your teams, you’ll want to give them a single resource that helps them understand what they’re responsible for. You can do this in Datadog via Teams, which enable you to easily group and filter assets—including dashboards, incidents, services, and monitors—based on ownership. Once you’ve established which resources your teams own, you can set up Scorecards to help them evaluate how well their configuration aligns with organizational best practices.
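
A scorecard boils down to a set of pass/fail rules evaluated per service. The rules and service metadata in this hypothetical sketch are placeholders chosen for illustration; they are not the rules that Datadog Scorecards ship with.

```python
from typing import Callable

# Hypothetical rules: each maps a service's metadata to pass (True) or fail (False).
RULES: dict[str, Callable[[dict], bool]] = {
    "has_team_owner": lambda s: bool(s.get("team")),
    "has_on_call": lambda s: bool(s.get("on_call")),
    "has_runbook": lambda s: bool(s.get("runbook_url")),
    "has_monitors": lambda s: s.get("monitor_count", 0) > 0,
}


def score(service: dict) -> tuple[float, list[str]]:
    """Return the fraction of rules passed and the names of the rules that failed."""
    failed = [name for name, rule in RULES.items() if not rule(service)]
    return 1 - len(failed) / len(RULES), failed


services = [
    {"name": "checkout", "team": "payments", "on_call": "payments-oncall",
     "runbook_url": "https://wiki.example.com/checkout", "monitor_count": 12},
    {"name": "billing-cron", "monitor_count": 0},
]
for svc in services:
    pct, failed = score(svc)
    print(f"{svc['name']}: {pct:.0%} passing, failed: {failed}")
```

Returning the failed rule names alongside the score gives each team a concrete to-do list rather than just a number.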

Scorecards broken down by area, including ownership and production readiness.

Finally, you’ll want to define policies for product adoption and project creation to ensure that any new activity follows your governance strategy. Datadog Workflow Automation can help teams integrate these policies into their work by providing standardized, easy-to-use flows for common tasks. These flows can include setting up or removing user accounts, scaffolding new projects via Golden Paths, and identifying discount program opportunities.

Unify governance activity with Datadog

As we’ve discussed, Datadog has a variety of features that can help you design and implement your governance strategy. However, using these features often means navigating between different areas of the platform while trying to ensure that all of your observability data is secured and monitored.

To help address this, we’ve created the Datadog Governance Console. From the console, you can view Datadog activity for every team in your organization. The console enables you to visualize tagging coverage for your services, customize cost and access policies, and view governance guides. Together, these views help you identify and respond quickly to compliance gaps. To visualize this activity at a high level, you can create governance policies that give you an overview of adoption for key best practices.

The Governance Console view, with organization-wide usage metrics displayed.

Let’s say you want to ensure that all of your data has team, env, and service tags. You can create a governance policy to track how much of your data includes these tags. Then, if you notice that adoption of this policy is low, you can revisit your tagging outreach strategy to improve internal alignment.
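
The arithmetic behind such an adoption figure is simple. This hypothetical sketch computes the share of resources that carry all three tags, along with per-tag coverage, which can show which tag your outreach should focus on. It is an illustration under assumed inputs, not how the Governance Console calculates adoption.

```python
REQUIRED_KEYS = ("team", "env", "service")  # the tags from the policy example above


def tag_keys(tags: list[str]) -> set[str]:
    """Keys that appear with a non-empty value, e.g. "env:prod" -> "env"."""
    return {key for key, sep, value in (t.partition(":") for t in tags) if sep and value}


def policy_adoption(resources: list[list[str]]) -> tuple[float, dict[str, float]]:
    """Return the share of fully compliant resources and the coverage of each required key."""
    if not resources:
        return 0.0, {key: 0.0 for key in REQUIRED_KEYS}
    key_sets = [tag_keys(tags) for tags in resources]
    full = sum(all(k in keys for k in REQUIRED_KEYS) for keys in key_sets) / len(key_sets)
    per_key = {k: sum(k in keys for keys in key_sets) / len(key_sets) for k in REQUIRED_KEYS}
    return full, per_key


# Hypothetical tag sets for four resources.
resources = [
    ["team:payments", "env:prod", "service:checkout"],
    ["env:prod", "service:search"],
    ["team:discovery", "env:staging", "service:search"],
    ["env:prod"],
]
full, per_key = policy_adoption(resources)
print(f"{full:.0%} fully compliant")  # 50% fully compliant
print(per_key)  # {'team': 0.5, 'env': 1.0, 'service': 0.75}
```

In this example, `env` is already universal, so outreach would focus on getting teams to add `team` tags.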

Manage observability data across your organization

Observability data is key for troubleshooting issues within your system, but it can also present security risks and drive up costs. By helping you develop a clear, consistent strategy for managing your data, observability governance makes it easier for your organization to scale its monitoring efforts safely, reliably, and cost effectively.

Datadog Governance Console is currently in Preview. To get started, you can sign up for the Governance Console program.
