Enhanced Azure monitoring with Datadog

Author Steve Harrington

Published: May 27, 2020

Microsoft Azure is a cloud computing platform for building, deploying, and managing global-scale applications. With dozens of services on offer, Azure gives users the tools to build large, sophisticated systems for hosting any type of workload. But with so many configuration options and resource types, understanding the health and performance of your applications in Azure can be challenging.

That’s why we’re excited to announce that we have enhanced our Azure integration with new tags and metrics that will help users get clearer visibility than ever before into their Azure environment. In addition to all of the standard metrics we collect from Azure Monitor, Datadog now automatically queries other resource-specific Azure metadata APIs and uses that information to generate additional timeseries metrics. The result is that our existing integration now provides more than 40 Datadog-generated metrics and dozens of new tags for your Azure services and resources, including:

  • App Services
  • Azure Functions
  • App Service Plans
  • Azure SQL Databases
  • Azure Load Balancers
  • Azure Virtual Networks
  • Usage and quotas
  • Resource counts and statuses

We have also added new out-of-the-box dashboards and revamped existing ones to include the new data. This gives you richer, more actionable insights into your entire Azure ecosystem with no additional configuration.

Understand your Azure App Services

Azure App Service is a powerful platform-as-a-service (PaaS) that enables you to deploy web applications and serverless functions using predefined compute resources. But it can be challenging to understand how your applications, functions, and underlying infrastructure interact—both conceptually and technically. We’ve added new metadata tags and updated our preset Azure App Services dashboard to help you more easily understand how your various types of web apps, functions, and App Service plans fit together.

Web apps and functions are now tagged with the name of the App Service plan they run on. This means you can easily visualize the health and performance of your applications alongside their hosting environment. This is important because when something goes wrong with your web apps, it can be useful to see if the issue is correlated with CPU saturation or other performance degradation in the underlying infrastructure.

Sometimes fixing issues is as simple as adjusting the autoscaling policy of your App Service plan or upgrading to a new tier. Datadog now tags App Service plan metrics with their tier (e.g., Basic, Standard, or Premium) and provides new metrics that track the current and maximum number of hosts within each plan. This enables you to track scaling inside your App Service plans and easily see if you’re bumping up against resource constraints. With these enhancements, you’ll have the information you need to identify potential resource bottlenecks in your App Service applications and take corrective action quickly.

Azure SQL Databases

We’ve added new tags to existing Azure SQL Database metrics that identify important information like your database’s role, plan type, and maximum size. Customers have also asked for visibility into the state of geo-replication links. Geo-replication links are the connections between primary databases and their secondary replicas. If these replication links are in an unhealthy state, you could be at risk of data loss.

Previously, checking a database’s geo-replication links required a special trip to the Azure portal to view the current state. Because Azure Monitor provides no metric for link state, there was no way to see its history or to easily monitor it in an automated way. Now, our integration regularly queries the Azure SQL Database API for these link states and generates a new metric that makes it easy to create dashboards and alerts in Datadog. Get started exploring this new metric quickly using an updated preset dashboard for Azure SQL Database.

Azure Load Balancers

Datadog collects metrics from Azure Monitor that provide valuable information on the performance and health of your Azure Load Balancers. However, Azure Monitor does not directly provide metrics for the number of VMs in the backend pools that are serving the incoming requests. This can be a critical gap in visibility when investigating performance issues with your load balancers. We’ve created a new Datadog-derived metric for Azure Load Balancer that counts the number of hosts in your load balancer’s backend pool. This enables you to view the scaling of your backend VMs alongside information about the load balancer that’s forwarding traffic, which can help with optimizing performance and troubleshooting.

We’ve put this data together in a new preset dashboard that helps you to visualize and understand these metrics.

Azure resource limits and quotas

Many Azure services have built-in limits and quotas. Some are applied by Azure policy in order to protect against fraud or abuse. For example, by default Azure limits the number of vCPU cores for each subscription to 20 per region. Other limits are natural limits, like the size of a virtual network being limited by its CIDR block.

Regardless of where they come from, keeping track of your Azure environment’s resource usage compared to these quotas and limits is critical to avoiding provisioning failures. Our integration now collects data from Azure Usage APIs for tracking consumption and limits for Azure storage, network, and compute resources in your stack. You can use these new metrics and their associated tags to easily create alerts that notify you if, for example, you’re approaching a limit on the number of storage accounts or public IP addresses within a region. These kinds of policy limits can be tracked using the new Azure Usage and Quotas preset dashboard.
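The kind of check such an alert performs comes down to comparing current usage with the limit. The following is a minimal sketch of that arithmetic, not the integration's own logic: the function names, the 80% threshold, and the sample usage value are assumptions chosen for illustration, while the limit of 20 vCPUs is the default mentioned above.

```python
def usage_ratio(current, limit):
    """Return current usage as a fraction of the limit."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return current / limit


def approaching_limit(current, limit, threshold=0.8):
    """Return True once usage reaches the alert threshold.

    The 0.8 default is an assumed value for this sketch; pick whatever
    margin gives you enough time to request a quota increase.
    """
    return usage_ratio(current, limit) >= threshold


# Illustrative values: 17 vCPUs in use against a regional limit of 20.
print(usage_ratio(17, 20))
print(approaching_limit(17, 20))
```

In practice you would express the same comparison as a monitor on the usage and limit metrics, grouped by region or resource type, rather than compute it by hand.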

You can also track the allocation and availability of Azure Virtual Network address space using our new Azure Virtual Networks preset dashboard. These new metrics and dashboards can help you spot when you’re approaching a resource limit before you’re blocked unexpectedly.
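To see why a CIDR block acts as a natural limit, it can help to work through the address arithmetic. The sketch below uses Python's standard `ipaddress` module. The function name, the example address ranges, and the returned field names are assumptions for illustration and do not correspond to Datadog metric names. It also assumes the subnets passed in do not overlap one another.

```python
import ipaddress

# Azure reserves five addresses in every subnet (the first four and the last).
AZURE_RESERVED_PER_SUBNET = 5


def vnet_capacity(vnet_cidr, subnet_cidrs):
    """Summarize how much of a virtual network's address space is allocated.

    Assumes the given subnets do not overlap each other.
    """
    vnet = ipaddress.ip_network(vnet_cidr)
    subnets = [ipaddress.ip_network(cidr) for cidr in subnet_cidrs]
    for subnet in subnets:
        if not subnet.subnet_of(vnet):
            raise ValueError(f"{subnet} is not inside {vnet}")

    allocated = sum(subnet.num_addresses for subnet in subnets)
    usable = sum(
        subnet.num_addresses - AZURE_RESERVED_PER_SUBNET for subnet in subnets
    )
    return {
        "total": vnet.num_addresses,
        "allocated": allocated,
        "unallocated": vnet.num_addresses - allocated,
        "usable_in_subnets": usable,
    }


# Example: a /16 network with two /24 subnets carved out of it.
print(vnet_capacity("10.0.0.0/16", ["10.0.0.0/24", "10.0.1.0/24"]))
```

A /16 network holds 65,536 addresses no matter how many subnets you define, so once the unallocated count reaches zero, new subnets cannot be created without resizing the network.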

Resource counts

Resource counts aim to answer some of the most basic questions for monitoring your Azure resources: how many do I have and what’s their status? However, this information can be frustrating to visualize or build alerts for because there are no standard metrics that directly track this.

To address this problem, we’ve created new metrics for each Azure resource type. These metrics are available for dozens of Azure resource types in the form azure.*.count. For example, use the sum of azure.vm.count to track how many VMs you have. Want to see their status? Just group the same metric by the status tag.
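As a rough illustration, the two queries described above might be written as follows in Datadog's metric query syntax. This is a minimal sketch: `count_query` is a hypothetical helper that only formats a string and does not call any Datadog API, and the `*` scope is an assumption meaning "all resources."

```python
def count_query(resource_type, group_by=None, scope="*"):
    """Build a query string for an azure.<resource_type>.count metric.

    Hypothetical helper for illustration only.
    """
    query = f"sum:azure.{resource_type}.count{{{scope}}}"
    if group_by:
        query += " by {" + ",".join(group_by) + "}"
    return query


# Total number of VMs, then the same metric grouped by the status tag.
total_vms = count_query("vm")
vms_by_status = count_query("vm", group_by=["status"])
print(total_vms)
print(vms_by_status)
```

The same pattern applies to any other resource type that exposes a `.count` metric; only the middle segment of the metric name changes.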

With the new .count metrics, you can easily track, visualize, and alert on status changes of your Azure resources. We expect this new metric to be a valuable tool for building dashboards that intuitively convey the health and composition of your Azure environment.

Get started today

These enhancements to our Azure integration are available now, so Datadog customers can get started immediately. See our docs for more information. If you’re not yet using Datadog, start a free trial to get comprehensive insights into the performance and health of your Azure infrastructure.