Quickly remediate issues in your Azure applications with Datadog Workflow Automation

Author Syed Sarjeel Yusuf
Author Emma Chadwick

Published: January 3, 2024

Datadog Workflow Automation speeds up incident response and remediation for DevOps, SRE, and security teams by enabling them to automatically run predefined task sequences whenever specific alerts or security signals are triggered. After the feature’s initial release in 2023, Datadog is now excited to announce a significant expansion of its Workflow Automation capabilities with Azure actions, allowing engineers to create automated workflows for their Azure resources for the first time.

With nearly 80 Azure actions already available today and more on the way, you can now use automation to address disruptions, improve response times, and help boost the overall health and security of your Azure-based systems. These new actions use integrations to cover an array of Azure services, providing the means to orchestrate complex workflows across different technologies. Each Azure integration also serves as a category of actions in the Workflow Automation UI. These categories include:

  • Azure DevOps Pipelines: Automate pipelines to immediately restart services in response to Datadog service outage alerts.
  • Azure VM Scale Sets: Dynamically adjust the number of VM instances based on traffic load, as indicated by Datadog monitors.
  • Azure Blob: Manage access control by automatically updating blob properties when Datadog Cloud Security Management detects misconfigurations.

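This post will explore some of the processes in Azure that you can now automate with Datadog Workflow Automation. We will describe how to:

  • Automatically restart critical services where needed
  • Secure your Azure-based applications with the click of a button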

Automatically restart critical services where needed

With Workflow Automation, you can automate remediation in response to alerts from Datadog monitors.

Imagine your team is running an application on Azure, with Apache Tomcat deployed as the application server. This service is critical to your operations, handling user accounts and financial transactions. To maintain visibility into its performance, you use Datadog to configure metric monitors and anomaly monitors for the Tomcat service.

One day, a monitor alerts you to an issue with the server, indicating a noticeable increase in response times, a higher error rate, and spikes in CPU and memory usage. After researching the issue, you discover that your application is suffering from a memory leak, which is resulting in performance degradation and intermittent unresponsiveness. The ultimate solution is to fix the memory leak, but this undertaking requires some time and cannot be implemented immediately. A temporary solution is needed in the meantime.

In this situation, the next step is clear: you need to restart the Tomcat server to clear the memory and temporarily restore normal operation. And since this problem will likely recur until the memory leak is fixed, it’s important to mitigate impact by having the service restart automatically whenever it exhibits this type of performance issue.

Using an Azure DevOps Pipelines action in a workflow

To speed up the mitigation response, reduce sluggish performance, and minimize downtime for similar issues with your Tomcat servers, you can use Azure actions in Workflow Automation. As mentioned, Azure actions are grouped into distinct categories that correspond to different integrations. To restart the Tomcat service automatically, we will draw upon the Azure DevOps Pipelines category of actions in Workflow Automation. Specifically, we will use the Run pipeline action in a workflow that restarts the Tomcat service.

The Azure DevOps Pipelines category of actions.

When you add the workflow as an @-mention to your monitor configuration, the workflow is triggered automatically whenever the monitor goes into an alert state.
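To make the mention concrete, here is a minimal sketch of a monitor definition whose message includes a workflow handle. It assumes the `@workflow-` mention prefix; the workflow name `restart-tomcat`, the metric query, and the threshold are hypothetical placeholders rather than values from the setup described in this post.

```python
import json

# Hypothetical workflow name; replace with the name of your own workflow.
WORKFLOW_NAME = "restart-tomcat"


def build_monitor_payload(workflow_name: str) -> dict:
    """Return a monitor definition whose message mentions a workflow."""
    mention = f"@workflow-{workflow_name}"
    return {
        "name": "Tomcat response time is elevated",
        "type": "metric alert",
        # Illustrative query only; the metric name and threshold are assumptions.
        "query": "avg(last_5m):avg:tomcat.processing_time{service:tomcat} > 500",
        # The mention sits inside {{#is_alert}} so that it is only part of
        # the notification sent when the monitor enters the alert state.
        "message": f"{{{{#is_alert}}}}Tomcat is degraded. {mention}{{{{/is_alert}}}}",
    }


if __name__ == "__main__":
    print(json.dumps(build_monitor_payload(WORKFLOW_NAME), indent=2))
```

The function only builds the payload and makes no network calls, so you can inspect the resulting message before pasting it into a monitor or sending it through whichever client you already use.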

The workflow is shown below. It will first get all the necessary details about the Tomcat service from the Service Catalog and pass this information to the Run pipeline action.

A workflow to restart the Tomcat servers.

The workflow will then perform the step to run the pipeline, followed by the step to determine whether the monitor has resolved itself. In either case, a corresponding message is then sent to an appropriate Slack channel, informing the right response team.
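The control flow of these steps can be sketched in a few lines of Python. This is not how Workflow Automation is implemented; it is an illustrative model in which every step (the service lookup, the pipeline run, the monitor check, and the Slack message) is a hypothetical callable that you pass in, and the `ServiceInfo` fields are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ServiceInfo:
    """Details looked up before running the pipeline (hypothetical fields)."""
    name: str
    pipeline_id: int


def restart_workflow(
    get_service: Callable[[], ServiceInfo],
    run_pipeline: Callable[[ServiceInfo], None],
    monitor_resolved: Callable[[], bool],
    notify_slack: Callable[[str], None],
) -> bool:
    """Look up the service, run the pipeline, check the monitor, then notify."""
    service = get_service()
    run_pipeline(service)
    resolved = monitor_resolved()
    # A message is sent on both branches so the response team is always informed.
    if resolved:
        notify_slack(f"{service.name}: restart pipeline ran and the monitor recovered.")
    else:
        notify_slack(f"{service.name}: restart pipeline ran but the monitor is still alerting.")
    return resolved


if __name__ == "__main__":
    sent: List[str] = []
    restart_workflow(
        get_service=lambda: ServiceInfo(name="tomcat", pipeline_id=42),
        run_pipeline=lambda svc: None,   # stand-in for the Run pipeline action
        monitor_resolved=lambda: True,   # stand-in for the monitor status check
        notify_slack=sent.append,        # stand-in for the Slack message step
    )
    print(sent[0])
```

Passing the steps in as callables keeps the branching logic separate from any real integration, which makes the "resolved" and "still alerting" paths easy to exercise on their own.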

By combining Datadog’s alerting capabilities with an Azure action in Workflow Automation, you can greatly accelerate your response to disruptions. Additionally, the automated execution of tasks in the workflow helps ensure the reliability and availability of your application, reduce the need for manual intervention, and minimize downtime, all while maintaining clear communications surrounding the incident resolution.

Secure your Azure-based applications with the click of a button

Apart from triggering workflows from monitors, you can also trigger workflows either manually or automatically in response to security signals.

Imagine, for example, that you are part of the security team managing Microsoft Entra ID (formerly Azure Active Directory) for your organization and are responsible for user account and access security. One day, you receive a security signal from Datadog Cloud SIEM indicating a potential compromise of a user’s credentials.

This signal is triggered due to suspicious activity related to the user’s account. To respond effectively to this security incident, you have created a workflow that orchestrates a series of actions to contain a compromised user. All you have to do now is run the pre-created workflow, which you can do directly from the details page of the security signal.

The Run Workflow button in a security signal.

By clicking on the Run Workflow button, you will be able to select the correct workflow from the Workflow Library and execute the workflow to immediately contain the user.

Selecting and running a workflow manually.

You can also trigger your workflow automatically. To do so, add the workflow to the Notification Details section of a notification rule or, as shown below, directly to the notify field in the Set rule cases section of a Cloud SIEM detection rule.

Configuring a Cloud SIEM detection rule to trigger a workflow automatically
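If you manage detection rules as code rather than through the UI, the same idea can be expressed as a rule case that lists the workflow among its notification targets. The sketch below is a hedged illustration: the case name, condition, and the workflow name `contain-compromised-user` are hypothetical, and the field layout is an assumption about how a rule case might be represented, so check it against your own rule definitions.

```python
def build_rule_case(workflow_name: str, status: str = "high") -> dict:
    """Return one detection-rule case that notifies a workflow (sketch only)."""
    return {
        "name": "Possible credential compromise",   # hypothetical case name
        "status": status,                           # severity assigned to the signal
        "condition": "a > 0",                       # hypothetical trigger condition
        # The workflow handle goes alongside any other notification targets.
        "notifications": [f"@workflow-{workflow_name}"],
    }


if __name__ == "__main__":
    print(build_rule_case("contain-compromised-user"))
```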

Get started with Workflow Automation and Azure actions

The workflows described above show just some of the tasks that can be automated for Microsoft Azure. However, with over 75 Azure-related and 550 total actions that can be combined in various ways, the range of different workflows that you can automate is vast.

Discover the full spectrum of available actions in the action catalog for Workflow Automation, or check out the many workflow blueprints available to kick-start your automation journey. The workflows mentioned in this post are also available as blueprints, ready for you to pick up and configure. To learn more about how Datadog Workflow Automation can help you reduce MTTR and proactively troubleshoot issues, check out our documentation. And if you aren’t already a Datadog customer, get started with a 14-day free trial.