Automate End-to-End Processes and Quickly Respond to Events With Datadog Workflows | Datadog

Author: Jordan Obey

Published: October 19, 2022

Developer, SRE, IT, and security teams tend to rely on complex manual procedures to respond to disruptions and other unforeseen changes that arise in their systems. These manual responses typically involve multiple steps with context-switching, and they often require significant expertise. Incident responses are also subject to the constraints of other processes and procedures that are already in place. All of these factors combine to prolong incident response time while also making a team’s incident handling highly susceptible to errors and misconfigurations.

Now, teams can combine monitoring and workflow automation into a single, streamlined solution with Datadog Workflows. Datadog Workflows automate and orchestrate complex flows of tasks and enable teams to incorporate human input into those flows where needed.

Datadog Workflows are triggered in response to alerts and security signals provided by Datadog, and they can also be scheduled or run manually from dashboards. For example, you can configure a workflow to trigger on a specific alert and automatically execute processes such as rolling back to the last stable code revision, building investigative notebooks, scaling your infrastructure, or blocking IP addresses. You can also schedule workflows to regularly check for unused Datadog dashboards or Amazon EC2 key pairs. Similarly, you can manually trigger workflows to turn feature flags on or off for specific accounts.

In this post, we’ll look at how Datadog Workflows help teams save time and confidently manage the health of their systems by automatically executing tasks in response to specific alerts, events, and threats.

Automatically run workflows in response to Datadog alerts

Whether you’re monitoring application performance, network health, or infrastructure resources, alerts are critical for letting you know when issues occur so that you can respond accordingly. With Datadog Workflows, instead of responding to alerts manually, which can be repetitive and time-consuming, you can use a simple UI to create a workflow consisting of connected actions that execute when an alert is triggered, significantly reducing your MTTR.

Let’s say you’re running a serverless application and want to automatically redeploy a stable revision of a Lambda function if it starts experiencing a high volume of errors. First, you can either create a new workflow manually or get started with a Blueprint workflow, which provides an out-of-the-box, preconfigured set of the actions you need. For example, you can select the “Perform Deployment with AWS CodeDeploy” Blueprint to quickly create a workflow that includes steps for deploying a Lambda function revision. You can also easily customize Blueprints to fit your specific needs by adding further steps and logical operators from the actions catalog.

blueprint.png

In addition to hundreds of available Datadog-specific actions, such as creating dashboards and querying logs and metrics, Datadog Workflows include actions that are available through integrations such as AWS, Cloudflare, Jira, GitHub, and more. These actions can be incorporated into a workflow and configured to respond to human input. For example, before an action is executed, you can configure your workflow to require user confirmation via a third-party service. You can add a Slack action that notifies an on-call engineer whenever a workflow is triggered to deploy a revised Lambda function. You can also add data operator actions to build the revision path or add AWS CodeDeploy and Datadog Monitor status check actions to track the progress of the remediation. These proactive steps help keep your team aware of triggered workflows and provide control and visibility over what gets executed, without requiring you to navigate different pages within the Datadog platform. They also reduce errors and make the overall remediation process more efficient, significantly shortening MTTR.

actions_catalogue.png

Datadog Workflows also provide generic actions, such as HTTP calls to specified endpoints (as well as the data operator actions already mentioned), which allow you to write scripts to process data as it flows through a workflow.
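To make the idea of a data-processing step more concrete, here is a minimal Python sketch of the kind of transformation such a step might perform: picking the most recent stable revision out of a list of deployments. The function `last_stable_revision` and the field names `revision`, `status`, and `deployed_at` are hypothetical illustrations, not Datadog's schema or API, and the scripting language available inside a workflow may differ.

```python
from typing import Dict, List, Optional


def last_stable_revision(deployments: List[Dict[str, str]]) -> Optional[str]:
    """Return the revision of the most recently deployed entry marked stable.

    Hypothetical input shape: each item has "revision", "status", and
    "deployed_at" (an ISO 8601 timestamp string, all in the same format).
    """
    stable = [d for d in deployments if d.get("status") == "stable"]
    if not stable:
        return None
    # ISO 8601 strings in one uniform format sort chronologically as text.
    newest = max(stable, key=lambda d: d["deployed_at"])
    return newest["revision"]


# Made-up sample data standing in for the output of an earlier step.
deployments = [
    {"revision": "12", "status": "stable", "deployed_at": "2022-10-01T09:00:00Z"},
    {"revision": "13", "status": "stable", "deployed_at": "2022-10-10T09:00:00Z"},
    {"revision": "14", "status": "failing", "deployed_at": "2022-10-18T09:00:00Z"},
]
print(last_stable_revision(deployments))
```

A later deployment step could then take the returned revision as its input.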

Once the workflow runs, Datadog marks actions with green checks as they are executed so that teams can quickly check on the status within a single, unified view and ensure the expected tasks are performed as required. Each workflow can also be edited and exported as JSON, which is easy to parse—and easy to later share and use for configuring other similar workflows.
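Because the export is plain JSON, it can be inspected with standard tooling. The sketch below parses an exported workflow and lists its steps in order. The post does not describe the export schema, so the structure used here (`name`, `steps`, and per-step `name` and `action` fields) and the action identifiers are assumptions made purely for illustration.

```python
import json
from typing import List

# Hypothetical export; the real schema of an exported workflow may differ.
EXPORTED = """
{
  "name": "Redeploy stable Lambda revision",
  "steps": [
    {"name": "notify_on_call", "action": "slack.send_message"},
    {"name": "deploy_revision", "action": "aws.codedeploy.create_deployment"},
    {"name": "check_monitor", "action": "datadog.monitor.get_status"}
  ]
}
"""


def summarize(workflow_json: str) -> List[str]:
    """Return one numbered line per step, in execution order."""
    workflow = json.loads(workflow_json)
    return [
        f'{index}. {step["name"]} -> {step["action"]}'
        for index, step in enumerate(workflow["steps"], start=1)
    ]


for line in summarize(EXPORTED):
    print(line)
```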

Once you create a workflow, you can add it as an “@mention” to the alert that’s monitoring Lambda function errors to ensure that the workflow runs whenever that alert is triggered.
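As a rough illustration, the following Python sketch assembles a monitor definition whose notification message ends with a workflow mention. The handle `workflow-redeploy-stable-lambda` is a hypothetical name, and the handle format itself is an assumption, as are the query, function name, and threshold; copy the exact mention from your own workflow rather than relying on this sketch.

```python
def with_workflow_mention(message: str, handle: str) -> str:
    """Return the notification message with the workflow mention appended once."""
    mention = handle if handle.startswith("@") else f"@{handle}"
    if mention in message:
        return message
    return f"{message.rstrip()}\n{mention}"


# Illustrative monitor definition; every value below is a placeholder.
monitor = {
    "name": "High error count on Lambda function",
    "type": "metric alert",
    "query": "sum(last_5m):sum:aws.lambda.errors{functionname:my-function}.as_count() > 10",
    "message": with_workflow_mention(
        "Lambda errors are above the threshold.",
        "workflow-redeploy-stable-lambda",  # hypothetical handle
    ),
}
print(monitor["message"])
```

Keeping the mention in a small helper avoids adding it twice when a message is edited repeatedly.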

workflows_config.png

Proactively respond to Datadog Security Signals

Datadog Workflows also help you quickly counter security threats by enabling you to trigger a workflow in response to a Datadog Security Signal. You can add a workflow “@mention” to the configuration of a Datadog detection rule notification so that when the rule is triggered and emits a Security Signal, a workflow will execute in response. For example, let’s say your organization uses Okta for identity and access management and has a rule in place that detects when a user tries to access an app without authorization. With Datadog Workflows, you can configure the “Suspend Suspicious Okta User” Blueprint workflow and add it to that rule so that when the rule is triggered, the suspicious user is automatically suspended. To avoid suspending trusted users by mistake, the “Suspend Suspicious Okta User” workflow also includes an action that notifies you about the suspicious activity via Slack so that you can confirm that the user should be suspended.
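The notify-then-confirm pattern described above can be sketched in a few lines of Python. This is a conceptual model only, not the Blueprint's implementation: `Signal`, `Outcome`, and `handle_signal` are hypothetical names, and the `notify`, `confirm`, and `suspend` callables stand in for the Slack and Okta actions.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Signal:
    user: str
    rule: str


@dataclass
class Outcome:
    notified: bool = False
    suspended: bool = False
    log: List[str] = field(default_factory=list)


def handle_signal(
    signal: Signal,
    notify: Callable[[str], None],
    confirm: Callable[[Signal], bool],
    suspend: Callable[[str], None],
) -> Outcome:
    """Notify a human first, and suspend the user only after confirmation."""
    outcome = Outcome()
    notify(f"Rule '{signal.rule}' flagged user {signal.user}")
    outcome.notified = True
    outcome.log.append("notified")
    if confirm(signal):
        suspend(signal.user)
        outcome.suspended = True
        outcome.log.append("suspended")
    else:
        outcome.log.append("skipped")
    return outcome


# Stub integrations: lists record what would have been sent or suspended.
sent: List[str] = []
suspended: List[str] = []
outcome = handle_signal(
    Signal(user="jdoe@example.com", rule="Unauthorized app access"),
    notify=sent.append,
    confirm=lambda s: True,  # stands in for a human approving in Slack
    suspend=suspended.append,
)
print(outcome.log)
```

Putting the confirmation in front of the destructive step is what keeps a false positive from locking out a trusted user.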

workflow_example.png

Along with boosting productivity and saving valuable time, implementing automated workflows in response to security threats allows you and your team to more quickly and easily defend against attacks.

Get started today

Datadog Workflows streamline your monitoring and troubleshooting by automating end-to-end processes and executing actions in response to alerts, security threats, and other insights. The feature is currently in private beta, and you can sign up for early access here. To learn more about how Datadog Workflows can help you reduce MTTR and proactively troubleshoot issues, check out our documentation.

If you aren’t already a Datadog customer, get started with a 14-day free trial.