
Proactively track, triage, and assign issues with Datadog Case Management

Author Cansu Berkem
Author Tanja Garcia
Author Addie Beach

Published: April 18, 2023

Complex systems require many different monitors to assess the health of their infrastructure and applications, creating a wealth of alerts that can be hard to track. Due to a lack of effective triage processes, many organizations page engineers for every alert that comes in, making it difficult to separate false positives from issues that actually require immediate attention. In an ideal system, on-call engineers would be paged only for major incidents, and issues that don’t have any urgent customer impact could be left to longer-term investigations. However, your system still needs procedures for handling these back-burner issues—they can easily fall through the cracks and, if left unaddressed, grow to cause bottlenecks or a cascade of impacts.

Datadog Case Management provides a centralized place to track, triage, and troubleshoot these types of issues. Across the Datadog platform, you can easily create cases from alerts, security signals, and error-tracking issues that you want to investigate. Then, in a single view, you can track and assess all of your cases, helping you organize your troubleshooting efforts. You can easily assign cases to users or teams, establishing clear lines of ownership that persist throughout the lifespan of the case. You can even set up cases to be automatically created from issues using Datadog Workflows, making it easy for teams to triage and delegate effectively. Case Management also enables you to link graphs, logs, and other telemetry data from across Datadog with information from external tools, such as messaging and issue-tracking apps.

The Case Management overview page, with both performance and security issues displayed.

Prioritize and delegate all cases from within one view

With Datadog Case Management, you have centralized access to alerts, security signals, and error-tracking issues that haven’t yet escalated into customer-impacting incidents. From the Case Management overview page, you can view crucial context for each case, including associated environments, services, incidents, and teams. You can also quickly determine whether a case is already being worked on, helping you delineate ownership, identify points of contact, and avoid duplicating assignments. You’re even able to create cases for work items without any linked alerts or signals.

The case creation popup on the Case Management overview page.

The overview page enables you to sort your cases using filters—such as status, team, or priority—to find the issues you’re looking for (or discover ones you didn’t know existed). You can also create custom inboxes to organize related cases. For instance, you might want to group together every medium-priority issue tied to a specific business-critical service. These features help you organize your queue of ongoing alerts, enabling you to prevent customer impact.
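To make the custom inbox idea concrete, here is a minimal sketch in plain Python. It does not call any Datadog API: the `Case` fields, the status and priority values, and the `custom_inbox` helper are all hypothetical names chosen for illustration. It only models what an inbox does conceptually, which is to apply a saved set of filters to your queue of cases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    # Hypothetical fields for illustration; this is not Datadog's case schema.
    key: str
    title: str
    status: str    # e.g. "open", "in_progress", "closed"
    priority: str  # e.g. "low", "medium", "high"
    service: str
    team: str


def custom_inbox(cases, *, priority, service, statuses=("open", "in_progress")):
    """Return the cases that match an inbox's saved filter criteria."""
    return [
        c for c in cases
        if c.priority == priority and c.service == service and c.status in statuses
    ]


cases = [
    Case("CASE-1", "Latency spike on checkout DB", "open", "medium", "checkout", "payments"),
    Case("CASE-2", "Elevated 5xx rate", "open", "high", "checkout", "payments"),
    Case("CASE-3", "Stale certificate warning", "closed", "medium", "checkout", "payments"),
    Case("CASE-4", "Disk usage trend", "in_progress", "medium", "search", "discovery"),
]

# Every still-active medium-priority case tied to the business-critical "checkout" service.
inbox = custom_inbox(cases, priority="medium", service="checkout")
print([c.key for c in inbox])  # prints ['CASE-1']
```

In this sketch, the closed case and the high-priority case are left out of the inbox, so the view contains only the back-burner work that still needs an owner.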

One of the primary responsibilities of central support teams is to ensure that every issue is being handled by the appropriate team or engineer. In modern cloud environments, however, the sheer amount of telemetry data makes this an enormous task requiring constant vigilance, decision making, and context switching. Using Datadog Case Management, these teams can view all open cases, determine the appropriate assignees and priority level, and assign investigators without ever leaving the overview page.

Let’s say a case comes in for an alert showing high throughput on one of your services. The tags on the Case Management overview page help you determine which service is experiencing the issues and which team is responsible for managing it. You can then assign an engineer to troubleshoot the issue by selecting a team member from the drop-down menu. That designated investigator is added to the case overview so that other engineers and support team members can see that the issue is being handled.
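The triage step in this scenario can be sketched as a lookup from a case's service tag to its owning team, followed by an assignment. This is an illustrative model only, not how Case Management is implemented: the `SERVICE_OWNERS` map, the `on_duty` roster, and the `triage` function are assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical ownership map; in practice this would come from your own service catalog.
SERVICE_OWNERS = {"checkout": "payments", "search": "discovery"}


@dataclass
class Case:
    # Hypothetical shape for illustration; not Datadog's case schema.
    key: str
    tags: dict
    team: Optional[str] = None
    assignee: Optional[str] = None


def triage(case, owners, on_duty):
    """Set the owning team from the service tag, then pick that team's investigator."""
    service = case.tags.get("service")
    case.team = owners.get(service)
    if case.team is not None:
        case.assignee = on_duty.get(case.team)
    return case


# A case created from a high-throughput alert on the "checkout" service.
case = triage(
    Case("CASE-7", {"service": "checkout", "env": "prod"}),
    SERVICE_OWNERS,
    on_duty={"payments": "alice"},
)
print(case.team, case.assignee)  # prints: payments alice
```

A case with no recognizable service tag keeps `team` and `assignee` unset, which mirrors the situation where a support team still has to decide ownership by hand.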

Organize investigations using a single source of truth

Datadog Case Management streamlines issue investigation, enabling you to quickly resolve minor or intermediate issues before they grow in impact. For example, many security teams need a place to consolidate troubleshooting efforts that are unrelated to active threats but are still necessary for remedying vulnerabilities. They can create cases for these ongoing issues and work on them in between more pressing incidents. And if, after further investigation, they decide that a case should be treated as an incident, they can escalate it directly within Case Management by declaring a Datadog incident or by using our one-click integration with third-party ticketing systems such as Jira.

Once established, a case becomes the central hub for all context and communication related to an issue. You can easily create or link Jira and ServiceNow tickets as well as Slack channels and conversations directly from the case. This gives you easy access to relevant information about your case investigation across platforms. You can also add relevant alerts and investigation notebooks to the case to consolidate resources from across the Datadog platform. Additionally, each case has an associated timeline that acts as a single source of truth, with timestamps for key events, activities, and comments.

Details for a case in Case Management, showing the timeline, one-click integrations, and issue description.

These features help you organize your investigations into a single, easily accessible place, streamlining both your triaging and troubleshooting activities. For example, let’s say one of your monitors consistently shows a brief spike in latency on a specific database once a week. The issue is not actively impacting users or other parts of your system yet, so you decide to turn it into a case. From here, an engineer on your team can assign themselves to the case and start their investigation, using Case Management to gather any findings.

Start using Datadog Case Management today

Datadog analyzes metrics, traces, and logs to surface unusual behavior and concerning trends. Without a centralized location for processing and addressing these findings, key issues can go unnoticed. By using Datadog Case Management, you can organize your investigations around alerts, security signals, and error-tracking issues in the same platform you already use to troubleshoot. You’re able to easily pivot from your cases to observability data during investigations, and you can enrich your cases with context from Datadog.

You can access the Case Management overview page from the Service Management menu in Datadog. Use our documentation to get started. Or, if you’re not yet a Datadog customer, you can sign up for a 14-day free trial today.