Error Tracking enables you to reason about errors at a higher level—and investigate more effectively—by automatically grouping application errors into issues. By tracking issues alongside individual error events, you can get the context you need for root cause analysis—and reduce your mean time to resolution. Error Tracking builds on the data you’re already monitoring with Datadog, so you can start using it with no additional setup.
From errors to issues
When Datadog first receives an error event from Real User Monitoring (RUM) or APM, it creates a new issue. It then uses the issue to group subsequent errors that have similar messages and stack traces. Condensing errors into a single issue helps you triage tasks, summarize problems for colleagues, and otherwise maintain a clearer understanding of the work ahead of you. Error Tracking can also apply metadata to an issue, such as when its errors occurred, giving you more context than if you were to investigate the errors separately. Datadog can also notify your team whenever it identifies a new issue, giving you confidence that your triage plans are up to date.
Error Tracking extracts error messages from RUM and APM data, so there’s no need to configure an SDK or modify your application code.
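To make the grouping idea concrete, here is a minimal sketch of how errors with similar messages and stack traces could be folded into issues. It is not Datadog's actual algorithm: the fingerprint function, the event field names (type, message, stack), and the choice to compare only the top three frames are all assumptions made for illustration.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(error_type, message, stack_frames, depth=3):
    """Build a hypothetical grouping key from an error's type, message, and stack."""
    # Replace runs of digits so messages that differ only in IDs or counts match.
    normalized = re.sub(r"\d+", "<n>", message)
    # Compare only the top few frames (an assumption, not Datadog's rule).
    top_frames = "|".join(stack_frames[:depth])
    key = f"{error_type}\n{normalized}\n{top_frames}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]


def group_into_issues(events):
    """Map each fingerprint to the list of error events that share it."""
    issues = defaultdict(list)
    for event in events:
        fp = fingerprint(event["type"], event["message"], event["stack"])
        issues[fp].append(event)
    return dict(issues)


# Invented sample events, for illustration only.
events = [
    {"type": "TimeoutError", "message": "request 4012 timed out after 30s",
     "stack": ["client.py:send", "cart.py:checkout"]},
    {"type": "TimeoutError", "message": "request 977 timed out after 30s",
     "stack": ["client.py:send", "cart.py:checkout"]},
    {"type": "KeyError", "message": "missing key 'sku'",
     "stack": ["cart.py:add_item"]},
]

issues = group_into_issues(events)
for fp, grouped in issues.items():
    print(fp, len(grouped), grouped[0]["type"])
```

The two timeout events differ only in a request ID, so they land in the same issue, while the KeyError becomes an issue of its own.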
Know what to fix first—and how to fix it
Error Tracking enables you to get more context around any issue for smarter triaging and faster investigations. The Error Tracking Explorer view shows a list of issues that Datadog has detected, along with important aggregates like each issue’s total error count and frequency over time.
When you suspect that application errors are impacting downstream services or end users—for instance, you see a decline in certain user actions in the RUM Explorer—you can filter the Error Tracking Explorer by time range or facets such as service, environment, and application version to quickly identify a specific issue to investigate first. Each issue is labeled with a workflow state (open or ignored) to help your team members keep track of the status of issues. To help triage the most frequent errors with the broadest impact on your customers, you can also sort issues by the number of occurrences or the number of affected user sessions. In the following example, we can observe that while the transition error in our Android app is occurring most often, the error in our iOS app’s payment service is affecting the most customers.
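The difference between the two sort orders can be sketched in a few lines. The Issue class, its field names, and every number below are invented for illustration and do not come from the Error Tracking Explorer or any Datadog API; the point is only that the most frequent issue is not necessarily the one affecting the most sessions.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    name: str
    platform: str
    occurrences: int        # total error events grouped into the issue
    affected_sessions: int  # distinct user sessions that hit the issue


# Hypothetical issues with made-up counts.
issues = [
    Issue("transition error", "android", 1200, 150),
    Issue("payment service error", "ios", 400, 380),
    Issue("image load error", "browser", 90, 85),
]

# Most frequent first.
by_occurrences = sorted(issues, key=lambda i: i.occurrences, reverse=True)
# Broadest user impact first.
by_sessions = sorted(issues, key=lambda i: i.affected_sessions, reverse=True)

print("Most frequent:", by_occurrences[0].name)
print("Most sessions affected:", by_sessions[0].name)
```

With these invented numbers, sorting by occurrences puts the Android transition error first, while sorting by affected sessions puts the iOS payment service error first.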
If you click on an issue within the Error Tracking Explorer, you’ll see helpful metadata within an Issue Panel. A timeseries graph shows the frequency of errors within the issue, and the issue summary tells you which code version first threw the error. This data both indicates how serious the issue is and helps you correlate the issue’s occurrence with other events (such as a recent deployment).
When it comes time to investigate, you can quickly characterize the error by looking at the summary at the top of the panel: Datadog automatically parses all of the issue’s error messages for patterns in order to provide a consolidated description. The panel also surfaces the source code that caused the error, making it straightforward to find and revert the relevant Git commit. Afterward, you can keep tabs on the Error Tracking Explorer to see if the original issue is still occurring, or if other issues have cropped up instead.
Backend errors in the foreground
By grouping errors into issues and showing where they arise in your application source code, Error Tracking can help you identify trends that may otherwise go unnoticed. In the example below, Error Tracking shows us that our Ruby on Rails application, web-store, displayed a spike in PaymentServiceUnavailableError messages earlier this week. Using the stack trace, we can see that the ShoppingCartController#checkout method rendered the view that threw the exception.
We can then navigate from Error Tracking to view related traces that contain errors. The flame graph below makes it clear that the error in our payment service is caused downstream by an exceeded rate limit in the service’s API.
Since our application should not be disrupting our users’ shopping experiences with error messages, we decide to investigate further, using Trace Search and Analytics to see if the spike in exceptions correlates with particular kinds of user requests.
More revealing stack traces
Error Tracking unminifies your code by using source maps, which indicate where segments of the minified code appear in the original source. Datadog makes it easy to upload source maps using the datadog-ci binary, which we designed to run inside continuous integration environments. Run the datadog-ci sourcemaps upload command to send the contents of your source map directory to Datadog automatically at build time. You’ll then be able to see unminified source code within the Issue Panel. Error Tracking also supports .dSYM parsing, so you can easily view human-readable stack traces for your mobile application code as well.
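For readers curious about how a source map ties minified code back to the original source, the sketch below decodes the mappings field defined by the Source Map v3 format. It illustrates the file format only, not how Datadog processes uploaded source maps. The function names and the sample mappings string are made up for this example.

```python
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def decode_vlq(segment):
    """Decode one base64 VLQ segment into a list of signed integers."""
    values, acc, shift = [], 0, 0
    for ch in segment:
        digit = B64.index(ch)
        acc |= (digit & 31) << shift   # low 5 bits carry data
        if digit & 32:                 # bit 6 set: the value continues
            shift += 5
        else:
            sign = -1 if acc & 1 else 1  # lowest bit of the value is the sign
            values.append(sign * (acc >> 1))
            acc, shift = 0, 0
    return values


def decode_mappings(mappings):
    """Return (gen_line, gen_col, source_idx, orig_line, orig_col) tuples, zero-based."""
    result = []
    src = line = col = 0               # these fields are relative across the whole file
    for gen_line, group in enumerate(mappings.split(";")):
        gen_col = 0                    # generated column resets on every generated line
        for seg in group.split(","):
            if not seg:
                continue
            fields = decode_vlq(seg)
            gen_col += fields[0]
            if len(fields) >= 4:       # segments with 1 field carry no source position
                src += fields[1]
                line += fields[2]
                col += fields[3]
                result.append((gen_line, gen_col, src, line, col))
    return result


# Made-up mappings string: two segments on generated line 0, one on line 1.
segments = decode_mappings("AAAA,IAAI;AACA")
for seg in segments:
    print(seg)
```

In this sample, the last segment says that column 0 of generated line 1 came from line 1, column 4 of the first source file, all counted from zero. A stack trace is unminified by looking up each frame's generated position in a table like this one.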
To err is human; to track is canine
Datadog Error Tracking gives you actionable insights into your application errors, making it easier to troubleshoot the issues that affect your users most. If you’re using RUM or APM, Error Tracking will start working right away. Error Tracking is just one way of getting comprehensive, code-level visibility into your applications. You can also set up Log Management and Synthetic Monitoring. And to get deep visibility into the resource utilization of your code, you can set up Continuous Profiler, which can run continuously, even in production. If you’re thinking about getting started with Datadog, sign up for a free trial.