Discover, triage, and remediate sensitive data issues at scale with Sensitive Data Scanner

Author Tori Teng
Author Aaron Kaplan

Published: November 27, 2023

Managing sensitive information in your telemetry data poses many challenges for governance, risk management, and compliance (GRC) teams and for your overall security posture. Organizations in healthcare, finance, insurance, and other fields must adhere to strict compliance requirements. But sensitive data comes in many forms and moves between many endpoints, so it can easily become exposed in telemetry data. What’s more, as organizations scale, troubleshooting and triaging sensitive data issues becomes increasingly complex, and the risks of alert fatigue, mismanagement, and oversights grow.

Datadog’s Sensitive Data Scanner helps you eliminate data exposure blind spots in order to ensure that you meet data compliance standards and regulations. You can use Sensitive Data Scanner to automatically identify sensitive data in your logs, APM traces, and real user monitoring (RUM) events; define high- and low-risk sensitive information using searchable tags; and redact or hash that information as needed.
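To make the difference between redacting and hashing concrete, here is a minimal Python sketch. It is an illustration only and does not reflect how Sensitive Data Scanner is implemented: the email pattern, the `[REDACTED]` placeholder, and the function names are all assumptions made for this example.

```python
import hashlib
import re

# Hypothetical, simplified email pattern used only for this illustration.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(text, placeholder="[REDACTED]"):
    """Replace every match with a fixed placeholder, discarding the value."""
    return EMAIL_PATTERN.sub(placeholder, text)


def hash_matches(text):
    """Replace every match with the SHA-256 hex digest of the matched value."""
    def _digest(match):
        return hashlib.sha256(match.group(0).encode("utf-8")).hexdigest()

    return EMAIL_PATTERN.sub(_digest, text)


log_line = "login failed for jane.doe@example.com"
print(redact(log_line))
print(hash_matches(log_line))
```

Redaction removes the value entirely, while hashing replaces it with a stable token, so identical values can still be correlated across events without exposing the original text.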

In this post, we’ll show you how Sensitive Data Scanner provides fast, comprehensive visibility into potential data compliance issues through a focused central interface and context-aware classification. We’ll guide you through using Sensitive Data Scanner to comprehensively manage data compliance issues and kickstart investigations into potential data leaks.

Comprehensively manage data compliance issues

As sensitive data issues arise, GRC teams are under pressure to act fast in order to plug leaks and contain the fallout. Capturing detail along the way is also an urgent necessity, especially when it comes to providing auditors with incident reports. As these teams investigate sensitive data issues, they are confronted with a range of questions:

  • Which sensitive data has been exposed, and where has it come from?
  • Where, among many services, hosts, and environments, has the leak occurred? Which teams need to plug it, and how should they do so?
  • What caused the issue in the first place? What new security measures are called for?
  • How should this issue be prioritized alongside others?

Sensitive Data Scanner enables you to discover, triage, troubleshoot, and track data compliance issues through a central interface. The Sensitive Data Scanner summary page provides a high-level snapshot of every issue detected by your scanning rules, which define any data that might compromise the compliance, security, or privacy of your organization or users. Datadog provides a library of predefined rules that help you detect the exposure of data such as credit card numbers, email addresses, IP addresses, API keys, and more. You can also define your own regex-based scanning rules to identify business-specific sensitive information.

Datadog’s library of predefined rules for Sensitive Data Scanner.
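As a rough picture of what a custom regex-based scanning rule does, the following Python sketch pairs a pattern with a name and searchable tags. The `ScanningRule` class, the `CUST-` identifier format, and the tag value are hypothetical and are not part of Datadog’s configuration or API.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanningRule:
    """Illustrative container for a rule; not a Datadog data structure."""
    name: str
    pattern: re.Pattern
    tags: tuple


# Hypothetical business-specific rule: internal customer IDs such as "CUST-00012345".
CUSTOMER_ID_RULE = ScanningRule(
    name="internal_customer_id",
    pattern=re.compile(r"\bCUST-\d{8}\b"),
    tags=("sensitive_data:customer_id",),
)


def find_matches(rule, event_text):
    """Return every substring of event_text that the rule's pattern matches."""
    return [m.group(0) for m in rule.pattern.finditer(event_text)]


print(find_matches(CUSTOMER_ID_RULE, "order for CUST-00012345 shipped"))
```

Anchoring the pattern with word boundaries and a fixed digit count keeps it from matching shorter or longer number runs that only resemble the identifier.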

To fine-tune Sensitive Data Scanner, you can define a keyword dictionary for each of your scanning rules. Keyword dictionaries can improve the accuracy of data classification and minimize the potential for false positives. For example, if you define the keywords visa, credit, and card for a Visa credit card number scanning rule, the rule reports a match only when one of these words appears within the 30 characters that precede it.

Keyword dictionaries help you fine-tune your scanning rules.
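The keyword check described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Datadog’s matching logic: the Visa pattern is deliberately loose, and the function name and the exact windowing behavior are choices made for this example.

```python
import re

# Simplified, hypothetical Visa pattern: 16 digits starting with 4,
# optionally separated into groups of four by spaces or hyphens.
VISA_PATTERN = re.compile(r"\b4\d{3}(?:[ -]?\d{4}){3}\b")
KEYWORDS = ("visa", "credit", "card")
PROXIMITY = 30  # number of characters inspected before each match


def matches_with_keyword(text):
    """Return pattern matches that have a keyword in the preceding window."""
    hits = []
    for match in VISA_PATTERN.finditer(text):
        window_start = max(0, match.start() - PROXIMITY)
        window = text[window_start:match.start()].lower()
        if any(keyword in window for keyword in KEYWORDS):
            hits.append(match.group(0))
    return hits


print(matches_with_keyword("paid with visa card 4111 1111 1111 1111"))
print(matches_with_keyword("tracking id 4111111111111111"))
```

The second call finds nothing because no keyword precedes the number, which is how a keyword dictionary filters out lookalike values such as tracking IDs.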

You can also define parameters for the rule’s target and actions that will be taken when a match is found, including setting a priority level.

You can set a priority level for when a rule finds a match.

When a scanning rule detects a match one or more times within any of the data sets you are scanning—called scanning groups—Sensitive Data Scanner designates it as an issue. It provides an overview of the sensitive data exposed in each issue and enables you to quickly create cases and Jira tickets, declare incidents, and collaborate on remediation.
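One way to picture how individual matches roll up into issues is the Python sketch below. It assumes, for illustration only, that matches are grouped by scanning rule and that each issue records the scanning groups involved, an event count, and the most recent match time. The class and field names are hypothetical and do not describe Datadog’s data model.

```python
from collections import OrderedDict
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class RuleMatch:
    """A single detection of a rule in one scanned event (illustrative)."""
    rule: str
    scanning_group: str
    timestamp: datetime


@dataclass
class Issue:
    """Aggregate of all matches for one rule (illustrative)."""
    rule: str
    scanning_groups: set = field(default_factory=set)
    event_count: int = 0
    last_seen: Optional[datetime] = None


def build_issues(matches):
    """Group matches by rule; one or more matches for a rule yields an issue."""
    issues = OrderedDict()
    for m in matches:
        issue = issues.setdefault(m.rule, Issue(rule=m.rule))
        issue.scanning_groups.add(m.scanning_group)
        issue.event_count += 1
        if issue.last_seen is None or m.timestamp > issue.last_seen:
            issue.last_seen = m.timestamp
    return issues


sample = [
    RuleMatch("visa_card", "payments-prod", datetime(2023, 11, 1, 12, 0)),
    RuleMatch("visa_card", "checkout-staging", datetime(2023, 11, 2, 9, 30)),
    RuleMatch("jwt", "api-prod", datetime(2023, 11, 1, 8, 0)),
]
for issue in build_issues(sample).values():
    print(issue.rule, issue.event_count, sorted(issue.scanning_groups), issue.last_seen)
```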

Analysts can use Sensitive Data Scanner to access key information on each sensitive data issue, enabling them to start remediating without spending time gathering basic information, such as when a leak started or the number of events in which a specific set of sensitive information was exposed.

At the top of the summary page, you can find a tally of all sensitive data issues within the selected time frame. Issues are broken down by telemetry type, such as logs or traces, and priority level, which is defined as Low, Medium, High, or Critical for each issue according to your scanning rules.

The Sensitive Data Scanner summary page.
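The tally at the top of the summary page amounts to counting issues along two dimensions. A minimal Python sketch of that breakdown follows, using made-up issue records. The dictionary keys and rule names are assumptions for this example, and only the four priority levels come from the text above.

```python
from collections import Counter

# Priority levels named in the post, ordered from most to least urgent.
PRIORITY_ORDER = ("Critical", "High", "Medium", "Low")

# Made-up issue records for illustration.
issues = [
    {"rule": "visa_card", "telemetry": "logs", "priority": "Critical"},
    {"rule": "email_address", "telemetry": "logs", "priority": "Low"},
    {"rule": "jwt", "telemetry": "traces", "priority": "High"},
    {"rule": "api_key", "telemetry": "rum", "priority": "High"},
]


def tally(issue_list):
    """Count issues by telemetry type and by priority level."""
    by_telemetry = Counter(i["telemetry"] for i in issue_list)
    by_priority = Counter(i["priority"] for i in issue_list)
    return by_telemetry, by_priority


def sort_by_priority(issue_list):
    """Order issues from Critical down to Low, keeping input order within a level."""
    return sorted(issue_list, key=lambda i: PRIORITY_ORDER.index(i["priority"]))


by_telemetry, by_priority = tally(issues)
print(dict(by_telemetry))
print(dict(by_priority))
print([i["rule"] for i in sort_by_priority(issues)])
```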

You can also quickly review all of the scanning rules you have enabled, as well as all of the cases associated with sensitive data issues in Datadog Case Management. Case Management enables you to track, triage, and troubleshoot issues like sensitive data leaks, assign troubleshooting and remediation to users or teams, and associate cases with Jira tickets.

The Issues Overview provides a detailed snapshot of each sensitive data issue identified by your scanning rules, sorted by priority level.

The overview includes the following key information on each issue:

  • The specific scanning rule that has detected matches, so that you can easily determine which rules to modify as needed
  • The scanning groups—user-defined groups that specify relevant services, hosts, environments, or other classifying data—in which the issue has occurred, so that you can easily determine the blast radius of any leaks
  • The number of events associated with the issue, helping you quickly gauge its scope and severity
  • A trendline of these events and an indication of when the most recent one occurred, allowing you to pinpoint when an issue started and get a quick picture of its development

You can select any issue from the summary page to open an expanded view. Here, you can find a timeseries graph of the leak, as well as a list of the sensitive-data events from Datadog Log Management, APM, RUM, and Event Management, which you can spot-check to quickly pinpoint where and how the sensitive information is being exposed.

From the summary page, you can access a side panel for each issue from which you can jumpstart your investigation and response.

Below that, you can assess the blast radius of each issue with the help of a breakdown of the services, hosts, and environments in which the data was exposed, as well as a list of the users who may have accessed it (via an integration with Audit Trail). Sensitive Data Scanner’s close integration with Datadog Service Catalog also helps you quickly determine which teams own any services involved in a leak, so you can resolve issues faster.

Quickly assess the blast radius of each issue and determine who to contact to plug leaks.

You can also pivot from the expanded view to Datadog Log Management, APM, RUM, or Event Management for a more detailed analysis of specific events, so that you can better identify patterns. Or you can pivot to Audit Trail to see related events involving Sensitive Data Scanner configuration changes or user queries with sensitive data tags.

Overall, these expanded views serve as strong starting points for troubleshooting data compliance issues and investigating potential leaks. And by identifying all affected services, environments, and hosts—as well as the teams responsible for them—in each expanded view, the Sensitive Data Scanner summary page enables you to quickly delegate and track remediation.

Kickstart investigations into potential data leaks

With its close integrations with Case Management and Incident Management, Sensitive Data Scanner helps you quickly start a coordinated response to any sensitive data issue. You can easily create cases or incidents directly from the summary page, create Jira tickets linked to issues in Case Management, delegate responders, and automatically point those responders to the relevant data.

Sensitive Data Scanner is closely integrated with Case Management, so you can easily manage a coordinated response to all of your sensitive data issues.

Case Management enables you to create a record of each issue, its causes, and the remediation actions that have been taken to resolve it. Along with Audit Trail’s ability to capture audit events of user queries of sensitive data, this can be particularly vital as analysts work with auditors and write postmortems.

Let’s say you’re a security analyst and Sensitive Data Scanner detects JSON Web Tokens (JWTs) in your organization’s logs, APM traces, or RUM events. Since you have configured alerting based on the tags associated with your Sensitive Data Scanner rules, you immediately receive an alert from Datadog. From there, you can navigate to the Sensitive Data Scanner summary page to quickly gauge the blast radius of the leak and determine the owners of the affected services. Based on your findings, you can create a new case for each affected service (or, if the issue seems likely to have a significant impact on your end users, declare an incident) and delegate remediation tasks as needed.

Scale your data compliance posture

The Sensitive Data Scanner summary page enables you to effectively manage data compliance issues at scale. You can use it to discover and manage sensitive data issues in your logs, APM traces, and RUM events, conduct triage and troubleshooting, and assign and track remediation tasks.

Datadog users can start managing sensitive data issues via the Sensitive Data Scanner summary page today. To learn more about Sensitive Data Scanner, you can find more information on our blog or check out our documentation. If you’re new to Datadog, you can sign up for a 14-day free trial.