Build a modern data compliance strategy with Datadog's Sensitive Data Scanner

Author Yoann Robin
Author Mallory Mooney

Published: November 3, 2021

Within distributed applications, data moves across many loosely connected endpoints, microservices, and teams, making it difficult to know when services are storing—or inadvertently leaking—sensitive data. This is especially true for governance, risk management, and compliance (GRC) or other security teams working for enterprises in highly regulated industries, such as healthcare, banking, insurance, and financial services. Though these teams are responsible for enforcing strict compliance and security requirements and frameworks (e.g., GDPR, CCPA, PCI) across their organizations, they are often too small to efficiently monitor new development and may not have visibility into—or control over—what information application services are logging.

To address these pain points, GRC teams need a strategy for discovering, classifying, and protecting sensitive information that can scale with the volume of their data and complexity of their architecture. The Sensitive Data Scanner accomplishes this by providing real-time visibility and control over the data that your application services are logging, so you can:

  • automatically identify sensitive data in all your logs
  • classify sensitive data as high or low risk via searchable tags
  • protect sensitive data by either scrubbing it or hashing it for correlation or auditing purposes

The Sensitive Data Scanner eliminates data exposure blind spots and enables organizations to remain compliant and keep data safe. Using the Sensitive Data Scanner along with Datadog Log Management’s RBAC, audit logs, and flexible data retention, GRC teams can build a modern data compliance strategy, regardless of where the data originated.

Create scanner groups to identify sensitive data in incoming logs.

Customizable scanners for detecting potential data leaks

Sensitive Data Scanners are groups of rules that scan logs from specified application services and look for specific data patterns. Information security and compliance teams can create a new scanning group to capture logs from only the services they want to monitor, then define a set of group rules to scan those logs for any confidential information. This data can include credit card numbers, social security numbers, email addresses, API tokens, and more.

Datadog includes a library of pre-configured rules that teams can add to a scanner group to look for common compliance violations without any extra configuration. For example, they can leverage Datadog’s credit card rules to automatically detect when a service is logging credit card numbers from any major provider (e.g., Visa, Discover, American Express). Datadog uses industry standard techniques, such as the Luhn algorithm, to detect valid credit card patterns and reduce false positive matches.
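To illustrate why a checksum reduces false positives, here is a minimal Python sketch of the Luhn check. It is not Datadog's implementation; the function name `luhn_valid` is ours, and the sketch only shows how a digit sequence that merely looks like a card number can be rejected.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the ASCII digits in `number` pass the Luhn checksum."""
    # Keep only digits so separators such as spaces or dashes are ignored.
    digits = [int(ch) for ch in number if "0" <= ch <= "9"]
    if len(digits) < 2:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


if __name__ == "__main__":
    print(luhn_valid("4111 1111 1111 1111"))  # widely used test number
    print(luhn_valid("4111 1111 1111 1112"))  # last digit changed
```

A pattern match alone would flag both strings above; adding the checksum step keeps only the first.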

Create custom scanning rules to automatically obfuscate sensitive data.

Datadog automatically tags any incoming logs that match a scanner rule (e.g., sensitive_data:credit_card), enabling teams to easily classify data as high or low risk and search for the services that may violate security and compliance standards. Tags complement your existing RBAC policies—you can use tags with Datadog Log Management’s query-based RBAC model to grant or deny access to logs containing sensitive data.
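The scan-then-tag flow described above can be sketched in a few lines of Python. This is a simplified local model, not the scanner's actual engine: the `Rule` class, the `scan` function, and both regular expressions are hypothetical and far looser than production-grade patterns.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str            # becomes the tag value, e.g. "credit_card"
    pattern: re.Pattern  # what to look for in the log message


# Hypothetical rules for illustration only.
RULES = [
    Rule("credit_card", re.compile(r"\b(?:\d[ -]?){13,19}\b")),
    Rule("email_address", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
]


def scan(message: str, rules=RULES) -> list[str]:
    """Return a sensitive_data:<rule> tag for every rule that matches."""
    return [f"sensitive_data:{r.name}" for r in rules if r.pattern.search(message)]


if __name__ == "__main__":
    for line in (
        "payment failed for card 4111 1111 1111 1111",
        "user alice@example.com logged in",
        "healthcheck ok",
    ):
        print(line, "->", scan(line))
```

Because each match yields a plain tag, downstream steps such as search, access control, and routing only need to filter on the tag rather than re-scan the log body.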

GRC teams can also leverage these tags to route flagged logs to specific indexes and adjust their retention settings to limit how long data is exposed. And with the Audit Logs view, teams can understand which users may have accessed the data or modified configurations related to the time periods when sensitive data was exposed.

Automatically obfuscate sensitive information in logs

Datadog also provides the option to either hash or scrub the relevant sensitive data to ensure customer privacy. Hashing replaces every unique value that Datadog detects with a generated, non-reversible, unique token. This enables GRC teams to obfuscate sensitive information, such as a customer’s user or account ID, while still retaining its uniqueness for cardinality analytics and auditing purposes. Scrubbing replaces data matches with a user-defined string and can be useful when teams do not need to retain any uniqueness.
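The difference between the two options can be seen in a small Python sketch. The hashing scheme Datadog uses is not described here, so the keyed SHA-256 (HMAC) below is an assumption chosen for illustration, as are the `acct-` ID format, the key, and the function names.

```python
import hashlib
import hmac
import re

ACCOUNT_ID = re.compile(r"\bacct-\d+\b")  # hypothetical ID format
SECRET_KEY = b"example-key"               # illustration only; never hard-code real keys


def hash_matches(message: str) -> str:
    """Replace each match with a token derived from the matched value."""
    def _token(match: re.Match) -> str:
        digest = hmac.new(SECRET_KEY, match.group().encode(), hashlib.sha256).hexdigest()
        # Truncated for readability; a shorter token makes collisions more likely.
        return digest[:16]
    return ACCOUNT_ID.sub(_token, message)


def scrub_matches(message: str, placeholder: str = "[REDACTED]") -> str:
    """Replace each match with the same user-defined string."""
    return ACCOUNT_ID.sub(lambda _m: placeholder, message)


if __name__ == "__main__":
    print(hash_matches("login acct-1001"))
    print(hash_matches("logout acct-1001"))  # same token as the line above
    print(hash_matches("login acct-2002"))   # different token
    print(scrub_matches("login acct-1001"))
```

With hashing, the two `acct-1001` events can still be correlated and distinct accounts can still be counted; with scrubbing, every match collapses to the same placeholder.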

Logs are automatically flagged with custom scanner tags.
Automatically scrub sensitive information from incoming logs, such as credit card numbers.

Monitor flagged services with alerts and dashboards

Applications in distributed environments can generate a large volume of logs across several different services, which makes it more difficult to know which service is logging sensitive data. Using the tags configured in data scanners, GRC teams can create alerts to notify them as soon as a service begins logging sensitive data, enabling them to respond to an issue faster. For example, teams can create an alert that uses a sensitive_data:credit_card tag to notify them when credit card information is logged.

Create custom alerts based on sensitive data tags.

GRC teams can also visualize flagged logs to track the status of all of an organization’s application services and see, at a high level, which ones are violating compliance standards.

View the top services that are logging sensitive data.
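The aggregation behind a "top services" view can be approximated locally. The sketch below assumes each log is a dictionary with hypothetical `service` and `tags` keys; it does not call any Datadog API and only shows the grouping logic.

```python
from collections import Counter


def top_flagged_services(logs, tag_prefix="sensitive_data:", n=3):
    """Count logs per service that carry at least one sensitive-data tag."""
    counts = Counter(
        log["service"]
        for log in logs
        if any(tag.startswith(tag_prefix) for tag in log.get("tags", []))
    )
    return counts.most_common(n)


if __name__ == "__main__":
    sample_logs = [
        {"service": "checkout", "tags": ["sensitive_data:credit_card"]},
        {"service": "checkout", "tags": ["sensitive_data:credit_card", "env:prod"]},
        {"service": "auth", "tags": ["sensitive_data:email_address"]},
        {"service": "search", "tags": ["env:prod"]},
    ]
    print(top_flagged_services(sample_logs))
```

The same count, compared against a threshold of zero per service, is the basis for the kind of alert described earlier: any service with a nonzero count has started logging flagged data.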

When they notice a violation, GRC teams can quickly restrict access to flagged services by updating RBAC or other data access policies and determine the next course of action. For example, they may need to monitor user access to restricted services to make sure that only users with the appropriate permissions can view the data, or scrub sensitive data from affected logs for additional privacy. These actions help ensure that sensitive data that is logged by a service is quickly redacted and not shared by a user who should not have access to it.

Protect your customers from data breaches

The Sensitive Data Scanner enables GRC teams to leverage the Datadog platform for discovering, classifying, and securing sensitive information. This gives them full visibility into their application services and enables them to automatically control more of what is being logged, so they can ensure that customer data remains confidential. In the near future, we’ll be enhancing the scanner with secondary validation on matched events in addition to expanding pattern detection on APM, RUM, and Events data. You can check out our documentation to learn more about the data scanner, or sign up for a free trial today.