Best Practices for Data Security in Cloud-Native Infrastructure | Datadog

Best practices for data security in cloud-native infrastructure

Author Mallory Mooney

Published: February 6, 2023

Editor’s note: This is Part 4 of a five-part cloud security series that covers protecting an organization’s network perimeter, endpoints, application code, sensitive data, and service and user accounts from threats.

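So far in this series, we’ve looked at the importance of securing an organization’s network, application components, and the endpoints that support them. In this post, we’ll look at the following best practices for protecting the different types of data that flow through these components:

  • Encrypt sensitive data at rest and in transit
  • Protect passwords, tokens, and keys with secrets managers
  • Control data access with the principle of least privilege
  • Enable audit logs to monitor data activity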

But first, we’ll look at how modern applications process and manage data, as well as common weaknesses and security threats to these systems. Understanding these areas sets the foundation for implementing effective security measures against data breaches and exposure.

A primer on application data

Cloud-native applications are made up of three layers of infrastructure, which we’ll refer to as the data plane, control plane, and management plane in this post. Once elements of traditional computer networking, these layers have evolved to become integral parts of cloud architectures. As seen in the following diagram, each layer has a specific function in a cloud application’s operation:

The three layers of infrastructure for data security

The data plane processes and manages all application traffic and data, which can exist in one of the following states:

  • At rest: not accessed or used by an application service or resource
  • In transit: transported from one destination to another
  • In use: read, updated, or deleted by an application service

Application components—such as databases, storage buckets, pods, compute instances, and serverless functions—operate within the data plane and are responsible for interacting with one or more data states at any given time. For example, databases and storage buckets store application data, so they handle data at rest. An application’s serverless functions, on the other hand, may be responsible for handling data in transit, which can include workflows like processing a user’s credit card information in order to finalize a transaction.

The control plane enforces rules for the data plane, monitors and responds to data plane events, and manages infrastructure operations like scaling and scheduling resources. Components like Kubernetes controllers and workload schedulers operate in this plane and perform tasks like applying a new role-based access control (RBAC) policy to a group of compute instances. Finally, the management plane abstracts away resource-level configurations by giving organizations high-level control over data access, in addition to monitoring data activity. For cloud applications, the management plane can take the form of interfaces, such as the AWS Management Console, APIs, or command line interfaces (CLIs), that enable organizations to create credentials, permissions, and monitoring workflows for their cloud resources globally.

The importance of data security for cloud-native applications

A cloud application’s multi-layered architecture is complex by design, so organizations may not always have complete visibility into how each of their application’s resources processes, stores, and controls data. As a result, they may overlook vulnerabilities that could lead to data breaches or exposure. For example, malicious actors can steal sensitive data from a storage bucket if it’s publicly accessible. Additionally, application services may inadvertently include sensitive data within their activity logs, which is a common misconfiguration that malicious actors can take advantage of. Vulnerabilities like these are primary examples of how human error can lead to data breaches.
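To make the logging risk concrete, here is a minimal sketch of scrubbing card-like numbers from log messages before they are written, using Python's standard logging module. The pattern, the placeholder text, and the logger name are assumptions chosen for illustration; production scanners rely on much stricter detection rules.

```python
import logging
import re

# Assumption: a run of 13 to 16 digits, optionally separated by spaces or
# hyphens, is treated as a card number. Real scanners use stricter rules.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def redact(text: str) -> str:
    """Replace anything that looks like a card number with a placeholder."""
    return CARD_PATTERN.sub("[REDACTED]", text)


class RedactingFilter(logging.Filter):
    """Logging filter that scrubs card-like numbers before a record is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the final message first so values passed as arguments
        # are scrubbed along with the format string.
        record.msg = redact(record.getMessage())
        record.args = None
        return True


logger = logging.getLogger("payments")  # hypothetical logger name
logger.addFilter(RedactingFilter())
```

Attaching the filter to a logger means every record passing through that logger is scrubbed, regardless of which handler eventually writes it.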

Mitigating these kinds of threats involves implementing measures across a cloud application’s data, control, and management planes, though the ownership of these activities varies. As part of the shared responsibility model, which determines who is responsible for managing certain abstraction levels of cloud infrastructure, cloud providers protect the control plane from security threats and ensure that it functions as expected. Organizations, on the other hand, manage the data plane by ensuring that the resources that directly interact with and manipulate application data are configured appropriately. In some cases, organizations also need to create data access controls and the appropriate monitoring solutions within the management plane.

These responsibilities place the data and management planes at the heart of an organization’s data security strategy, and some steps can be taken to ensure that these planes are configured to handle, access, and monitor data securely. In this post, we’ll first discuss a critical best practice for securing the data plane: data encryption. Encryption ensures that a cloud application’s services and resources—the primary operators within the data plane—are transporting, accessing, and storing data securely.

Later on, we’ll look at how to use the management plane’s global configurations to enhance data security via secrets management and authorization and access controls, as well as to provide adequate visibility into data activity.

Encrypt sensitive data at rest and in transit

As previously mentioned, application data exists in one of three states: at rest, in transit, or in use. It’s important for organizations to understand when their data enters one of these states so they can determine the best course of action for protecting it. To accomplish this, they can start by inventorying the data that their applications process and then grouping it into meaningful categories. These steps give organizations better visibility into the different types of data their applications manage, which allows them to track when sensitive data, such as credit card numbers or application tokens, enters a specific state. For example, data in transit is particularly vulnerable to threats like eavesdropping attacks, in which an attacker intercepts or manipulates user data as it is transferred across an unsecured network. In addition, cloud applications store a significant amount of valuable information, making data at rest another common target for malicious actors.

Organizations can protect their data in transit and at rest via encryption. Data encryption is the process of converting data from a readable format, such as plain text, into one that cannot be read without a special key. There are two primary encryption methods: symmetric and asymmetric. Symmetric encryption uses algorithms like the Advanced Encryption Standard (AES) with a single shared key that both encrypts and decrypts data. Asymmetric encryption, by contrast, uses a pair of mathematically related keys: a public key that encrypts data and a private key that decrypts it.

Symmetric and asymmetric encryption each have unique benefits, so organizations can use them for different purposes. Symmetric encryption is a good choice for securing data at rest because it is fast enough to process large sets of data. This benefit enables organizations to use Full Disk Encryption (FDE) and File Level Encryption (FLE), which are solutions that offer multiple layers of security for stored data. FDE secures all data on a hard drive, whereas FLE secures individual files or directories, including those that are in use. Cloud providers offer disk-level encryption for their storage services, such as Amazon Elastic Block Store (Amazon EBS) volumes, Google Cloud Storage, and Azure Storage Service. In these cases, data is encrypted on the server side—after the provider receives data but before it is written to disk. Providers also offer client-side libraries that enable organizations to encrypt data at the file level before it is uploaded to cloud storage. This gives organizations additional control over the encryption process and generated keys, but it’s important to keep in mind that server-side encryption is typically recommended over client-side encryption.

Symmetric encryption can also be used together with the key pairs that are generated as part of the asymmetric encryption process in order to secure data in transit. These methods enable organizations to leverage the TLS protocol to encrypt data transmitted between clients and servers, which is now a standard practice for protecting web traffic.
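As a minimal sketch of enforcing encryption in transit on the client side, the following snippet uses Python's standard ssl module to build a context that verifies server certificates and refuses protocol versions older than TLS 1.2. The TLS 1.2 floor and the helper name are assumptions made for illustration, not requirements stated in this post.

```python
import ssl


def build_client_tls_context() -> ssl.SSLContext:
    """Return a client-side TLS context with certificate verification enabled."""
    # create_default_context() enables certificate and hostname verification.
    context = ssl.create_default_context()
    # Assumption: TLS 1.2 is the oldest protocol version we are willing to accept.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = build_client_tls_context()
```

A context like this can be passed to standard library clients, for example through the context parameter of urllib.request.urlopen, so that every connection made with it is encrypted and verified.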

Next, we’ll look at how to use the management plane to protect application credentials—such as API keys, passwords, and the keys that are generated as part of the data encryption process—from security threats.

Protect passwords, tokens, and keys with secrets managers

Organizations leverage digital authentication credentials, such as passwords and encryption keys, to securely access various parts of their systems within the data plane. These credentials—referred to as secrets—are used with a broad range of application components, ranging from application-level accounts and service resources to parts of an organization’s development infrastructure, such as their CI/CD pipelines. Secrets can increase, both in number and scope, at a pace that organizations are unable to handle efficiently. These factors make secrets management a key but difficult part of data security without the appropriate processes and tools.

According to a recent report, leaked secrets are one of the primary causes of data breaches. Cloud environments that are scaling and deploying new features at a rapid pace can easily leak secrets despite the best efforts of DevOps teams to keep them secure. For example, an engineer may temporarily hard-code an API key into their code for testing purposes but forget to remove it before pushing their changes to a repository. In this scenario, anyone with access to the repository could see the engineer’s API key, giving them access to the API endpoint as a result.
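One way to catch this kind of mistake before it reaches a repository is to scan source text for secret-like strings. The sketch below is a simplified illustration: the two patterns and the function name are assumptions, and dedicated scanners ship far larger rule sets with fewer false positives.

```python
import re

# Illustrative patterns only; these are assumptions, not a complete rule set.
SECRET_PATTERNS = {
    # AWS access key IDs commonly start with "AKIA" followed by 16 characters.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # A quoted literal of 16+ characters assigned to something named api_key.
    "hardcoded_api_key": re.compile(
        r"(?i)\bapi[_-]?key\s*=\s*['\"][A-Za-z0-9_\-]{16,}['\"]"
    ),
}


def find_secrets(source: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for each suspicious line."""
    findings = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, name))
    return findings


sample = (
    "import os\n"
    'API_KEY = "test_0123456789abcdef0123"\n'  # hard-coded value: flagged
    'key = os.environ["API_KEY"]\n'            # read from the environment: not flagged
)
findings = find_secrets(sample)
```

Running a check like this in a pre-commit hook or CI job gives engineers a chance to move the value into a secrets manager or environment variable before the commit is shared.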

To guard against these kinds of vulnerabilities, it’s important to apply management controls across a cloud application’s data plane, which includes the resources that store and transmit data, and the systems that interact with it, such as CI/CD pipelines. To manage secrets effectively, organizations should build controls that include the following mechanisms:

  • Create: securely distribute and store newly created credentials
  • Expire: set expiration dates for credentials for application-level accounts
  • Revoke: revoke credentials that are compromised or no longer needed
  • Rotate: change or reset credentials based on a schedule or their expiration dates

As an example, standards like PCI DSS require organizations to rotate both their symmetric and asymmetric encryption keys on a regular basis, such as once a year at minimum. This practice reduces the possibility of a malicious actor leveraging a valid key to compromise systems.
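The expire, revoke, and rotate mechanisms above can be sketched as a simple age check. This example assumes a hypothetical in-memory inventory and a one-year maximum age that mirrors the yearly minimum mentioned above; a real secrets manager tracks this metadata and performs rotation for you.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Secret:
    name: str
    created_at: datetime
    revoked: bool = False


# Assumption: secrets older than one year are due for rotation.
MAX_AGE = timedelta(days=365)


def needs_rotation(secret: Secret, now: datetime) -> bool:
    """A secret is due once it reaches MAX_AGE, unless it was already revoked."""
    return not secret.revoked and now - secret.created_at >= MAX_AGE


# Hypothetical inventory used only for illustration.
now = datetime(2023, 2, 6, tzinfo=timezone.utc)
secrets = [
    Secret("db-encryption-key", datetime(2021, 12, 1, tzinfo=timezone.utc)),
    Secret("ci-deploy-token", datetime(2022, 11, 15, tzinfo=timezone.utc)),
    Secret("legacy-api-key", datetime(2020, 1, 1, tzinfo=timezone.utc), revoked=True),
]
due = [s.name for s in secrets if needs_rotation(s, now)]
```

Here only db-encryption-key is reported: ci-deploy-token is still within its window, and legacy-api-key has already been revoked.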

There are several secrets management tools available to help organizations build controls around when they create, expire, revoke, or rotate their secrets. AWS Secrets Manager, Google Cloud Secret Manager, and Azure Key Vault, for example, enable organizations to manage secrets within their dedicated cloud environments. Tools like HashiCorp Vault allow organizations to securely create, store, and distribute secrets across their multi-cloud environments. For containerized environments, the External Secrets Operator and Secrets Store CSI Driver are also popular options for integrating with cloud-based secrets management systems. With these tools, organizations can centralize secrets management for all of the services, resources, and application-level accounts that operate within the data plane.

So far, we’ve looked at best practices for securing key application components that are responsible for transmitting and storing sensitive data within the data plane. Next, we’ll look at how to use the management plane to configure global authorization and access controls.

Control data access with the principle of least privilege

As previously mentioned, the management plane abstracts away the complexities of configuring individual resources by providing controls at a global level. This capability also simplifies the process for limiting access to an application’s data and the resources that interact with it, which ensures that only authorized users and services can access secrets and other types of sensitive data.

To get started, organizations should create levels of access based on the principle of least privilege, which recommends that users or services should only have access to the data or resources necessary to perform specific tasks. Organizations can take this approach a step further by enforcing just-in-time access to ensure that users can only use certain permissions or resources within a specific timeframe. Temporary access can be applied to high-risk permission sets and resources, such as elevated privileges that allow users to change configurations for a sensitive data store.

Organizations can apply these principles for their cloud applications by considering the following steps:

  • Create accounts with the least level of privileges by default and add more only when needed
  • Separate permissions by operation, role, and group—referred to as the separation of duties principle
  • Limit the number of accounts with elevated privileges
  • Conduct routine audits to ensure accounts have the right level of privileges

These steps ensure that no one user or service account has more permissions than they need, which makes it more difficult for malicious actors to find and take advantage of an overly permissive account. Controlling data access with multi-layered permissions also provides better visibility into who is accessing it, which we’ll look at next.
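A routine privilege audit can be sketched as a comparison between the permissions an account holds and the permissions it actually exercised. The account names, permission strings, and data sources below are hypothetical; in practice the "used" set would come from audit logs and the "granted" set from the provider's access management API.

```python
# Hypothetical data: permissions granted to each account, and permissions
# each account actually exercised during the audit window.
granted = {
    "billing-service": {"db:read", "db:write", "bucket:delete"},
    "report-job": {"db:read"},
}
used = {
    "billing-service": {"db:read", "db:write"},
    "report-job": {"db:read"},
}


def excess_permissions(granted, used):
    """Map each account to the permissions it holds but never used."""
    report = {}
    for account, permissions in granted.items():
        unused = permissions - used.get(account, set())
        if unused:
            report[account] = unused
    return report


report = excess_permissions(granted, used)
```

Accounts that appear in the report are candidates for having the unused permissions removed, which moves them closer to least privilege.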

Enable audit logs to monitor data activity

Data encryption, secrets managers, and access controls are all important parts of data security, but these tools are not always enough to protect sensitive information from leaks or exposure. Monitoring an application’s data activity—any resource, service, or user accessing sensitive data—also plays a role. In order to monitor data-related events, organizations can leverage logging, which is considered crucial for making applications less vulnerable to security threats.

One of the primary roles of the management plane is to provide organizations with centralized user interfaces and CLIs that not only allow them to control data access but also monitor data activity. Organizations can leverage management plane tools like the AWS Management Console, the Azure portal, and the Google Cloud console to enable logging for the critical sources that interact with their data. In this section, we’ll focus on audit logs.

Audit logs give organizations visibility into who is accessing their data, how they are accessing it, and when they accessed it. These questions are key to catching suspicious activity within their cloud environment, in addition to detecting leaks. Each cloud provider offers multiple types of audit logs for monitoring activity, as seen in the following table:

Cloud provider | Audit log types
AWS | Management events, data events, Insights events
Google Cloud | Admin activity, system event, data access
Azure | Azure Active Directory reports, activity logs, resource logs

Regardless of their source, there are some key logs that organizations can monitor in order to identify suspicious activity or potential data leaks. For example, monitoring activity like failed login attempts can detect malicious actors attempting to compromise a service or user account in order to gain further access to cloud resources. This kind of activity can look like a series of failed login attempts followed by a successful attempt, which indicates that the actor successfully gained access.
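The pattern described above, a run of failed logins followed by a success, can be sketched as a small detector over time-ordered events. The event shape (dictionaries with "user" and "outcome" keys) and the threshold of three failures are assumptions for illustration; real audit logs use provider-specific field names.

```python
from collections import defaultdict

# Assumption: three or more consecutive failures before a success is suspicious.
FAILURE_THRESHOLD = 3


def detect_suspicious_logins(events, threshold=FAILURE_THRESHOLD):
    """Return users whose successful login followed `threshold`+ consecutive failures.

    `events` is assumed to be sorted by time.
    """
    consecutive_failures = defaultdict(int)
    flagged = []
    for event in events:
        user = event["user"]
        if event["outcome"] == "failure":
            consecutive_failures[user] += 1
        else:
            if consecutive_failures[user] >= threshold and user not in flagged:
                flagged.append(user)
            # Any success resets the failure streak for that user.
            consecutive_failures[user] = 0
    return flagged


# Hypothetical, already-parsed login events.
events = [
    {"user": "alice", "outcome": "failure"},
    {"user": "bob", "outcome": "failure"},
    {"user": "alice", "outcome": "failure"},
    {"user": "bob", "outcome": "success"},
    {"user": "alice", "outcome": "failure"},
    {"user": "alice", "outcome": "success"},
]
flagged = detect_suspicious_logins(events)
```

In this sample, alice is flagged because her success follows three failures, while bob's single failed attempt stays under the threshold.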

Audit logs provide critical insight into a cloud application’s data activity, but organizations can enhance their monitoring and security capabilities by forwarding logs to a dedicated monitoring solution like Datadog.

Enhance visibility into data activity with Datadog

Forward logs to Datadog Log Management

Datadog complements an organization’s existing data monitoring and security strategy by collecting audit logs from across all cloud providers and automatically scrubbing sensitive data from them with tools like Datadog Sensitive Data Scanner. These capabilities provide organizations with a single place to track and analyze data activity as well as create an additional layer of security for the data flowing in their cloud environments.

Datadog takes this visibility a step further with Datadog Cloud SIEM, which automatically monitors incoming logs and generates security signals for identified threats. For example, Datadog provides out-of-the-box rules for detecting common threats to application data.

In the following signal example, Datadog found a compromised AWS IAM User Access Key by analyzing data within AWS CloudTrail audit logs with a built-in algorithm that detects impossible travel:

Impossible travel Security Signal

In this scenario, the key was used in a short period of time but in two geographical locations that were physically impossible to travel to within that same timeframe. This activity indicates that a malicious actor compromised a cloud account and is attempting to use it to gain access to other parts of a system.
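To illustrate the general idea of impossible travel (this is not Datadog's detection algorithm), the sketch below computes the great-circle distance between two key usages with the haversine formula and flags the pair when the implied speed exceeds a limit. The event shape and the 1,000 km/h limit, roughly the speed of a commercial airliner, are assumptions.

```python
import math
from datetime import datetime, timezone

EARTH_RADIUS_KM = 6371.0
# Assumption: anything faster than roughly airliner speed is implausible.
MAX_SPEED_KMH = 1000.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_impossible_travel(first, second, max_speed_kmh=MAX_SPEED_KMH):
    """Return True if moving between the two events would exceed max_speed_kmh."""
    distance = haversine_km(first["lat"], first["lon"], second["lat"], second["lon"])
    hours = abs((second["time"] - first["time"]).total_seconds()) / 3600
    if hours == 0:
        return distance > 0
    return distance / hours > max_speed_kmh


# Hypothetical usages of the same access key, 30 minutes apart.
new_york = {"lat": 40.71, "lon": -74.01,
            "time": datetime(2023, 2, 6, 12, 0, tzinfo=timezone.utc)}
paris = {"lat": 48.86, "lon": 2.35,
         "time": datetime(2023, 2, 6, 12, 30, tzinfo=timezone.utc)}
suspicious = is_impossible_travel(new_york, paris)
```

Production detections also have to account for factors this sketch ignores, such as VPNs, proxies, and inaccurate IP geolocation.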

Effective data security measures for cloud-native applications

In this post, we looked at best practices for securing the data and management planes for cloud applications. Steps such as encrypting data at rest and in transit, deploying secrets managers, and creating global access and authorization controls for resources can help organizations build an effective data security strategy. And with Datadog, organizations can monitor their environments to ensure that their data security measures are working as expected. Check out our documentation to learn more about Datadog’s security offerings. If you don’t already have a Datadog account, you can sign up today.