Cloud-based Kubernetes applications have become the standard for modernizing workloads, but their multi-layered design can easily create numerous entry points for unauthorized activity. To protect your applications from these threats, you need security controls at each layer of your Kubernetes infrastructure. This approach to application security is an example of a defense-in-depth strategy, which helps teams increase their overall security posture and reduce single points of failure that could lead to a data breach.
In this guide, we’ll walk through best practices for mitigating some of the common security risks that can occur in:
- application code and third-party dependencies
- container images and workloads
- Kubernetes clusters and pods
- cloud infrastructure hosting your application
Along the way, we’ll show how Datadog’s Security Platform, in addition to its integrations with popular security services, gives you end-to-end visibility into your Kubernetes environment. The security platform is made up of:
- Datadog Cloud Security Management (CSM): visualizes the current and historic security posture of your cloud environment for real-time threat detection and continuous configuration audits
- Datadog Cloud SIEM: provides real-time analysis of operational and security logs for robust threat detection in dynamic, cloud-scale environments
- Datadog Application Security Monitoring (ASM): helps DevOps and security teams streamline application security to track suspicious requests, visualize the full scope of an attack, and surface vulnerabilities in code
With these offerings, you can easily identify risky service misconfigurations and mitigate legitimate threats at any level of your Kubernetes infrastructure.
Simple design flaws or implementation bugs in application code can be leveraged to compromise a cluster. According to the Open Web Application Security Project (OWASP), some of the most common code-level vulnerabilities include:
- insufficient logging practices
- outdated or vulnerable third-party dependencies
- substandard password protection and data transfer methods
- form fields that are not sanitized or validated
You can mitigate these types of risks with a few best practices for writing safer code, getting better visibility into code health, and detecting attacks that target your application and APIs. All of these steps are critical for building multi-layered security controls that protect Kubernetes infrastructure in both development and production environments.
Logging key application events is a good first step for proactively surfacing code-level gaps in security, but you may still introduce other weaknesses in your code’s design and structure—especially if your environment is rapidly evolving. Using code analysis tools to conduct regular audits of your application code can help you identify these types of security risks during development, so you can patch them before they are exploited by an attacker.
Analysis tools such as Semgrep and SonarQube flag problematic code and provide recommendations on how to resolve any issues. For example, form fields that do not sanitize or validate user-submitted data could be vulnerable to SQL injection attacks. In these cases, an attacker submits SQL statements as input (e.g., DROP TABLE Customers) that can modify or delete data in your database.
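To make the risk concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module. The `Customers` table layout and the `find_customer` and `demo` helpers are hypothetical; the point is only that a parameterized query binds user input as data instead of executing it as SQL.

```python
import sqlite3


def find_customer(conn: sqlite3.Connection, name: str) -> list[tuple]:
    """Look up customers by name with a parameterized query.

    The "?" placeholder makes the driver bind `name` as a value, so input
    such as "x'; DROP TABLE Customers; --" is never executed as SQL.
    """
    cursor = conn.execute("SELECT id, name FROM Customers WHERE name = ?", (name,))
    return cursor.fetchall()


def demo() -> tuple[list[tuple], list[tuple], int]:
    """Run a legitimate and a malicious lookup against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO Customers (name) VALUES (?)", ("alice",))
    legitimate = find_customer(conn, "alice")
    malicious = find_customer(conn, "x'; DROP TABLE Customers; --")
    # Count the rows afterwards to confirm the table was not dropped.
    remaining = conn.execute("SELECT COUNT(*) FROM Customers").fetchone()[0]
    conn.close()
    return legitimate, malicious, remaining
```

Building the same query with string concatenation would place the attacker's text directly into the SQL statement, which is the kind of pattern that code analysis tools are designed to flag.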
Datadog’s SonarQube integration gives you a high-level overview of your code’s quality and any flagged issues, which can help you verify that applications are using the recommended security protocols for protecting data, such as using HTTPS and TLS for data transmission and the appropriate algorithm for data encryption. These safety measures prevent attackers from accessing any sensitive information that passes through your application, such as credit card numbers and passwords.
Most applications leverage open source dependencies (e.g., libraries, packages, frameworks) that are managed by third parties, which means that you do not have as much control over their design or security. A vulnerability in a third-party library can easily jeopardize your application’s security, but staying informed on the status of each dependency requires significant effort. For example, you may want to upgrade to a new version of a third-party library that includes a critical security patch, only to find that the same release also includes a bug fix that is incompatible with your application code or infrastructure resources. It’s critical to be aware of these kinds of caveats so that you do not introduce breaking changes as you attempt to secure your application.
Scanning code dependencies regularly—and staying up to date on issues flagged by vulnerability databases—can help make you more aware of their state and better assess the risks of updating your code to a new or patched version of a dependency. Scanners like OWASP Dependency-Check provide more details about a compromised library, such as the affected versions, the vulnerability’s severity, and which versions you should upgrade to in order to fix the issue. They can also be integrated into your CI/CD pipelines, enabling you to identify the parts of your code that interact with compromised libraries before critical deployments. These measures allow you to make informed decisions about how to keep dependencies up to date while reducing the risk of introducing vulnerabilities or breaking changes.
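The core check that a dependency scanner performs can be sketched in a few lines. This is a simplified illustration, not how OWASP Dependency-Check is implemented: the `Advisory` record, the `find_vulnerable` helper, and the package name `examplelib` are all hypothetical, and real scanners pull advisories from vulnerability databases and handle far richer version schemes.

```python
from typing import NamedTuple


class Advisory(NamedTuple):
    """A hypothetical advisory: versions in [first_affected, fixed_in) are vulnerable."""
    package: str
    first_affected: tuple[int, ...]
    fixed_in: tuple[int, ...]
    severity: str


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted numeric version such as "2.14.1" into (2, 14, 1).

    Assumes purely numeric components; pre-release tags are not handled.
    """
    return tuple(int(part) for part in text.split("."))


def find_vulnerable(installed: dict[str, str], advisories: list[Advisory]) -> list[str]:
    """Report installed packages whose version falls inside an affected range."""
    findings = []
    for advisory in advisories:
        version_text = installed.get(advisory.package)
        if version_text is None:
            continue  # the package is not installed, so the advisory does not apply
        if advisory.first_affected <= parse_version(version_text) < advisory.fixed_in:
            fixed = ".".join(str(part) for part in advisory.fixed_in)
            findings.append(
                f"{advisory.package} {version_text} ({advisory.severity}): "
                f"upgrade to {fixed} or later"
            )
    return findings
```

Running a check like this in a CI/CD pipeline, and failing the build on critical findings, is one way to surface a compromised library before a deployment.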
When an attacker exploits code-level vulnerabilities (such as those in compromised versions of the Log4j library) to target your application, Datadog ASM can automatically alert you to the threat. Datadog will generate a security signal that includes more information about the affected code, the source IPs of the requests that attempted the exploit, and how to remediate the issue.
Regular audits of application code and third-party dependencies are an important step for securing your application, but they may not be enough to protect against all attacks. Kubernetes environments are complex and highly dynamic, giving attackers more opportunities to hide their activity. For example, they may target individual containers or exploit smaller, more vulnerable application components that are easy to overlook. To mitigate this risk, it’s critical to have visibility into:
- file, process, and kernel activity on containers
- operations against application code and APIs
- accounts that interact with application services
Tracking changes to application files, directories, and running processes can give you a better understanding of the path of an attack and an attacker’s overall goals. For example, an attacker may use a database process to launch a shell via a SQL injection attack. This type of attack takes advantage of poorly sanitized application fields, giving someone an entry point to compromise a host or gain access to other critical application services.
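As a simplified illustration of file tracking, the sketch below hashes every file under a directory and compares two snapshots. It is not how Datadog CSM works (runtime tools observe activity as it happens instead of comparing snapshots), and the `snapshot` and `diff_snapshots` helpers are hypothetical names of our own.

```python
import hashlib
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file under `root` to the SHA-256 digest of its contents."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = str(path.relative_to(root))
            digests[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_snapshots(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """Compare two snapshots and list the added, removed, and modified files."""
    shared = before.keys() & after.keys()
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "modified": sorted(path for path in shared if before[path] != after[path]),
    }
```

An unexpected entry under "added", such as a new script in an application directory, is the kind of change that can help you reconstruct the path of an attack.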
Datadog CSM monitors file, process, and kernel activity on your containers in real time, with built-in detection rules to cover these types of commonly used techniques and tactics. You can also correlate this activity with signals generated by Datadog ASM, which automatically flags attacks that exploit application-level vulnerabilities.
For a better understanding of the source of these kinds of events, you can enable audit logging, which provides a wealth of information on Kubernetes activity. We’ll look at audit logging in more detail later.
In distributed environments, applications are broken down into smaller workloads, each of which runs on dedicated containers. Many teams leverage publicly available container images that already include the operating system and binaries needed for a particular workload, which can significantly reduce development time. But as containerized applications grow and leverage more resources, the chances of introducing new vulnerabilities to your workloads increase. In this section, we’ll explore some best practices you can follow to secure your container images and workloads.
The risks associated with pulling images from a public registry are similar to those involved in using third-party libraries in application code. You do not have full visibility into the structure of third-party container images, so you may inadvertently pull an image with outdated dependencies or malicious code.
To prevent these types of scenarios from occurring, you should validate that images are signed by authorized users and originate from a trusted source that actively maintains them, such as a known company or open source group. Pulling images from and monitoring your cloud provider’s registries like Amazon Elastic Container Registry (Amazon ECR) and Azure Container Registry can also significantly reduce risk.
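One lightweight check you can automate is confirming that every image reference points at a registry you trust and is pinned to a digest. The sketch below is a minimal example under our own assumptions: the `registry_of` and `image_issues` helpers are hypothetical, and the allowlist is supplied by the caller. It does not verify signatures, which requires dedicated tooling.

```python
def registry_of(image: str) -> str:
    """Return the registry host of an image reference.

    Follows the Docker convention: the first path component is treated as a
    registry only if it contains "." or ":" or is "localhost"; otherwise the
    image comes from Docker Hub ("docker.io").
    """
    name = image.split("@", 1)[0]  # strip any digest suffix
    first, separator, _ = name.partition("/")
    if separator and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


def image_issues(image: str, trusted_registries: set[str]) -> list[str]:
    """List the reasons an image reference does not meet the policy, if any."""
    issues = []
    registry = registry_of(image)
    if registry not in trusted_registries:
        issues.append(f"untrusted registry: {registry}")
    # A tag such as ":latest" can be repointed; a digest identifies exact content.
    if "@sha256:" not in image:
        issues.append("not pinned to a digest")
    return issues
```

For example, a bare reference like `nginx:latest` fails both checks when only a private Amazon ECR registry is on the allowlist.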
Datadog provides detection rules that can help you monitor your container registries to ensure that you are pulling images that are safe to use in your applications. For example, Datadog can notify you when an image is pulled from a registry that is not secure or when a new image is uploaded to a private Amazon ECR registry.
A new image in your ECR registry could indicate that an attacker is attempting to establish persistence by uploading a container with malicious code, so it’s important to be aware when a new image is unexpectedly added.
Privileged containers have direct access to host resources and other devices running on the host. An attacker who has access to one of these containers can therefore perform a variety of actions to modify host resources, such as adding their SSH public keys to the host’s /root/.ssh/authorized_keys file. Though there are some benefits to using privileged containers, such as leveraging them to run GPU-enabled workloads in Kubernetes clusters, it’s important to restrict their usage and always be aware of their status in your environment.
Since privileged containers have the same capabilities as the host, it can be more difficult to distinguish between malicious and routine activity. This problem becomes more prevalent as applications leverage thousands of containers to support workloads, rapidly spinning up new containers at regular intervals. Keeping track of the state of individual containers is often not feasible.
Datadog CSM offers detection rules that are automatically mapped to CIS benchmarks for Docker and Kubernetes, providing deep visibility into container- and cluster-level settings. For example, you can quickly single out privileged containers from the rest of your fleet, so you can determine if they are legitimate or not as soon as they spin up.
Datadog CSM can also help you monitor other potentially harmful configurations that an attacker can use in tandem with privileged containers, such as sensitive mount paths or privileged port mappings.
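To show where these settings live in a pod manifest, here is a minimal sketch that scans a manifest (already parsed into a Python dictionary) for privileged containers, host path mounts, and privileged host ports. The `risky_settings` helper is hypothetical and is not a Datadog detection rule; the field names come from the Kubernetes pod specification.

```python
def risky_settings(pod: dict) -> list[str]:
    """Flag privileged containers, hostPath mounts, and host ports below 1024."""
    findings = []
    spec = pod.get("spec") or {}
    # Collect the names of volumes that are backed by a path on the host.
    host_path_volumes = {
        volume["name"] for volume in spec.get("volumes") or [] if "hostPath" in volume
    }
    for container in spec.get("containers") or []:
        name = container.get("name", "<unnamed>")
        security = container.get("securityContext") or {}
        if security.get("privileged") is True:
            findings.append(f"{name}: privileged")
        for mount in container.get("volumeMounts") or []:
            if mount.get("name") in host_path_volumes:
                findings.append(f"{name}: mounts host path via volume {mount['name']}")
        for port in container.get("ports") or []:
            host_port = port.get("hostPort")
            if host_port is not None and host_port < 1024:
                findings.append(f"{name}: privileged host port {host_port}")
    return findings
```

A check like this could run against manifests in a repository before deployment, complementing runtime detection of the same settings.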
Container isolation creates boundaries between container workloads and hosts, ensuring that workloads—and attackers—have limited access to system resources. While limiting the use of privileged containers is one way to protect host resources, there are also several configuration options that improve container isolation:
- Container runtimes: use a runtime like CRI-O to leverage its built-in security features, such as the ability to enforce signed and encrypted images
- Resource limits: set container I/O, memory, and CPU limits to help prevent denial-of-service attacks
- Kernel capabilities: assign a reduced set of privileges (e.g., mount operations, filesystem access) to containers based on specific use cases to prevent access to critical resources
Collectively, these configuration options help you create multiple layers of security for your containers. Datadog CSM can detect suspicious activity on any container in your cluster, which can help you identify the ones that are not properly isolated from other workloads and hosts. For example, Datadog will flag any attempts to launch the kubectl utility directly in a container, which may indicate that an attacker is looking for information that would enable lateral movement (e.g., container to container, container to host).
Kubernetes manages and scales your application containers in clusters, which group workloads into one or more pods that share network and storage resources. Kubernetes also provides an API server that allows users and service accounts to make changes to pods, services, deployments, and more. Because Kubernetes is responsible for orchestrating your application, cluster resources should be configured appropriately to reduce the likelihood of an attack. In this section, we’ll explore some recommendations for securing Kubernetes clusters that can supplement your container-level configurations.
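The resource limit and kernel capability options above map to fields in a container's spec, so you can check for them programmatically. The following sketch assumes a container entry parsed from a pod manifest; the `isolation_gaps` helper is hypothetical, and it covers only CPU and memory limits because those are the limits expressed in `resources.limits`.

```python
def isolation_gaps(container: dict) -> list[str]:
    """List the isolation settings that a container spec is missing."""
    gaps = []
    limits = (container.get("resources") or {}).get("limits") or {}
    for resource in ("cpu", "memory"):
        if resource not in limits:
            gaps.append(f"no {resource} limit")
    security = container.get("securityContext") or {}
    dropped = (security.get("capabilities") or {}).get("drop") or []
    # Dropping ALL and adding back only what is needed yields a reduced privilege set.
    if "ALL" not in dropped:
        gaps.append("does not drop ALL capabilities")
    if security.get("allowPrivilegeEscalation") is not False:
        gaps.append("privilege escalation not disabled")
    return gaps
```

A container that sets CPU and memory limits, drops all capabilities before adding back a specific one such as `NET_BIND_SERVICE`, and disables privilege escalation returns an empty list.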
Audit logging captures all events between the API server, application services, and users, giving you more details about the source of malicious activity within an environment. You can forward your audit logs to Datadog Cloud SIEM, which provides detection rules that help you automatically flag potential threats.
The events that these rules flag could indicate that there are other security gaps in your clusters, such as a misconfigured API server or pods with escalated privileges.
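If you want a feel for what audit events contain, the sketch below reads JSON-formatted audit log lines and flags two conditions. The conditions are our own illustrative assumptions (anonymous requests and denied access to secrets), not Datadog's detection rules, and `suspicious_events` is a hypothetical helper. The field names follow the Kubernetes audit event format.

```python
import json
from typing import Iterable


def suspicious_events(audit_lines: Iterable[str]) -> list[str]:
    """Flag anonymous requests and denied attempts to access secrets."""
    findings = []
    for line in audit_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        username = (event.get("user") or {}).get("username", "")
        resource = (event.get("objectRef") or {}).get("resource", "")
        status = (event.get("responseStatus") or {}).get("code")
        if username == "system:anonymous":
            # Unauthenticated requests are attributed to this built-in username.
            findings.append(f"anonymous request: {event.get('verb')} {event.get('requestURI')}")
        elif resource == "secrets" and status == 403:
            findings.append(f"denied secrets access by {username}")
    return findings
```

In practice you would forward the logs to a platform that evaluates rules like these continuously instead of parsing files by hand.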
If an attacker gains access to the Kubernetes API server, they can easily manipulate or destroy any part of your application. To address this risk, the API server provides several controls that you can configure to ensure that only authenticated users with the appropriate permissions can access the Kubernetes API. Managed Kubernetes services on cloud platforms enforce authentication controls by default.
For example, you can use OpenID Connect, an identity layer built on OAuth 2.0, to first authenticate any user who attempts to access the API server, which helps limit access to just your organization. You can also leverage models such as role-based access control (RBAC) to authorize requests from specific authenticated users to the server. RBAC allows you to create roles that mirror your organization’s structure, so you can easily grant access to Kubernetes resources, including the API server, to only the users or groups who need it.
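RBAC roles are lists of rules, each naming the API groups, resources, and verbs that the role permits. As a rough sketch, the hypothetical `wildcard_rules` helper below flags rules that use the `*` wildcard, which grants far more than most users or services need. The example role names are assumptions for illustration.

```python
def wildcard_rules(role: dict) -> list[dict]:
    """Return the rules in a Role or ClusterRole that use the "*" wildcard."""
    flagged = []
    for rule in role.get("rules") or []:
        for field in ("apiGroups", "resources", "verbs"):
            if "*" in (rule.get(field) or []):
                flagged.append(rule)
                break  # one wildcard field is enough to flag the rule
    return flagged


# A narrowly scoped role: read-only access to pods in one namespace.
POD_READER = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "Role",
    "metadata": {"namespace": "shop", "name": "pod-reader"},
    "rules": [{"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "watch", "list"]}],
}

# An overly broad role: every verb on every resource in every API group.
TOO_BROAD = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "ClusterRole",
    "metadata": {"name": "do-anything"},
    "rules": [{"apiGroups": ["*"], "resources": ["*"], "verbs": ["*"]}],
}
```

Here `wildcard_rules(POD_READER)` returns an empty list, while `wildcard_rules(TOO_BROAD)` returns the single wildcard rule.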
Limiting access to the Kubernetes API server also helps protect the secrets stored there (such as API keys, user passwords, and certificates) that are used across workloads, external services, and accounts. Secrets are stored unencrypted in the server’s underlying data store (i.e., etcd) by default, so anyone with access to etcd can view that data. Secrets can also be easily exposed to resources, such as via an environment variable for a pod, in which case anyone who manages that pod will also be able to see the exposed secret.
You can protect sensitive data by enabling encryption at rest for secrets. Kubernetes supports several different encryption providers but recommends using your cloud provider’s key management service (KMS) to maximize security. KMS providers store decryption keys remotely instead of in Kubernetes, so an attacker would need to gain access to both the Kubernetes API server and the KMS to decrypt secrets.
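Encryption at rest is configured through an EncryptionConfiguration file passed to the API server, in which the first provider listed for a resource is the one used to encrypt new writes. The sketch below checks a parsed configuration for that property. The `secrets_encrypted` helper, the plugin name, and the socket path are hypothetical values for illustration.

```python
def secrets_encrypted(config: dict) -> bool:
    """Return True if secrets are written with a provider other than `identity`.

    The `identity` provider stores data as-is, so listing it first means
    new secrets are written unencrypted.
    """
    for entry in config.get("resources") or []:
        if "secrets" in (entry.get("resources") or []):
            providers = entry.get("providers") or []
            if not providers:
                return False
            # Each provider entry is a mapping with one key naming the provider.
            first_provider = next(iter(providers[0]), "identity")
            return first_provider != "identity"
    return False  # no entry covers secrets at all


# Hypothetical configuration that encrypts secrets with a KMS plugin.
KMS_CONFIG = {
    "apiVersion": "apiserver.config.k8s.io/v1",
    "kind": "EncryptionConfiguration",
    "resources": [
        {
            "resources": ["secrets"],
            "providers": [
                {"kms": {"apiVersion": "v2", "name": "example-kms-plugin",
                         "endpoint": "unix:///var/run/kms/example.sock"}},
                {"identity": {}},  # fallback so previously unencrypted data stays readable
            ],
        }
    ],
}
```

With this configuration, `secrets_encrypted(KMS_CONFIG)` returns `True`; moving `identity` to the front of the provider list would make it return `False`.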
For better visibility into the state of a Kubernetes cluster, you can use Datadog CSM’s Kubernetes detection rules to quickly notify you of any configurations that make the cluster more vulnerable, such as not leveraging an encryption provider to encrypt secrets or RBAC to restrict access.
Pods share many of the same configurations and contexts as individual containers, such as network policies and resource limits, so you can leverage the same isolation rules to prevent attackers from creating or modifying pods or accessing other containers. Kubernetes provides out-of-the-box security policies via an admission controller to give you more control over pod configurations in a cluster: pods must be configured according to your policies in order to be deployed successfully. These policies offer various levels of protection based on Kubernetes recommendations, such as:
- restricting privileged pods and privilege escalation
- limiting pod capabilities (e.g., run mount operations, modify processes)
- restricting access to the host’s namespace, ports, and filesystems
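With the pod security admission controller, these levels are applied per namespace through labels. As a minimal sketch, the hypothetical `meets_minimum` helper below checks whether a namespace enforces at least a given level. It treats a missing label as `privileged`, which matches the controller's default configuration but may differ in your cluster.

```python
ENFORCE_LABEL = "pod-security.kubernetes.io/enforce"
# Pod Security Standards levels, ordered from least to most restrictive.
LEVELS = ("privileged", "baseline", "restricted")


def meets_minimum(namespace: dict, minimum: str = "baseline") -> bool:
    """Return True if the namespace enforces `minimum` or a stricter level."""
    labels = (namespace.get("metadata") or {}).get("labels") or {}
    level = labels.get(ENFORCE_LABEL, "privileged")  # assumption: unlabeled means privileged
    if level not in LEVELS:
        return False  # unknown value; treat it as not compliant
    return LEVELS.index(level) >= LEVELS.index(minimum)
```

For example, a namespace labeled with `restricted` passes a `baseline` check, while an unlabeled namespace does not.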
Datadog CSM provides an extensive list of posture management detection rules for Kubernetes infrastructure to help you identify pods that are not configured according to these policies. Additionally, Datadog Cloud SIEM complements these configuration checks by automatically detecting suspicious activity across Kubernetes clusters that may be outside of normal operations, including activity that could result in a misconfigured pod. For example, Datadog will flag new pods that could be suspicious, such as those with privileged permissions or that have access to the host network. These scenarios could indicate that a cluster is not configured with a security policy or that a policy is not restrictive enough.
It’s important to note that the pod security admission controller is a beta feature as of v1.23, which may not be fully supported by some cloud providers. To mitigate this limitation, you can also leverage widely used open source tools like Open Policy Agent Gatekeeper to implement pod policies across your cloud environments. Datadog’s Gatekeeper integration enables you to monitor the status of your policies and ensure that they are configured appropriately.
The final layer of infrastructure is the cloud provider that hosts your application. Most providers offer managed services like Google Kubernetes Engine (GKE), Amazon Elastic Kubernetes Service (EKS), and Azure Kubernetes Service (AKS) to simplify the process of deploying and scaling your container environment, but these services can be vulnerable to some of the same security risks as other parts of your infrastructure (e.g., misconfigurations, insufficient monitoring). The following best practices can give you more visibility into activity across your platform and ensure that any cloud resources supporting your Kubernetes infrastructure are configured appropriately.
As we discussed earlier, Kubernetes audit logs provide more details about cluster-level activity. For insights into events across a cloud provider, including logins, edits to a profile or resource, and the status of a resource, you can also collect platform-specific audit logs. Enabling and understanding how to interpret these logs can help you uncover application resources and cloud accounts that are not configured according to your security policies. Misconfigurations like these are among the most common vulnerabilities in a cloud environment. Depending on your provider, you can enable AWS CloudTrail logs, Azure platform logs, or Google Cloud audit logs to capture activity.
Datadog Cloud SIEM leverages these logs to identify changes in cloud resources that warrant further investigation, such as an IAM policy that suddenly changes. You can also correlate these types of changes with Datadog CSM to help you determine if they are the result of a misconfigured account, such as an IAM user that has administrative access to your AWS environment, enabling them to change IAM policies.
Cloud-based Kubernetes applications require different users and services to have varying levels of access, which can introduce permission misconfigurations that can be exploited by attackers. For example, an attacker can take advantage of a misconfigured IAM permission in order to take over a GKE service account and make changes to an application cluster. Creating minimally privileged user and service accounts—and granting additional permissions only when necessary—can help protect Kubernetes resources from unauthorized access. You can check out GKE’s, EKS’s, and AKS’s documentation for best practices on implementing secure identity-based policies, which can complement your existing RBAC policies and container-level configurations.
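To illustrate what a minimally privileged policy looks like in practice, the sketch below inspects an AWS IAM policy document for statements that allow every action on every resource. The `overly_permissive` and `as_list` helpers are hypothetical, and the check is intentionally narrow: it does not catch partial wildcards such as `ec2:*`.

```python
def as_list(value) -> list:
    """IAM policy fields may hold a single string or a list; normalize to a list."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def overly_permissive(policy: dict) -> list[dict]:
    """Return Allow statements that grant all actions on all resources."""
    statements = policy.get("Statement") or []
    if isinstance(statements, dict):  # a single statement may appear without a list
        statements = [statements]
    flagged = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        if "*" in as_list(statement.get("Action")) and "*" in as_list(statement.get("Resource")):
            flagged.append(statement)
    return flagged


# Administrative access: the kind of policy to avoid attaching to service accounts.
ADMIN_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}],
}
```

A policy scoped to specific actions on a specific resource, such as pulling images from a single repository, would not be flagged.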
Datadog CSM can help you monitor policies across cloud and multi-cloud environments, so you can ensure that all of your user and service accounts are configured appropriately. For example, Datadog will notify you when RBAC is not enabled on AKS instances.
Cloud platforms often provide a metadata API server to store metadata about environment resources, such as the names of virtual machine instances deployed on GKE. Metadata can include cloud credentials, identity tokens, and other sensitive information that running pods have full access to by default. Accessing a provider’s metadata API, such as Amazon EC2’s Instance Metadata Service (IMDS), is one way attackers explore Kubernetes infrastructure in order to find other resources that they can exploit. For example, an attacker can leverage a compromised pod in an EKS cluster to query the metadata service for an EC2 instance’s credentials. With this information, the attacker can access account-level details about the cluster (e.g., EC2 instance data, security groups, VPCs) and manipulate cluster resources.
Datadog CSM can notify you when a network utility like curl or wget accesses an EC2 IMDS via an interactive session. If Datadog detects this type of activity, it will generate a security signal that includes more details about that session, such as the executed commands and the affected host.
In production environments, using an interactive session to access an EC2 IMDS is not a common operation. It’s important to be aware of this activity so you can determine whether it came from a legitimate source or is an indicator of a larger threat to your resources. Network policies can help you mitigate this type of threat by restricting traffic from pods to your cloud provider’s metadata API.
Using our previous EKS example, you can leverage the Calico network policy engine to create the appropriate policies for your clusters. You can also use Datadog Network Monitoring to easily visualize traffic between Kubernetes clusters, the metadata API service, and other cluster resources and verify that your policies are working as expected. These measures help ensure that attackers cannot retrieve credentials for other cloud resources should they gain access to a pod.
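As a minimal sketch of such a policy, the code below builds a standard Kubernetes NetworkPolicy manifest (not a Calico-specific resource) that allows egress everywhere except the EC2 metadata address, 169.254.169.254, and includes a small check of the result. The `deny_metadata_policy` and `egress_allows` helpers and the policy name are hypothetical, and the policy only takes effect if your cluster runs a network plugin that enforces network policies.

```python
import ipaddress

METADATA_IP = "169.254.169.254"  # link-local address of the EC2 instance metadata service


def deny_metadata_policy(namespace: str) -> dict:
    """Build a NetworkPolicy that blocks egress from all pods to the metadata API."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "deny-metadata-egress", "namespace": namespace},
        "spec": {
            "podSelector": {},  # an empty selector matches every pod in the namespace
            "policyTypes": ["Egress"],
            "egress": [
                {"to": [{"ipBlock": {"cidr": "0.0.0.0/0", "except": [f"{METADATA_IP}/32"]}}]}
            ],
        },
    }


def egress_allows(policy: dict, ip: str) -> bool:
    """Check whether the policy's ipBlock rules permit egress to `ip`.

    Pod and namespace selector peers are ignored because they select pods,
    not arbitrary IP addresses.
    """
    address = ipaddress.ip_address(ip)
    for rule in policy["spec"].get("egress") or []:
        peers = rule.get("to")
        if not peers:
            return True  # a rule without "to" allows every destination
        for peer in peers:
            block = peer.get("ipBlock")
            if not block or address not in ipaddress.ip_network(block["cidr"]):
                continue
            excluded = any(
                address in ipaddress.ip_network(cidr) for cidr in block.get("except") or []
            )
            if not excluded:
                return True
    return False
```

You could serialize the returned dictionary to YAML or JSON and apply it to each namespace, then confirm with network monitoring that pods no longer reach the metadata service.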
In this guide, we looked at some best practices for securing every level of your Kubernetes application—from application code to the cloud provider hosting your Kubernetes resources. We also explored how Datadog enables you to easily monitor your Kubernetes stack in its entirety and identify critical issues in real time. This multi-layered approach to security helps you remediate exploitable misconfigurations and detect legitimate threats and attacks as soon as they occur.