Best Practices for Identity and Access Management in Cloud-Native Infrastructure | Datadog

Best practices for identity and access management in cloud-native infrastructure

Author Mallory Mooney

Published: April 13, 2023

Editor’s note: This is the final part of a five-part cloud security series that covers protecting an organization’s network perimeter, endpoints, application code, sensitive data, and service and user accounts from threats.

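So far in this series, we’ve looked at the importance of securing an organization’s network, its application components, the endpoints that support those components, and its application data. In this post, we’ll look at the following best practices for securing access to and from cloud environments:

  • Treat identities as a new kind of boundary
  • Use complex passwords and multi-factor authentication for user accounts
  • Limit the use of static, long-lived credentials for service accounts
  • Organize identities into logical groups
  • Assign permissions based on zero-trust and least-privilege principles
  • Monitor IAM activity using logs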

But first, we’ll briefly look at how access decisions are made in the cloud.

Primer on identity and access in the cloud

The shift from on-premises to cloud infrastructure introduces new challenges for managing digital identities and their ability to access resources. Today, organizations need to not only keep track of all the internal and external services, users, and other sources interacting with their environment but also manage their varying levels of access. That’s why processes like identity and access management (IAM) have become an integral part of cloud security. IAM oversees an environment’s digital identities, such as users and services, and their level of access to cloud resources.

To accomplish this, IAM frameworks leverage the Authentication, Authorization, and Accounting (AAA) model, which provides a baseline for creating efficient access control. The following diagram illustrates how the IAM framework and the AAA model’s three components—authentication, authorization, and accounting—work together within a cloud environment:

Diagram illustrating IAM's framework

The authentication step verifies that a user’s or service’s digital identity—typically in the form of a dedicated cloud-based account or workload—matches an environment’s record. The authorization step determines what the confirmed identity has access to within an environment. In cloud environments, this step often involves using mechanisms such as role-based access control (RBAC) to assign permissions to users and services. IAM roles, for example, include a predefined set of permissions that can be assigned to any user or service account based on its function and goals. Finally, the accounting step monitors environment activity, logging information about identity sessions and the resources they accessed. This process not only enables organizations to efficiently limit an identity’s permissions but also creates an audit trail for security-related evaluations.
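To make the three steps concrete, the following is a minimal Python sketch of how a request could pass through authentication, authorization, and accounting in order. The identity names, credentials, roles, and the `handle_request` helper are all hypothetical and kept in memory for illustration; a real environment would rely on its cloud provider’s IAM service rather than code like this.

```python
import hmac
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical in-memory records used only for illustration.
KNOWN_IDENTITIES = {"svc-reports": "token-abc"}   # identity -> credential on record
IDENTITY_ROLES = {"svc-reports": "reader"}        # identity -> assigned role
ROLE_PERMISSIONS = {"reader": {"db1:read"}}       # role -> predefined permissions


@dataclass
class AuditLog:
    """Accounting: keeps a trail of who asked for what and the outcome."""
    entries: list = field(default_factory=list)

    def record(self, identity: str, permission: str, outcome: str) -> None:
        self.entries.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "identity": identity,
            "permission": permission,
            "outcome": outcome,
        })


def authenticate(identity: str, credential: str) -> bool:
    """Authentication: does the presented credential match the record?"""
    expected = KNOWN_IDENTITIES.get(identity)
    # compare_digest avoids leaking information through comparison timing.
    return expected is not None and hmac.compare_digest(expected, credential)


def authorize(identity: str, permission: str) -> bool:
    """Authorization: does the identity's role include the permission?"""
    role = IDENTITY_ROLES.get(identity)
    return permission in ROLE_PERMISSIONS.get(role, set())


def handle_request(identity: str, credential: str, permission: str, log: AuditLog) -> bool:
    """Run the three AAA steps in order and log every decision."""
    if not authenticate(identity, credential):
        log.record(identity, permission, "authentication_failed")
        return False
    allowed = authorize(identity, permission)
    log.record(identity, permission, "allowed" if allowed else "denied")
    return allowed
```

Every request, whether allowed or not, leaves an entry in the log, which is what makes the audit trail useful for later security evaluations.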

Identity and access management is deeply integrated into every part of an organization’s environment, so it’s important to know how to best approach IAM workflows. Next, we’ll look at best practices that organizations can implement at each layer of the AAA model to help strengthen their IAM systems and thereby each layer of their infrastructure.

Treat identities as a new kind of boundary

As described in Part 1 of this series, the boundaries between an organization’s managed network and any external networks, such as the public internet, are more complex in the cloud than they are in on-premises environments. Cloud infrastructure is constantly evolving to support modern applications, which creates an almost unlimited number of entry points that organizations need to account for. On top of that, control plane APIs, which are responsible for managing cloud infrastructure, are readily available to any user with the right set of credentials. These scenarios result in boundaries that are more fluid than the borders of a traditional, on-premises network.

Inventorying key points of entry, such as public-facing web servers, can help establish a secure boundary around a cloud environment, but it is often not enough to fully protect resources. Today, organizations are filling in the gaps by treating identities as a new kind of boundary. This means that they are shifting their focus to include monitoring who or what is accessing an environment, instead of just where that traffic comes from.

Regularly auditing identities helps an organization visualize the boundaries of its environment with improved accuracy. For example, orphaned user accounts are a common type of identity that leaves an environment vulnerable to an attack. They typically include identities that were set up temporarily for third-party contractors or for employees who have since left the organization. Identities for applications, services, and resources that are configured to run in the background indefinitely or leverage static, long-lived credentials are also susceptible to an attack. Due to their role, these identities are less likely to be monitored. On top of that, they may not always follow an organization’s current security protocols, making them easy targets for threat actors to misuse.
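As a rough illustration of such an audit, the sketch below flags identities that have never been used or have been idle for longer than a threshold. The `Identity` record, the `find_orphaned` helper, and the 90-day default are assumptions made for this example; in practice, the inventory would come from a cloud provider’s IAM reports.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class Identity:
    """Hypothetical inventory record for a user or service identity."""
    name: str
    kind: str                       # "user" or "service"
    last_used: Optional[datetime]   # None if the identity was never used


def find_orphaned(
    identities: List[Identity],
    now: datetime,
    max_idle: timedelta = timedelta(days=90),  # assumed threshold, tune per policy
) -> List[str]:
    """Return names of identities that look orphaned: never used or idle too long."""
    orphaned = []
    for identity in identities:
        if identity.last_used is None or now - identity.last_used > max_idle:
            orphaned.append(identity.name)
    return orphaned


now = datetime(2023, 4, 13, tzinfo=timezone.utc)
inventory = [
    Identity("contractor-jane", "user", now - timedelta(days=200)),
    Identity("svc-backup", "service", None),
    Identity("maddie", "user", now - timedelta(days=2)),
]
stale = find_orphaned(inventory, now)
```

A list like `stale` is only a starting point for review, since some background services are legitimately used infrequently.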

Once organizations have a better understanding of the identities that are interacting with their environments, they can focus on how to protect them, which we’ll look at next.

Use complex passwords and multi-factor authentication for user accounts

Treating identities as a boundary means that an organization’s authentication controls, the first part of the AAA model, need to be a priority. As previously mentioned, the authentication process confirms that an identity requesting access to resources matches an organization’s record of that identity. Weak authentication controls leave identities and their associated accounts vulnerable to threats like account takeover.

For user accounts specifically, common examples of ineffective authentication mechanisms include weak account passwords and the absence of multi-factor authentication (MFA). Multiple characteristics can make account passwords less secure; some examples include using a single word, common combinations like “password123”, or personal information like a name or street address. Threat actors can take advantage of these weaknesses via dictionary-based, phishing, or brute force attacks or by using publicly available, shared, or breached information to log in to an account.

Enforcing strong passwords and MFA on user accounts can protect them from account takeovers and other threats. Strong passwords are typically complex passphrases or random strings of at least ten characters. Organizations can take these simple measures a step further by encouraging the use of hardware-based MFA tools like YubiKeys. Hardware keys make accounts less susceptible to phishing attacks, which can bypass more commonly used MFA methods like sending one-time SMS codes to an individual’s personal device. Organizations can also add checks like location-based authentication to confirm an identity based on its active geographic location.
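The password weaknesses described above can be expressed as simple checks. The following sketch is a heuristic, not a complete password policy: the `password_problems` helper and the tiny `COMMON_PASSWORDS` sample are hypothetical, and a production system would typically check against a much larger breached-password list.

```python
from typing import Iterable, List

# Tiny illustrative sample; real deployments use far larger breached-password lists.
COMMON_PASSWORDS = {"password", "password123", "123456", "qwerty"}


def password_problems(password: str, personal_terms: Iterable[str] = ()) -> List[str]:
    """Return the reasons a password looks weak; an empty list means no check failed."""
    problems = []
    lowered = password.lower()
    if len(password) < 10:
        problems.append("shorter than ten characters")
    if lowered in COMMON_PASSWORDS:
        problems.append("common password")
    if password.isalpha():
        # Letters only, no spaces, digits, or symbols: likely a single word.
        problems.append("single word")
    for term in personal_terms:
        if term and term.lower() in lowered:
            problems.append(f"contains personal information: {term}")
    return problems
```

For example, `password_problems("password123")` reports a common password even though the string is longer than ten characters, which is why length alone is not a sufficient check.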

These measures help strengthen user accounts, but additional steps should be taken to ensure that all of an organization’s managed identities are as secure as possible. This is especially true for identities associated with non-human accounts—often referred to as service accounts—which we’ll focus on next.

Limit the use of static, long-lived credentials for service accounts

In cloud environments, identities use digital authentication credentials—often referred to as secrets—to securely access various parts of a system and data. With the rate at which cloud environments scale, it’s not uncommon for organizations to rely on user-managed, hardcoded, or shared secrets for their services in order to limit interruptions in workflows. In these cases, secrets rarely change and are often forgotten about as an environment continues to grow.

However, these static, long-lived credentials can significantly increase an organization’s attack surface. If a threat actor finds an exposed secret, such as one that is accidentally stored in a public repository, they can gain access to other parts of a system. In fact, long-lived credentials are a popular entryway among threat actors to gain initial access to an environment.

Limiting the use of static or shared credentials for identities, especially those associated with services, accomplishes two primary goals. First, it reduces the time of exposure in the case of a data breach. An exposed credential that is set to expire reduces the likelihood that a threat actor can take advantage of it to access an environment. Second, unique credentials restrict what a threat actor can do within a system if they do gain access.
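To show why expiry limits exposure, here is a minimal sketch of a short-lived, signed credential. It is not how any cloud provider issues credentials; the `issue_token` and `verify_token` helpers, the token layout, and the signing key are assumptions for illustration, built only on Python’s standard library.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional


def issue_token(secret: bytes, identity: str, ttl_seconds: int, now: float) -> str:
    """Create a signed token for one identity that expires after ttl_seconds."""
    payload = json.dumps({"sub": identity, "exp": now + ttl_seconds}).encode()
    signature = hmac.new(secret, payload, hashlib.sha256).digest()
    return (
        base64.urlsafe_b64encode(payload).decode()
        + "."
        + base64.urlsafe_b64encode(signature).decode()
    )


def verify_token(secret: bytes, token: str, now: float) -> Optional[str]:
    """Return the identity if the token is authentic and unexpired, else None."""
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:  # malformed token (binascii.Error is a ValueError subclass)
        return None
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return None
    claims = json.loads(payload)
    if now >= claims["exp"]:
        return None  # an expired credential is useless to whoever finds it
    return claims["sub"]


signing_key = b"example-signing-key"  # hypothetical; never hardcode real keys
token = issue_token(signing_key, "svc-reports", ttl_seconds=300, now=1000.0)
```

Because each token names a single identity and stops working after five minutes, a leaked copy gives a threat actor a narrow window and a narrow scope, which are the two goals described above.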

Organizations can replace static credentials with those generated by their cloud provider’s identity management services, such as GKE Workload Identity or AWS IAM roles for EC2 instances. For example, Workload Identity is recommended for granting GKE-based service accounts access to Google Cloud services. This tool enables organizations to map their Kubernetes service accounts to Google Cloud service accounts. Associated Kubernetes workloads will automatically have the appropriate access based on the assigned Google Cloud account as a result. This approach encourages organizations to leverage the provider’s built-in key management workflows. Workload Identity can also integrate with other cloud-based secret managers, which can help organizations further reduce the number of static credentials in their environment.

Developing controls that strengthen identities is a key part of identity management. The next step in the process involves organizing them into logical groups, which serves as a bridge between managing them and their level of access to an environment’s resources.

Organize identities into logical groups

Cloud environments manage thousands of identities at any given time. This kind of volume can create blind spots in how and when critical resources are accessed. To mitigate this risk, organizations can sort identities into logical groups based on their role or function. A group is a collection of identities that share the same permissions, which enables organizations to assign permissions once for the whole group instead of for each identity.

The way in which identities are grouped depends on the organization’s structure. The following diagram illustrates how an organization can group users and services based on their role:

Diagram illustrating an organization's hierarchy

In this example, users are grouped by their function (such as admin, engineering, and security) and team. This structure allows organizations to determine how to best assign permissions to each group based on which resources they need access to. Team C from Customer Support, for example, may not need the same set of permissions or access to the same group of resources as Team A from Engineering.

Organizing identities according to this type of hierarchy is helpful for managing permissions at a high level. If an employee moves to a different team, their identity can be assigned to the appropriate group, which automatically grants them the right level of access. Grouping identities like this also provides more context around who is accessing a resource, which is critical for monitoring audit and authentication logs.
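The effect of moving an identity between groups can be sketched in a few lines. The group names, permission strings, and the `Directory` class below are hypothetical and only loosely modeled on the example hierarchy; they are not any provider’s API.

```python
from typing import Dict, Set

# Hypothetical groups and the permissions attached to each one.
GROUP_PERMISSIONS: Dict[str, Set[str]] = {
    "engineering/team-a": {"db1:read"},
    "customer-support/team-c": {"tickets:read", "tickets:write"},
}


class Directory:
    """Tracks which group each user belongs to and derives access from it."""

    def __init__(self, group_permissions: Dict[str, Set[str]]) -> None:
        self._group_permissions = group_permissions
        self._membership: Dict[str, str] = {}

    def assign(self, user: str, group: str) -> None:
        """Place a user in a group, replacing any previous membership."""
        if group not in self._group_permissions:
            raise KeyError(f"unknown group: {group}")
        self._membership[user] = group

    def permissions_for(self, user: str) -> Set[str]:
        """Permissions come from the group, never from the user directly."""
        group = self._membership.get(user)
        return set(self._group_permissions.get(group, set()))


directory = Directory(GROUP_PERMISSIONS)
directory.assign("maddie", "customer-support/team-c")
before_move = directory.permissions_for("maddie")
directory.assign("maddie", "engineering/team-a")   # employee changes teams
after_move = directory.permissions_for("maddie")
```

Because permissions are derived from the group, the reassignment both grants the new access and removes the old access in a single step.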

Creating logical groups enables organizations to efficiently manage identities. It also sets the foundation for how they manage various levels of access, which we’ll look at next.

Assign permissions based on zero-trust and least-privilege principles

Once an identity authenticates with an environment, authorization (the second step of the AAA model) determines which resources it can access based on its level of permissions. Without an effective authorization system, organizations risk creating overly permissive policies for the users and services connecting to their environments. If a threat actor takes over an account with elevated privileges, for example, they can use it to access critical resources and their data.

To mitigate these types of scenarios, organizations should assign permissions as needed, based on the principle of least privilege and zero-trust mechanisms. These controls enable teams to systematically deploy the right permissions at every level of their cloud infrastructure, from their network infrastructure and endpoints to the data they generate.

Organizations can implement zero-trust and least privilege controls for IAM by first considering the following questions when granting access to a particular resource:

  • Which identities should be given access to it?
  • How should the approved identity access the resource and its data?
  • When should approved identities be allowed to access it?
  • Why do these identities need access?
  • What data should the identities be allowed to access?
  • Where should approved identities be allowed to access it from?

To illustrate further, organizations can use authorization strategies like role-based access control (RBAC) to define who can access a resource and its data and why they need to. They can also require authenticated users to first connect to an internal VPN in order to use those resources—the “how” behind granting access and a critical part of segmenting networks.
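The questions above can be captured as fields on an access rule that denies by default. The following is a minimal sketch under stated assumptions: the `AccessRule` and `AccessRequest` types, the VPN address range, and the business-hours window are all hypothetical.

```python
from dataclasses import dataclass
from datetime import time
from ipaddress import ip_address, ip_network
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class AccessRule:
    allowed_identities: FrozenSet[str]   # which identities
    allowed_actions: FrozenSet[str]      # what they may do with the data
    allowed_networks: Tuple[str, ...]    # where from, e.g. an internal VPN range
    window_start: time                   # when access is permitted
    window_end: time
    justification: str                   # why the access exists, kept for audits


@dataclass(frozen=True)
class AccessRequest:
    identity: str
    action: str
    source_ip: str
    at: time


def is_allowed(rule: AccessRule, request: AccessRequest) -> bool:
    """Deny by default: every condition must hold for access to be granted."""
    in_network = any(
        ip_address(request.source_ip) in ip_network(network)
        for network in rule.allowed_networks
    )
    in_window = rule.window_start <= request.at <= rule.window_end
    return (
        request.identity in rule.allowed_identities
        and request.action in rule.allowed_actions
        and in_network
        and in_window
    )


db1_rule = AccessRule(
    allowed_identities=frozenset({"maddie"}),
    allowed_actions=frozenset({"read"}),
    allowed_networks=("10.8.0.0/16",),   # assumed VPN range
    window_start=time(8, 0),
    window_end=time(18, 0),
    justification="on-call debugging of DB1",
)
```

A request from `maddie` to read during working hours over the VPN passes, while the same request from a public IP address, outside the window, or for a write action is refused.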

RBAC in particular is a popular method for implementing zero-trust and least-privilege controls in cloud environments. It enables organizations to further group their identities by specific tasks. Using our previous hierarchy example, engineering Teams A and B both need access to the DB1 resource, but only Team B needs to make updates to it. In this case, assigning Team A a role with read permissions and Team B a role that includes write or edit permissions ensures that both teams have the level of access they need to perform their jobs.
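The Team A and Team B example can be sketched as two roles bound to the same resource. The role names, the `BINDINGS` table, and the `can` helper are hypothetical and do not reflect a specific provider’s RBAC syntax.

```python
from typing import Dict, Set, Tuple

# Roles are named sets of actions (hypothetical names).
ROLES: Dict[str, Set[str]] = {
    "db-reader": {"read"},
    "db-editor": {"read", "write"},
}

# Each binding grants one team one role on one resource.
BINDINGS: Dict[Tuple[str, str], str] = {
    ("engineering/team-a", "DB1"): "db-reader",
    ("engineering/team-b", "DB1"): "db-editor",
}


def can(team: str, action: str, resource: str) -> bool:
    """A team may act on a resource only if a bound role includes the action."""
    role = BINDINGS.get((team, resource))
    return action in ROLES.get(role, set())
```

With these bindings, `can("engineering/team-a", "write", "DB1")` is `False` while `can("engineering/team-b", "write", "DB1")` is `True`, and any team without a binding has no access at all.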

As with the large volume of identities that an organization manages for its cloud environments, the sheer number of evolving permissions and permission sets can also be difficult to track. It’s important to periodically audit all of these components to ensure that an identity’s level of access is not overly permissive. This leads to the final step of the AAA model: accounting. Next, we’ll talk about how organizations can use their logs to monitor IAM activity.

Monitor IAM activity using logs

The accounting step is a crucial part of an organization’s identity and access management system. It brings together information from authentication and authorization controls to provide complete context for who is accessing a resource, including when, how, and why they accessed it. Cloud environments provide a wealth of telemetry data for reviewing user activity, and their logs offer the best insight for monitoring activity from a specific user.

Logs capture key information about user activity, as seen in the following AWS CloudTrail log:

{
  "userIdentity": {
    "type": "IAMUser",
    "principalId": "ASDFGHJKLZX6C2EXAMPLE",
    "arn":"arn:aws:iam::123456789012:user/MaddieShepherd",
    "accountId": "123456789012",
    "userName": "MaddieShepherd"
  },
  "eventTime": "2023-01-01T15:35:22Z",
  "eventSource": "signin.amazonaws.com",
  "eventName": "ConsoleLogin",
  "awsRegion": "us-east-1",
  "sourceIPAddress": "192.0.1.100",
  "userAgent": "aws-sdk-go/1.21.7 (go1.12.6; darwin; amd64) APN/1.0 HashiCorp/1.0 Terraform/0.12.5",
  "errorMessage": "Failed authentication",
  "requestParameters": null,
  "responseElements": {
    "ConsoleLogin": "Failure"
  },
  "additionalEventData": {
    "MobileVersion": "No",
    "LoginTo": "https://console.aws.amazon.com/",
    "MFAUsed": "No"
  },
  "eventID": "11ea990b-4678-4bcd-8fbe-62509088b7cf"
}

This log captures details about a user (MaddieShepherd) who failed to log in to the AWS Console. Maintaining visibility into this kind of activity is necessary for surfacing potential threats. For instance, a stream of logs showing multiple failed attempts from the same user followed by a successful attempt could indicate that a threat actor took over the account via methods like credential stuffing.
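The pattern described here, several failed logins followed by a success, can be detected with a short pass over the logs. The sketch below assumes events shaped like the CloudTrail sample above and reads only the `eventTime`, `eventName`, `userIdentity.userName`, and `responseElements.ConsoleLogin` fields; the `flag_suspicious_logins` helper and its threshold of three failures are assumptions, and it does not represent how a SIEM product implements its detections.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def flag_suspicious_logins(events: Iterable[dict], threshold: int = 3) -> List[str]:
    """Return users whose successful console login followed at least
    `threshold` consecutive failed attempts."""
    consecutive_failures: Dict[str, int] = defaultdict(int)
    flagged = []
    # ISO 8601 UTC timestamps in the same format sort correctly as strings.
    for event in sorted(events, key=lambda e: e["eventTime"]):
        if event.get("eventName") != "ConsoleLogin":
            continue
        user = event.get("userIdentity", {}).get("userName", "unknown")
        outcome = (event.get("responseElements") or {}).get("ConsoleLogin")
        if outcome == "Failure":
            consecutive_failures[user] += 1
        elif outcome == "Success":
            if consecutive_failures[user] >= threshold:
                flagged.append(user)
            consecutive_failures[user] = 0  # a success resets the streak
    return flagged
```

A count-based rule like this is deliberately simple. It ignores source IP addresses and the time between attempts, both of which would help separate a forgetful user from an automated attack.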

Using a centralized logging tool can help organizations collect logs from all their cloud environments. But in order to efficiently identify security threats, organizations can supplement their log monitoring with Cloud SIEM platforms. These tools automatically flag suspicious activity from the millions of logs that a cloud environment may generate within a short period of time. For example, Datadog’s Cloud SIEM Investigator helps organizations easily visualize the complete path of an identity from the moment it authenticates with the environment.

Review the path of an attack with Cloud SIEM Investigator

As seen in the preceding example, the IAM user stephen.fisher successfully retrieved the policy for the customer-data bucket. An increase in these types of calls from the same user can indicate that a threat actor took over the account and is attempting to discover what level of access they have.

Tools like centralized logging and Cloud SIEM enable organizations to build identity-centric monitoring workflows. This strategy allows organizations to secure the boundaries of their cloud environments and ensure that their IAM controls are working properly.

Develop a comprehensive IAM strategy

In this post, we discussed the importance of identity and access management and the role it plays in an organization’s overall security strategy. Using the AAA model as the foundation for an organization’s IAM systems ensures that identities are verified and configured with the appropriate permissions. Monitoring IAM activity gives organizations visibility into events from authorized users, as well as any events that could be potentially malicious.

To learn more about Datadog’s logging and security offerings, check out our documentation. If you don’t already have a Datadog account, you can sign up for one.