Best Practices for Monitoring AWS CloudTrail Logs | Datadog

Best practices for monitoring AWS CloudTrail logs

Author Justin Massey
Author Jonathan Epstein

Published: September 25, 2020

Engineering teams that build, scale, and manage cloud-based applications on AWS know that at some point in time, their applications and infrastructure will be under attack. But as applications expand and new features are added, securing the full scope of an AWS environment becomes an increasingly complex task.

To add visibility and auditability, AWS CloudTrail tracks the who, what, where, and when of activity that occurs in your AWS environment and records this activity in the form of audit logs. Accordingly, CloudTrail audit logs contain information that is key to monitoring the actions performed across your AWS accounts, identifying possible malicious behavior and surfacing parts of your infrastructure that might not be configured properly.

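In this guide, we’ll look at:

  • how to understand and interpret AWS CloudTrail audit logs
  • the key CloudTrail audit logs to monitor
  • how to collect and analyze CloudTrail logs with Datadog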

But first, let’s take a quick look at how AWS CloudTrail organizes and monitors activity within your AWS accounts.

Blazing trails

As previously mentioned, AWS CloudTrail records each instance of activity (such as API requests and user logins) it detects in your environment as an event, which is a JSON object that specifies the activity’s particulars, including the time at which it occurred, who performed the activity, the resources that were affected by the activity, and more. You can view and filter all of your events in the Event History page in the AWS CloudTrail console, where they are available for up to 90 days after they occur.

The CloudTrail event history window displays all of the events that occurred within your account over the last 90 days.

Most AWS customers use a consolidated trail for all CloudTrail events. However, you can also create an event stream that includes or excludes specific events. For instance, to reduce your log volume, you might want an event stream that consists solely of activity related to a certain AWS service or resource. To do this, you create a trail, which is an event stream that sends events to an Amazon S3 bucket of your choice as log files. This way, your events are available according to the retention policy you specify, can be quickly filtered to find critical issues, and can be alerted on using Amazon CloudWatch or Amazon Simple Notification Service (SNS).

CloudTrail saves your audit logs in gzip archive form to the S3 bucket that you specify when creating the trail. The name of the file includes the trail creator’s account number, the Region in which the log was recorded, and the month, day, and year when the file was created. For more information on finding your CloudTrail log files, see the AWS documentation.
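As a minimal sketch of reading one of these archives: each delivered log file is a gzip-compressed JSON document whose top-level Records array holds the individual events. The helper name below is hypothetical, and the sketch assumes you have already downloaded the file from your S3 bucket.

```python
import gzip
import json


def load_cloudtrail_records(path):
    """Return the list of events stored in one gzip-compressed CloudTrail log file."""
    # "rt" opens the archive in text mode so json.load can read it directly.
    with gzip.open(path, "rt", encoding="utf-8") as log_file:
        document = json.load(log_file)
    # Each log file wraps its events in a top-level "Records" array.
    return document.get("Records", [])
```

You could then loop over the returned list and inspect fields such as eventName or userIdentity on each event.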

By default, trails are Region agnostic; that is, a trail will log relevant events across every Region. You can create single-Region trails to focus on a single Region’s activity, but we recommend creating an all-Region trail, as doing so will give you more visibility and automatically track data from new Regions as they come online.

You can also set up an organization trail to monitor all of the logs generated by the AWS accounts within an AWS Organization. AWS Organizations allows you to centrally manage the access permissions of users in all of the accounts in the organization, and can be set up at no additional cost. Organizations are recommended when your team needs to manage many different AWS accounts, govern an ever-changing environment, and enforce configurations on your primary and member accounts.

Understanding AWS CloudTrail audit logs

AWS CloudTrail records three different types of events from most AWS services based on the actions users perform in the AWS Management Console, Command Line Interface (CLI), and SDKs/APIs, as well as automated actions performed by AWS. For a list of services that are not tracked by CloudTrail, see the AWS documentation. The three event types are:

  • Management events: entries for management and network (control plane) operations performed on the resources in your AWS account, such as security group configuration changes, IAM role permission adjustments, and Amazon Virtual Private Cloud (VPC) network alterations.

  • Data events: entries for data request operations—such as Get, Delete, and Put API commands—performed on an AWS data plane resource.

  • Insight events: entries that reflect unusual API activity in your AWS account in comparison to your historical API usage, such as excessive API calls in a short frame of time.

As management and data events make up the vast majority of event logs in CloudTrail, we’ll look at them in more detail. For more information on using insight events to track and discover anomalies in your AWS data, see the AWS documentation.

Management events

Management events include all management operations performed on resources in your account, as well as most non-API actions. Non-API actions include logins (AwsConsoleSignIn) to the AWS console and automated service actions like cryptographic key rotations (AwsServiceEvent). AWS CloudTrail logs management events by default.

The sample management event below records a console login, indicated by the field eventType: AwsConsoleSignIn. It shows that a user with the userName alice successfully signed in to the AWS console without multifactor authentication (MFAUsed: No).

{
    "eventVersion": "1.05",
    "userIdentity": {
        "type": "IAMUser",
        "principalId": "AIDABBBBBBBBBBBBBBBBB",
        "arn": "arn:aws:iam::111111111111:user/alice",
        "accountId": "111111111111",
        "userName": "alice"
    },
    "eventTime": "2020-09-23T09:09:56Z",
    "eventSource": "signin.amazonaws.com",
    "eventName": "ConsoleLogin",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "1.2.3.4",
    "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36",
    "requestParameters": null,
    "responseElements": {
        "ConsoleLogin": "Success"
    },
    "additionalEventData": {
        "LoginTo": "https://console.aws.amazon.com/console/home",
        "MobileVersion": "No",
        "MFAUsed": "No"
    },
    "eventID": "6894a571-9f34-47b8-b75c-5f4ca34f281e",
    "eventType": "AwsConsoleSignIn",
    "recipientAccountId": "111111111111"
}
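A check for this kind of sign-in can be sketched in a few lines. The function name and the trimmed-down sample event are illustrative assumptions; only the field names come from the log above.

```python
def is_console_login_without_mfa(event):
    """Return True for a successful console sign-in that did not use MFA."""
    if event.get("eventName") != "ConsoleLogin":
        return False
    # responseElements and additionalEventData can be null in the raw log.
    response = event.get("responseElements") or {}
    extra = event.get("additionalEventData") or {}
    return response.get("ConsoleLogin") == "Success" and extra.get("MFAUsed") == "No"


# Trimmed copy of the sample event, keeping only the fields the check reads.
sample_event = {
    "eventName": "ConsoleLogin",
    "eventType": "AwsConsoleSignIn",
    "userIdentity": {"type": "IAMUser", "userName": "alice"},
    "responseElements": {"ConsoleLogin": "Success"},
    "additionalEventData": {"MFAUsed": "No"},
}

print(is_console_login_without_mfa(sample_event))  # True
```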

Data events

Data events provide details on the operations performed on or within a resource or service, such as AWS IAM roles, Amazon EC2 instances, Amazon S3 buckets, and AWS Lambda functions. Because they are often high-volume activities, data events are disabled by default when you create a trail; you must add the resources or resource types to a trail in order to track them in AWS CloudTrail.

The example below shows that user Alice successfully performed the PutObject Amazon S3 operation on a bucket called example-bucket to upload the file exampleFile.txt.

{
  "eventVersion": "1.07",
  "userIdentity": {
        "type": "IAMUser",
        "principalId": "AIDAAAAAAAAAAAAAAAAAA",
        "arn": "arn:aws:iam::111111111111:user/Alice",
        "accountId": "111111111111",
        "accessKeyId": "AKIAAAAAAAAAAAAAAAAAA",
        "userName": "Alice"
    },
  "eventTime": "2020-09-22T20:15:25Z",
  "eventSource": "s3.amazonaws.com",
  "eventName": "PutObject",
  "awsRegion": "us-east-1",
  "sourceIPAddress": "1.2.3.4",
  "userAgent": "[Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36]",
  "requestParameters": {
    "X-Amz-Date": "20200922T201524Z",
    "bucketName": "example-bucket",
    "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
    "x-amz-acl": "private",
    "X-Amz-SignedHeaders": "content-md5;content-type;host;x-amz-acl;x-amz-storage-class",
    "Host": "example-bucket.s3.us-east-1.amazonaws.com",
    "X-Amz-Expires": "300",
    "key": "exampleFile.txt",
    "x-amz-storage-class": "STANDARD"
  },
  "responseElements": null,
  "additionalEventData": {
    "SignatureVersion": "SigV4",
    "CipherSuite": "ECDHE-RSA-AES128-GCM-SHA256",
    "bytesTransferredIn": 12,
    "AuthenticationMethod": "QueryString",
    "x-amz-id-2": "d2UncmUgaGlyaW5nIDopIGh0dHBzOi8vd3d3LmRhdGFkb2docS5jb20vY2FyZWVycy8K",
    "bytesTransferredOut": 0
  },
  "requestID": "EEEEEEEEEEEEEEEE",
  "eventID": "f378e059-d87f-44b7-aee2-7ebfa1beff93",
  "readOnly": false,
  "resources": [
    {
      "type": "AWS::S3::Object",
      "ARN": "arn:aws:s3:::example-bucket/exampleFile.txt"
    },
    {
      "accountId": "111111111111",
      "type": "AWS::S3::Bucket",
      "ARN": "arn:aws:s3:::example-bucket"
    }
  ],
  "eventType": "AwsApiCall",
  "managementEvent": false,
  "recipientAccountId": "111111111111",
  "eventCategory": "Data"
}
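To pull the who, what, and where out of an S3 data event like this one, a small helper along the following lines may be enough. The function name is hypothetical, and the sketch reads only fields that appear in the sample event.

```python
def summarize_s3_object_event(event):
    """Return the caller, operation, bucket, and key of an S3 event, or None."""
    if event.get("eventSource") != "s3.amazonaws.com":
        return None
    # Either field can be null in the raw log, so fall back to empty dicts.
    params = event.get("requestParameters") or {}
    identity = event.get("userIdentity") or {}
    return {
        "caller": identity.get("arn"),
        "operation": event.get("eventName"),
        "bucket": params.get("bucketName"),
        "key": params.get("key"),
    }


# Trimmed copy of the PutObject event, keeping only the fields the helper reads.
put_object_event = {
    "eventSource": "s3.amazonaws.com",
    "eventName": "PutObject",
    "userIdentity": {"arn": "arn:aws:iam::111111111111:user/Alice"},
    "requestParameters": {"bucketName": "example-bucket", "key": "exampleFile.txt"},
}

summary = summarize_s3_object_event(put_object_event)
print(summary["operation"], summary["bucket"], summary["key"])
```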

Interpreting your CloudTrail logs

AWS CloudTrail logs contain invaluable information that lets you monitor activity across your AWS environment, so it’s important to understand how to interpret them in order to conduct investigations. In this section, we’ll take a deep dive into a sample management event in a CloudTrail log file to illustrate which fields you should focus on.

CloudTrail log files are written in JSON format, with each event presented as a single JSON object. Entries of all event types include some of the same important fields, such as the access key ID of the AWS identity that performed the action (userIdentity fields) and the details of the action performed (eventName and requestParameters). Management and data event entries also provide responseElements fields that help you determine whether the action was successfully performed.

In the snippet below, we can see that a user named Alice (userName) made a call to create a new user (eventName) named Bob (requestParameters).

{
    "eventVersion": "1.05",
    "userIdentity": {
        "type": "IAMUser",
        "principalId": "AIDAAAAAAAAAAAAAAAAAA",
        "arn": "arn:aws:iam::111111111111:user/Alice",
        "accountId": "111111111111",
        "accessKeyId": "AKIAAAAAAAAAAAAAAAAAA",
        "userName": "Alice"
    },
    "eventTime": "2020-09-21T10:31:20Z",
    "eventSource": "iam.amazonaws.com",
    "eventName": "CreateUser",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "1.2.3.4",
    "userAgent": "console.amazonaws.com",
    "requestParameters": {
        "userName": "bob",
        "tags": []
    },
    "responseElements": {
        "user": {
            "path": "/",
            "userName": "bob",
            "userId": "AIDABBBBBBBBBBBBBBBBB ",
            "arn": "arn:aws:iam::111111111111:user/bob",
            "createDate": "Sep 21, 2020 10:31:20 AM"
        }
    },
    "requestID": "604e7549-4ea4-4185-83b0-acff4e462d27",
    "eventID": "600e50af-0a2c-4352-95a8-7b813c744072",
    "eventType": "AwsApiCall",
    "recipientAccountId": "111111111111"
}

Because the entry returns identification details for the newly created user (responseElements), we know that the command was successfully performed. Otherwise, the JSON response would have included an errorCode and errorMessage element, as seen in the AWS documentation.
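This success-or-failure check can be expressed as a short function. The function name and the failed sample event are hypothetical; the sketch relies only on the presence or absence of the errorCode field.

```python
def describe_outcome(event):
    """Return 'succeeded', or a 'failed: ...' string built from the error fields."""
    error_code = event.get("errorCode")
    if error_code is None:
        return "succeeded"
    message = event.get("errorMessage", "no message")
    return f"failed: {error_code} ({message})"


# A successful call has no errorCode field.
create_user_ok = {
    "eventName": "CreateUser",
    "responseElements": {"user": {"userName": "bob"}},
}
# Hypothetical failed call, for illustration only.
create_user_denied = {
    "eventName": "CreateUser",
    "errorCode": "AccessDenied",
    "errorMessage": "User is not authorized to perform iam:CreateUser",
    "responseElements": None,
}

print(describe_outcome(create_user_ok))      # succeeded
print(describe_outcome(create_user_denied))
```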

Before we look at the most important CloudTrail logs to monitor, it’s essential to understand the different user identity types defined by CloudTrail, and how CloudTrail identifies the user who performed an action.

CloudTrail identity types

Every CloudTrail event log contains a userIdentity element that describes the user or service that performed the action. Within this element, the type field describes which sort of user or service made the request and which level of credentials that user or service employed to make the request. CloudTrail userIdentity types include:

  • Root: The request was made with your primary AWS account credentials. If you set up an alias for your AWS account, that alias will appear here instead.

  • IAMUser: The request was made with the credentials of an IAM user.

  • FederatedUser: The request was made by a user with temporary security credentials provided through a federation token.

  • AWSAccount: The request was made by a third-party AWS account.

  • AWSService: The request was made by an AWS service account. Many AWS services use service accounts to perform automated actions on your behalf.

  • AssumedRole: The request was made with temporary credentials obtained by using the AWS Security Token Service (STS) AssumeRole operation.

While most of these identity types are fairly straightforward, AssumedRoles obfuscate the name of the user who performed the action. In the following section, we will look at how AssumeRole calls work in practice, how to determine the user behind an AssumedRole identity, and how a clever adversary might use an AssumedRole session to hide their true identity.

Interpreting the initial identity of an ‘AssumedRole’ CloudTrail log

A common practice for a multi-account setup in AWS is to manage all users in a single AWS account, which we’ll call account A. In turn, a security best practice is to make sure that IAM users do not have any IAM policies directly associated with them, and to instead give them temporary credentials to perform actions. We can do this by creating a separate account (e.g., account B) that contains IAM roles, each of which has a set of allowed actions that are defined in an IAM policy. We can then allow users in account A to assume those roles when they need to perform an action.

Let’s say a user in account A wants to list all of the AWS Regions enabled in account B. First, the user would AssumeRole into a role in account B that has DescribeRegions permissions, obtain the temporary credentials returned by the AssumeRole command, and then use them to perform the command. The CloudTrail log in which a user (userName: Alice) from account A (accountId: 222222222222) assumes a role in account B (accountId: 111111111111) would look like this:

{
    "eventVersion": "1.05",
    "userIdentity": {
        "type": "IAMUser",
        "principalId": "AIDAAAAAAAAAAAAAAAAAA",
        "arn": "arn:aws:iam::222222222222:user/Alice",
        "accountId": "222222222222",
        "accessKeyId": "AKIAAAAAAAAAAAAAAAAAA",
        "userName": "Alice"
    },
    "eventTime": "2020-09-22T16:23:50Z",
    "eventSource": "sts.amazonaws.com",
    "eventName": "AssumeRole",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "1.2.3.4",
    "userAgent": "aws-sdk-go/1.16.8 (go1.12.7; linux; amd64)",
    "requestParameters": {
        "roleArn": "arn:aws:iam::111111111111:role/ExampleRole",
        "roleSessionName": "ExampleRoleSession",
        "externalId": "ffffffffffffffffffffffffffffffff",
        "durationSeconds": 3600
    },
    "responseElements": {
        "credentials": {
            "accessKeyId": "ASIADDDDDDDDDDDDDDDD",
            "expiration": "Sep 22, 2020 5:23:50 PM",
            "sessionToken": "d2UncmUgaGlyaW5nIDopIGh0dHBzOi8vd3d3LmRhdGFkb2docS5jb20vY2FyZWVycy8K"
        },
        "assumedRoleUser": {
            "assumedRoleId": "AROAEEEEEEEEEEEEEEEEE:ExampleRoleSession",
            "arn": "arn:aws:sts::111111111111:assumed-role/ExampleRole/ExampleRoleSession"
        }
    },
    "requestID": "4da64d92-6130-4355-86f2-1609a6eb53e1",
    "eventID": "ffef7974-b1a0-4e88-b27f-0b143965f30c",
    "resources": [
        {
            "accountId": "111111111111",
            "type": "AWS::IAM::Role",
            "ARN": "arn:aws:iam::111111111111:role/ExampleRole"
        }
    ],
    "eventType": "AwsApiCall",
    "recipientAccountId": "111111111111",
    "sharedEventID": "4f61c867-6a49-4c41-a267-388c38e99866"
}

The AssumeRole command returns an AccessKeyId (ASIADDDDDDDDDDDDDDDD) that user Alice can then use to perform the role’s delegated actions. In the following event log, we can see that an AssumedRole user uses the access key ASIADDDDDDDDDDDDDDDD to perform the DescribeRegions operation; we can thus infer that user Alice used the access key.

{
    "eventVersion": "1.05",
    "userIdentity": {
        "type": "AssumedRole",
        "principalId": "AROAEEEEEEEEEEEEEEEEE:ExampleRoleSession",
        "arn": "arn:aws:sts::111111111111:assumed-role/ExampleRole/ExampleRoleSession",
        "accountId": "111111111111",
        "accessKeyId": "ASIADDDDDDDDDDDDDDDD",
        "sessionContext": {
            "sessionIssuer": {
                "type": "Role",
                "principalId": "AROAEEEEEEEEEEEEEEEEE",
                "arn": "arn:aws:iam::111111111111:role/ExampleRole",
                "accountId": "111111111111",
                "userName": "ExampleRole"
            },
            "webIdFederationData": {},
            "attributes": {
                "mfaAuthenticated": "false",
                "creationDate": "2020-09-22T15:58:31Z"
            }
        }
    },
    "eventTime": "2020-09-22T16:26:02Z",
    "eventSource": "ec2.amazonaws.com",
    "eventName": "DescribeRegions",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "1.2.3.4",
    "userAgent": "aws-sdk-go/1.16.8 (go1.12.7; linux; amd64)",
    "requestParameters": {
        "regionSet": {}
    },
    "responseElements": null,
    "requestID": "0a857cb2-90c4-4f09-9624-1149fb27f8a1",
    "eventID": "26fe99a5-8ed5-4923-9cf7-b6cdf96fa5f3",
    "eventType": "AwsApiCall",
    "recipientAccountId": "111111111111"
}
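The inference above, matching the temporary access key in an AssumedRole event back to the AssumeRole call that issued it, can be sketched as follows. The function names are hypothetical, and the sample events are trimmed copies of the two logs shown in this section.

```python
def map_sessions_to_callers(events):
    """Map each temporary access key ID to the ARN of the identity that requested it."""
    callers = {}
    for event in events:
        if event.get("eventName") != "AssumeRole":
            continue
        credentials = (event.get("responseElements") or {}).get("credentials") or {}
        key_id = credentials.get("accessKeyId")
        if key_id:
            callers[key_id] = (event.get("userIdentity") or {}).get("arn")
    return callers


def original_caller(event, callers):
    """Return the ARN behind an event, resolving AssumedRole sessions when possible."""
    identity = event.get("userIdentity") or {}
    if identity.get("type") != "AssumedRole":
        return identity.get("arn")
    # None means no matching AssumeRole event was found in the supplied logs.
    return callers.get(identity.get("accessKeyId"))


assume_role = {
    "eventName": "AssumeRole",
    "userIdentity": {"type": "IAMUser", "arn": "arn:aws:iam::222222222222:user/Alice"},
    "responseElements": {"credentials": {"accessKeyId": "ASIADDDDDDDDDDDDDDDD"}},
}
describe_regions = {
    "eventName": "DescribeRegions",
    "userIdentity": {"type": "AssumedRole", "accessKeyId": "ASIADDDDDDDDDDDDDDDD"},
}

callers = map_sessions_to_callers([assume_role, describe_regions])
print(original_caller(describe_regions, callers))  # arn:aws:iam::222222222222:user/Alice
```

This lookup only works when the AssumeRole event and the later AssumedRole events are available in the same set of logs, which is one reason to collect logs from every account in one place.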

Controlling AssumedRole session names

A good way to control assumed roles and more easily track users who perform actions using assumed roles is to stipulate the user’s session name. To do this, you specify the permissible session names in the trust policy of the role that will be assumed. For example, the following trust policy specifies that, in order to assume a role, the user must name their session after their own username. Otherwise, the AssumeRole command will fail.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<AccountNumber>:root"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringLike": {
                    "sts:RoleSessionName": "${aws:username}"
                }
            }
        }
    ]
}

With this configuration, you can easily track and filter the actions performed in each assumed role session—or catch anyone who fails to provide a valid session name. For more examples of controlling session names, see the related AWS blog post.
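For roles that do not yet enforce such a condition, or to review failed attempts, you could audit AssumeRole events for session names that differ from the caller's username. The sketch below is one possible approach under that assumption; the function name and sample events are hypothetical.

```python
def find_mismatched_session_names(events):
    """Return (userName, roleSessionName) pairs where the two values differ."""
    mismatches = []
    for event in events:
        if event.get("eventName") != "AssumeRole":
            continue
        identity = event.get("userIdentity") or {}
        # Only IAM users carry a userName to compare against.
        if identity.get("type") != "IAMUser":
            continue
        params = event.get("requestParameters") or {}
        user_name = identity.get("userName")
        session_name = params.get("roleSessionName")
        if user_name != session_name:
            mismatches.append((user_name, session_name))
    return mismatches


sample_events = [
    {
        "eventName": "AssumeRole",
        "userIdentity": {"type": "IAMUser", "userName": "Alice"},
        "requestParameters": {"roleSessionName": "Alice"},
    },
    {
        "eventName": "AssumeRole",
        "userIdentity": {"type": "IAMUser", "userName": "Alice"},
        "requestParameters": {"roleSessionName": "ExampleRoleSession"},
    },
]

print(find_mismatched_session_names(sample_events))  # [('Alice', 'ExampleRoleSession')]
```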

Key CloudTrail audit logs to monitor

IAM policies in AWS are complex; they can potentially provide users with permissions to access all resources within an AWS account. This means that there is ample opportunity for security misconfigurations that will inadvertently allow someone to manipulate your environment and give themselves further access to your assets. By monitoring your audit logs, you can get a fuller picture of user activity and the ways in which users interact with your resources—including whether or not they’re authorized to perform those interactions in the first place.

Attackers often look for overly permissive IAM permissions or misconfigurations across a wide variety of AWS resources, including:

  • IAM users/roles
  • EC2 instances
  • S3 buckets

For example, an S3 bucket might have a policy attached that provides Read access to all authenticated users, instead of just the authenticated users within your account. If an attacker were to discover this vulnerability, they could read all of the data saved within that bucket, potentially exposing customer or business information.
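As an illustration of spotting this kind of exposure, the sketch below assumes the access was granted through a bucket ACL and that you already have the ACL as a dictionary shaped like the response of the S3 GetBucketAcl operation. The function name and the sample ACL are hypothetical.

```python
# Predefined S3 group that represents any authenticated AWS user, in any account.
AUTHENTICATED_USERS_URI = "http://acs.amazonaws.com/groups/global/AuthenticatedUsers"


def permissions_for_all_authenticated_users(acl):
    """Return the permissions a bucket ACL grants to every authenticated AWS user."""
    permissions = []
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee") or {}
        if grantee.get("Type") == "Group" and grantee.get("URI") == AUTHENTICATED_USERS_URI:
            permissions.append(grant.get("Permission"))
    return permissions


# Hypothetical ACL: the owner has full control, and all authenticated users can read.
sample_acl = {
    "Grants": [
        {"Grantee": {"Type": "CanonicalUser", "ID": "example-owner-id"}, "Permission": "FULL_CONTROL"},
        {"Grantee": {"Type": "Group", "URI": AUTHENTICATED_USERS_URI}, "Permission": "READ"},
    ]
}

print(permissions_for_all_authenticated_users(sample_acl))  # ['READ']
```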

Your CloudTrail logs contain a reliable record of user activity and, once you know which logs to focus on, can provide you with all of the information you need to monitor your environment. The following resource-based types of logs are particularly important, as they are where the majority of threats will originate:

  • User accounts
  • Buckets
  • Networking components

We’ll take a look at some sample logs generated from these resource-based operations below. When reading event logs, you should always pay attention to the JSON attributes that can help you spot a possible attack or misconfiguration. These include the response of a call (i.e., the responseElements), the API call that was made (i.e., the eventName), and any identifying information, such as the user or role that called the command (i.e., various fields under userIdentity).

User accounts

One of the most common ways for an attacker to infiltrate your environment is by using an exposed AWS Secret Access Key and enumerating the key’s permissions. If the exposed key has extensive management permissions, the attacker can proceed to give themselves further permissions while disabling your security infrastructure. Monitoring your CloudTrail logs for the following activity can help alert you to attackers as they inspect their permissions and attempt to maintain persistence in your environment:

Logs of unauthorized user activity contain an errorCode and errorMessage like the following, while responseElements is null:

{
   [...],
    "errorCode": "Client.UnauthorizedOperation",
    "errorMessage": "You are not authorized to perform this operation.",
    "requestParameters": {
        "regionSet": {}
    },
    "responseElements": null,
    "requestID": "0a857cb2-90c4-4f09-9624-1149fb27f8a1",
    "eventID": "26fe99a5-8ed5-4923-9cf7-b6cdf96fa5f3",
    "eventType": "AwsApiCall",
    "recipientAccountId": "111111111111"
}

A single unauthorized activity log is not necessarily indicative of a threat. For instance, the unauthorized action may have occurred because a user lacked the permissions needed to view certain AWS console resources. Or it could be the result of a service attempting to call a resource to which it does not have access.

However, if this is the first time an IAM user is receiving an authorization error, it might be worth investigating what caused the error. It could be the result of an attacker attempting to use the account or service to gain further access to your resources. For instance, they might attempt to create a new user or role as a backdoor into your environment, or expand the IAM policy already associated with a user or role they’ve gained access to.

In order to go undetected when performing unauthorized or malicious actions, an attacker might attempt to disable the Amazon GuardDuty threat detectors running in your AWS account. It’s always worth investigating any instances of GuardDuty detector deletion.
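Both signals from this section can be sketched as simple checks over a list of events. The names below are hypothetical, and the set of error codes is an assumption that you would likely extend for your own environment. DeleteDetector is the GuardDuty API operation that removes a detector.

```python
from collections import Counter

# Error codes treated as authorization failures (assumed starting set).
AUTHORIZATION_ERROR_CODES = {"Client.UnauthorizedOperation", "AccessDenied"}


def count_authorization_errors(events):
    """Count authorization failures per caller ARN."""
    counts = Counter()
    for event in events:
        if event.get("errorCode") in AUTHORIZATION_ERROR_CODES:
            identity = event.get("userIdentity") or {}
            counts[identity.get("arn") or "unknown"] += 1
    return counts


def is_guardduty_detector_deletion(event):
    """Return True when the event records a GuardDuty detector being deleted."""
    return (
        event.get("eventSource") == "guardduty.amazonaws.com"
        and event.get("eventName") == "DeleteDetector"
    )
```

A caller whose count goes from zero to one is the "first authorization error" case described above, so comparing counts across time windows is one way to surface it.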

Buckets

Attackers often target S3 buckets when attempting to breach your environment. As with user accounts, an attacker might gain access to a bucket’s contents due to a security misconfiguration or human error. By monitoring your CloudTrail logs, you can spot the following bucket enumeration and modification attack techniques.

If an attacker gets access to an EC2 instance, the first thing they might do is enumerate all of the S3 buckets that they have access to from the relevant instance profile, or attempt to change a bucket’s access policy altogether. As most automated resources already have direct access to all of the buckets they need, a ListBuckets or PutBucketPolicy call is usually worthy of investigation.

Similarly, an attempt to remove a public access block attached to an S3 bucket is an event that should be investigated. This could be a legitimate user trying to accomplish a task by removing a security control as a debugging mechanism. Alternatively, it could be an attacker attempting to open the bucket to the public internet. We recommend investigating DeleteAccountPublicAccessBlock event logs as soon as possible.
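A simple watchlist filter for the bucket-related events named in this section might look like the following. The constant and function names are hypothetical, and the returned fields are just one possible choice for triage.

```python
# Event names called out in this section as worth reviewing.
BUCKET_EVENTS_TO_REVIEW = {
    "ListBuckets",
    "PutBucketPolicy",
    "DeleteAccountPublicAccessBlock",
}


def bucket_events_to_review(events):
    """Return (eventName, caller ARN, source IP) for each watchlisted event."""
    flagged = []
    for event in events:
        if event.get("eventName") in BUCKET_EVENTS_TO_REVIEW:
            identity = event.get("userIdentity") or {}
            flagged.append(
                (event["eventName"], identity.get("arn"), event.get("sourceIPAddress"))
            )
    return flagged
```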

Networking components

Attackers may also attempt to access your environment through a misconfigured network resource, such as a VPC, a route table, a network gateway, a network access control list, or a security group. CloudTrail logs can help you spot the following types of possible network attacks and take the proper steps to resolve the breach.

To check the posture of your networking resources and make sure that they are securely configured, we recommend using Datadog’s Compliance Monitoring tool, which scans your AWS environment for misconfigurations in real time.

Collect and analyze CloudTrail logs with Datadog

The benefits of using Datadog as your log monitoring platform for your AWS infrastructure include:

Once you’ve set up the AWS integration for your services and have CloudTrail logs streaming into Datadog, you can build custom dashboards to get a high-level perspective on the health and security of your AWS environment. And using Datadog’s built-in Threat Detection Rules, you can detect critical security and operational issues—including the ones we discussed above—as they occur.

Export your CloudTrail logs to Datadog

Exporting CloudTrail logs from AWS to Datadog enables you to analyze and more deeply contextualize the events recorded with other observability data from your environment. A simple way of doing this is by using Amazon Kinesis Data Firehose, a fully managed AWS service that automates the delivery of your real-time, distributed streaming data to external data storage and analysis repositories.

Using Kinesis Data Firehose for AWS data delivery comes with a number of advantages, including near-real-time uploading, serverless data transformation options, and integrations with the full suite of AWS services. For instructions on setting up Kinesis Data Firehose for use with Datadog, see our blog post.

Explore CloudTrail Logs in Datadog

Once your audit logs are streaming into Datadog’s Log Explorer, you can easily filter and search them to find the most important logs for your particular use case. For instance, referring back to the key AWS audit logs to monitor, you might want to look for events in which a user attempted to create or change the permissions of a security group. To do so, you would filter your logs to look for CreateSecurityGroup, AuthorizeSecurityGroupIngress, or AuthorizeSecurityGroupEgress events.

You can use the Datadog Log Explorer to manage, sort, and gain insights from your CloudTrail logs.
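As a rough sketch, a search query for these security group events could be assembled like this. It assumes that your CloudTrail logs carry the tag source:cloudtrail and that the event name is exposed as the @evt.name attribute; verify both against your own logs before relying on the query. The function name is hypothetical.

```python
def build_event_name_query(event_names, source="cloudtrail"):
    """Build a log search query that matches any of the given CloudTrail event names."""
    # Assumes the event name is available as the @evt.name log attribute.
    joined = " OR ".join(event_names)
    return f"source:{source} @evt.name:({joined})"


query = build_event_name_query(
    ["CreateSecurityGroup", "AuthorizeSecurityGroupIngress", "AuthorizeSecurityGroupEgress"]
)
print(query)
```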

Beyond filtering your audit logs to find potential problems, you can use them to build high-level Datadog dashboards with custom data visualizations. This way, you can get a quick, top-down perspective of your incoming logs without endlessly filtering through them.

Build dashboards from your CloudTrail logs to provide a top-down perspective with which to assess the performance and health of your AWS environment.

Detect security threats in real-time

To help you catch security threats as they occur, Datadog Security Monitoring lets you apply strict Detection Rules to your entire event stream as it is ingested. Detection Rules are available out of the box and match the attack techniques enumerated by the MITRE ATT&CK® framework, which covers many of the critical event types we’ve already discussed. Additionally, if you’d like to evaluate events based on your environment’s specific needs, you can create your own rules.

When an incoming event matches one of your Detection Rules, Datadog creates a Security Signal that can be inspected in the Security Signals explorer. Security Signals provide context around each trigger event, such as the username and IP address that initiated the offending action, the timeline of the event itself, and standardized guidelines for responding to the threat.

Start monitoring your AWS CloudTrail audit logs

In this post, we reviewed how to interpret AWS CloudTrail audit logs: we looked at how each event type works, outlined best practices for following users and roles across multiple logs, and highlighted the most important audit logs to investigate. We also walked through how to import your CloudTrail logs into Datadog using Amazon Kinesis Data Firehose, as well as some of the best ways to use Datadog to triage your logs and catch security issues as they occur. For more information on monitoring your AWS audit logs and securing your applications with Datadog, check out our documentation. And if you’re not already using Datadog, get started now.