Monitoring AWS Lambda with Datadog

Feb 5, 2020

Mallory Mooney

In Part 2 of this series, we looked at how Amazon's built-in monitoring services can help you get insights into all of your AWS Lambda functions. In this post, we'll show you how to use Datadog to monitor all of the metrics emitted by Lambda, as well as function logs and performance data, to get a complete picture of your serverless applications.

Visualize your AWS Lambda metrics with Datadog's out-of-the-box integration dashboard.

In this post, we will:

Enable Datadog's AWS and Lambda integrations
Collect enhanced metrics and more with Datadog's Lambda Library
Monitor Lambda traces and logs
Detect trends in Lambda performance and create alerts

Enable Datadog's AWS integration

Datadog integrates with AWS Lambda and other services such as Amazon API Gateway, S3, and DynamoDB. If you're already using Datadog's AWS integration and your Datadog role has read-only access to Lambda, make sure that "Lambda" is checked in your AWS integration tile and skip to the next section.

Configure AWS Lambda metric collection

To get started, configure IAM role delegation and an IAM policy that grants your Datadog role read-only access to AWS Lambda and any other services you wish to monitor. You can find an example policy in our documentation.

If you use other AWS integrations with Lambda, such as AWS Step Functions or Amazon EFS for Lambda, there are a few permissions that you will need to include in your Datadog IAM policy:

states:ListStateMachines: List active Step Functions
states:DescribeStateMachine: Get Step Functions metadata and tags
elasticfilesystem:DescribeAccessPoints: List active EFS resources connected to Lambda functions
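As a rough illustration, the extra permissions could be collected into a single IAM policy statement like the one built below. Only the three action names come from the list above; the statement ID, the "Allow" effect placement, and the wildcard resource are assumptions for this sketch, so adapt them to the policy you already use for your Datadog role.

```javascript
// Action names for the Step Functions and EFS integrations described above.
const EXTRA_LAMBDA_INTEGRATION_ACTIONS = [
  "states:ListStateMachines",
  "states:DescribeStateMachine",
  "elasticfilesystem:DescribeAccessPoints",
];

// Builds one IAM policy statement granting the extra read-only actions.
function buildExtraPermissionsStatement(actions = EXTRA_LAMBDA_INTEGRATION_ACTIONS) {
  return {
    Sid: "DatadogLambdaRelatedServices", // hypothetical statement ID
    Effect: "Allow",
    Action: [...actions],
    Resource: "*", // assumption: narrow this if your policy scopes resources
  };
}

// Print the statement so it can be pasted into a policy document.
console.log(JSON.stringify(buildExtraPermissionsStatement(), null, 2));
```

You would add the printed statement to the "Statement" array of the IAM policy attached to your Datadog role.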

Then navigate to the AWS integration tile in your Datadog account. Add your AWS account information, along with the name of the IAM role you configured. Make sure that you select "Lambda" (along with the names of any other services you want to start monitoring).

Enable Datadog's Lambda integration in the AWS integration tile.

Visualize your AWS Lambda metrics

Datadog will automatically start collecting the key Lambda metrics discussed in Part 1, such as invocations, duration, and errors, and generate real-time enhanced metrics for your Lambda functions. You can easily visualize all of this data with Datadog's out-of-the-box integration and enhanced metrics dashboards, giving you deep visibility into the performance of your Lambda functions.

View all of your Lambda enhanced metrics in Datadog's out-of-the-box integration dashboard

You can also customize your dashboards to include function logs and trace data, as well as metrics from all of your services, not just Lambda. Check out our documentation for more information about creating custom dashboards for your services.

Get more insight with Datadog's Lambda Library

Though Datadog's AWS Lambda integration automatically collects standard metrics (e.g., duration, invocations, concurrent executions), you can also set up Datadog's Lambda Library to get deeper insights from your code. In this section, we'll show you how the Lambda Library can help you collect custom business metrics, distributed traces, and enhanced metrics from your functions. Datadog's Lambda Library runs as a part of each function's runtime, and works with the Datadog Lambda extension to generate high-granularity enhanced metrics and automatically surface actionable insights into your functions. Data collected with the Lambda Library complements the metrics, logs, and other traces that you are already collecting from services outside of Lambda.

Set up the Lambda Library

The Lambda extension currently supports Node.js, Python, Java, Go, and .NET (beta) runtimes. To use the extension, you will first need to instrument your application with the Lambda Library for your function's runtime. You can then install the Lambda extension as a Lambda Layer by adding the following Amazon Resource Name (ARN) to your function:

arn:aws:lambda:<AWS_REGION>:464622532012:layer:Datadog-Extension:<EXTENSION_VERSION>

Replace AWS_REGION and EXTENSION_VERSION with the appropriate values for your application. You will need to use at least version 7 for the extension Lambda Layer. You will also need to add your Datadog API key to the function's environment variable section. As a best practice, we recommend using AWS Key Management Service or another secrets manager to store and encrypt your key.
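If you manage several regions or versions, a small helper can fill in the placeholders for you. The sketch below only assembles the ARN format shown above and enforces the minimum version of 7; the function name and the validation rules are our own illustration, not part of any Datadog tooling.

```javascript
const DATADOG_EXTENSION_ACCOUNT_ID = "464622532012"; // taken from the ARN above
const MIN_EXTENSION_VERSION = 7; // minimum extension layer version stated above

// Hypothetical helper: fills <AWS_REGION> and <EXTENSION_VERSION> in the layer ARN.
function buildExtensionLayerArn(region, version) {
  if (typeof region !== "string" || region.length === 0) {
    throw new Error("region must be a non-empty string, e.g. us-east-1");
  }
  if (!Number.isInteger(version) || version < MIN_EXTENSION_VERSION) {
    throw new Error(`extension version must be an integer >= ${MIN_EXTENSION_VERSION}`);
  }
  return `arn:aws:lambda:${region}:${DATADOG_EXTENSION_ACCOUNT_ID}:layer:Datadog-Extension:${version}`;
}

console.log(buildExtensionLayerArn("us-east-1", 7));
```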

If you use the Serverless Framework, the AWS Serverless Application Model (SAM), or AWS Cloud Development Kit (CDK) to deploy your applications, you can automatically send observability data from your Lambda functions to Datadog with Datadog's Serverless Framework, SAM, and CDK integrations.

Custom business metrics

Custom metrics give additional insights into use cases that are unique to your application workflows, such as a user logging into your application, purchasing an item, or updating a user profile.

The Lambda Library sends custom metrics asynchronously via the Datadog extension. Sending metrics asynchronously is recommended because it does not add any overhead to your code, making it an ideal solution for functions that power performance-critical tasks for your applications.

Datadog provides Lambda Libraries for several runtimes, including Node.js, Python, Go, Ruby, and Java. To get started, import the appropriate Lambda Library methods and add a wrapper around your function, as seen in the example Node.js function snippet below:

const { datadog, sendDistributionMetric } = require("datadog-lambda-js");

async function customHandler(event, context) {
  sendDistributionMetric(
    "delivery_application.meal_value", // Metric name
    13.54,                             // Metric value
    "item:pizza", "order:online"       // Associated tags
  );
  return {
    statusCode: 200,
    body: "Item purchased for delivery",
  };
}
// Wrap your handler function:
module.exports.customHandler = datadog(customHandler);

As the function code is invoked, the Lambda Library will automatically emit the delivery_application.meal_value metric to Datadog. You can read more about instrumenting your Lambda functions to send custom metrics in our documentation.
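In practice, tag values usually come from the incoming event rather than hard-coded strings. The sketch below shows one way to turn an order object into "key:value" tags that can be passed as the trailing arguments of sendDistributionMetric, as in the snippet above. The buildOrderTags helper and the shape of the order object are hypothetical.

```javascript
// Hypothetical helper: converts an object's fields into "key:value" tag strings.
// Empty values are skipped, and values are lowercased with whitespace replaced
// by underscores so the tags stay consistent.
function buildOrderTags(order) {
  return Object.entries(order)
    .filter(([, value]) => value !== undefined && value !== null && value !== "")
    .map(([key, value]) => `${key}:${String(value).toLowerCase().replace(/\s+/g, "_")}`);
}

const tags = buildOrderTags({ item: "Pizza", order: "online", coupon: null });
console.log(tags); // contains "item:pizza" and "order:online"

// Inside a wrapped handler, the tags could then be spread into the metric call:
// sendDistributionMetric("delivery_application.meal_value", 13.54, ...tags);
```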

Enhanced metrics

Along with custom metrics, you can analyze enhanced metrics from your Lambda functions (collected by Datadog's Lambda Library and Lambda extension). Enhanced metrics show up in Datadog with the aws.lambda.enhanced prefix. These metrics are collected at higher granularity than standard CloudWatch metrics, enabling you to view metric data in near real time in Datadog. For example, while Lambda errors are available as a standard CloudWatch metric, you can create an alert on the enhanced metric (aws.lambda.enhanced.errors) to get higher-granularity insights into potential issues.

Some enhanced metrics (such as billed duration and estimated execution cost) are automatically extracted from your Lambda logs, eliminating the need to create custom queries in CloudWatch. Enhanced metrics also include detailed metadata for your functions such as cold_start and any custom tags you added to your function in the Lambda console.

View a heatmap of cold starts for your functions
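To get a feel for what extracting metrics from Lambda logs involves, the sketch below parses the REPORT line that Lambda writes at the end of each invocation and derives a rough compute cost. This is not how Datadog computes its enhanced metrics. The price constant, the helper names, and the rule that treats an "Init Duration" field as a sign of a cold start are all assumptions made for illustration.

```javascript
// Assumption: an illustrative price per GB-second. Check current AWS pricing
// for your region and architecture before relying on this number.
const ASSUMED_PRICE_PER_GB_SECOND = 0.0000166667;

// Parses a Lambda REPORT log line into numeric fields.
// Returns null if the line is not a REPORT line.
function parseReportLine(line) {
  if (!line.startsWith("REPORT")) return null;
  const num = (pattern) => {
    const match = line.match(pattern);
    return match ? Number(match[1]) : null;
  };
  const initDurationMs = num(/Init Duration: ([\d.]+) ms/);
  return {
    // The lookbehind skips the "Billed Duration" and "Init Duration" fields.
    durationMs: num(/(?<!Billed |Init )Duration: ([\d.]+) ms/),
    billedDurationMs: num(/Billed Duration: ([\d.]+) ms/),
    memorySizeMb: num(/Memory Size: (\d+) MB/),
    maxMemoryUsedMb: num(/Max Memory Used: (\d+) MB/),
    initDurationMs,
    coldStart: initDurationMs !== null, // heuristic, see the note above
  };
}

// Rough compute-only estimate; ignores the per-request charge.
function estimateComputeCost(report, pricePerGbSecond = ASSUMED_PRICE_PER_GB_SECOND) {
  const gbSeconds = (report.billedDurationMs / 1000) * (report.memorySizeMb / 1024);
  return gbSeconds * pricePerGbSecond;
}

const sample =
  "REPORT RequestId: 00000000-0000-0000-0000-000000000000\tDuration: 187.52 ms\t" +
  "Billed Duration: 200 ms\tMemory Size: 512 MB\tMax Memory Used: 91 MB\tInit Duration: 342.10 ms";
const report = parseReportLine(sample);
console.log(report, estimateComputeCost(report));
```

With enhanced metrics enabled, you get these values as metrics and tags without maintaining this kind of parsing yourself.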

Datadog uses enhanced metrics to automatically generate insights into your functions, so you can see which ones are performing poorly. For example, if a function is using too much memory, Datadog will flag it in the UI and provide more context, such as related traces and logs, for faster troubleshooting.

The Lambda Library can also trace requests across all your Lambda functions instrumented with Datadog's native tracing libraries and other systems running the Datadog Agent. In the next sections, we'll show you how to start collecting and analyzing Lambda traces.

Native tracing for AWS Lambda functions

Datadog APM provides tracing libraries that you can use with the Lambda Library in order to natively trace request traffic across your serverless architecture. In the example below, you can see the full path of a request as it travels across services in your environment.

View the full path of a request as it travels across services

The Lambda Library automatically propagates trace context across service boundaries, so you can get end-to-end visibility of all requests, even as they travel across hosts, containers, and AWS Lambda functions. Traces are sent asynchronously so they don't add any latency overhead to your serverless applications.

Configure tracing

Currently, Datadog APM includes native support for tracing Lambda functions written in Go, Java, Node.js, Ruby, Python, and .NET. To get started, you will need to set up (or upgrade) Datadog's Lambda Library and Lambda extension for your function. Datadog offers several methods for instrumenting your serverless applications; check out our documentation to select the one that best suits your needs. In the following example, we'll use the Datadog CLI, which is the quickest way to get started because it lets you modify existing Lambda configurations without redeploying them. First, you'll need to install the Datadog CLI client:

# NPM
npm install -g @datadog/datadog-ci
# Yarn
yarn global add @datadog/datadog-ci

After you’ve installed the CLI client, you need to ensure that it has access to your AWS credentials, a configured Datadog destination site for your telemetry, and a Datadog API key. We recommend saving your API key in AWS Secrets Manager for quick and secure access:

export AWS_ACCESS_KEY_ID="<ACCESS KEY ID>"
export AWS_SECRET_ACCESS_KEY="<ACCESS KEY>"
export DATADOG_SITE="<DD_SITE>" # such as datadoghq.com, datadoghq.eu, us3.datadoghq.com or ddog-gov.com
export DATADOG_API_KEY_SECRET_ARN="<DATADOG_API_KEY_SECRET_ARN>"

Once configured, you can instrument your function code with the following:

datadog-ci lambda instrument -f <FUNCTION_NAME> -f <ANOTHER_FUNCTION_NAME> -r <AWS_REGION> -v <LAYER_VERSION> -e <EXTENSION_VERSION>
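If you instrument many functions at once, it can help to generate the command from a list of names instead of typing each -f flag by hand. The sketch below assembles the arguments using only the flags shown in the command above. The helper name, function names, and version numbers are hypothetical placeholders.

```javascript
// Builds the argument list for `datadog-ci lambda instrument` using only the
// flags shown above: -f (repeated per function), -r, -v, and -e.
function buildInstrumentArgs({ functions, region, layerVersion, extensionVersion }) {
  if (!Array.isArray(functions) || functions.length === 0) {
    throw new Error("at least one function name is required");
  }
  const args = ["lambda", "instrument"];
  for (const name of functions) args.push("-f", name);
  args.push("-r", region, "-v", String(layerVersion), "-e", String(extensionVersion));
  return args;
}

// Hypothetical function names and version numbers, for illustration only.
const args = buildInstrumentArgs({
  functions: ["checkout-handler", "order-processor"],
  region: "us-east-1",
  layerVersion: 100,
  extensionVersion: 50,
});
console.log(["datadog-ci", ...args].join(" "));
```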

For AWS SAM and AWS CDK infrastructure, you can also use Datadog's serverless macro to automatically collect traces from Lambda functions, without any instrumentation. Check out our documentation to learn more about using Datadog's macro or one of our native tracing libraries with your Lambda functions.

Explore your trace data

To start analyzing trace data from your serverless functions, navigate to Datadog's Serverless view, where you can view key function metrics alongside curated insights into function performance. Datadog provides visualizations you can customize to display the data you deem most important, such as iterator age, concurrent executions, and cold starts. You can also search for a group of functions with tags such as region, CloudFormation Stack name, and whether they were deployed to Lambda@Edge or as a Step Function.

View all your functions in the Serverless view

Clicking on a function shows you a full list of invocations, including key metrics, links to associated traces and logs, and insights, such as which invocations used over 95 percent of the function's allocated memory.

View traces and logs and key metrics for a single AWS Lambda function in the Serverless view
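The memory insight described above boils down to comparing each invocation's peak memory with the function's allocated memory. As a minimal sketch of that check, assuming a hypothetical record shape with maxMemoryUsedMb and memorySizeMb fields:

```javascript
// Returns the invocations whose peak memory exceeded the given share of the
// function's allocated memory (95 percent by default, matching the text above).
function findHighMemoryInvocations(invocations, threshold = 0.95) {
  return invocations.filter(
    (inv) => inv.memorySizeMb > 0 && inv.maxMemoryUsedMb / inv.memorySizeMb > threshold
  );
}

// Hypothetical invocation records, for illustration only.
const invocations = [
  { requestId: "req-1", maxMemoryUsedMb: 125, memorySizeMb: 128 },
  { requestId: "req-2", maxMemoryUsedMb: 64, memorySizeMb: 128 },
];
console.log(findHighMemoryInvocations(invocations).map((inv) => inv.requestId));
```

Invocations flagged this way are good candidates for a higher memory setting or a closer look at the associated traces.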

You can inspect an invocation to view its flame graph, associated tags, and JSON request and response payloads. Analyzing your Lambda function’s payloads can help you locate common sources of failure, such as missing parameters and incorrect resource addresses. You also have the option to prevent sensitive data, such as account IDs and addresses, from being sent to Datadog. To learn more about how to leverage the Serverless view to monitor your stack, check out our blog post.

Identify errors using request and response payloads.

End-to-end visibility into serverless applications

In addition to viewing the performance of individual functions, you need a high-level view of your entire microservice infrastructure in order to troubleshoot application issues. Datadog APM automatically generates a Service Map based on your trace data, so you can visualize all your Lambda functions in one place and understand the flow of traffic across microservices in your environment.

View a service map of your AWS Lambda functions and connected services

You can also analyze and explore your Lambda trace data with Trace Search and Analytics. By using any combination of tags, you can quickly filter down to a specific service or function. Trace Search and Analytics also uses the tags that are automatically created with Datadog's Lambda Library, so you can filter functions by tags such as cold_start:true. The graph below displays the top five functions with cold starts over time, broken down by function name. If you like, you can easily export this to an alert or dashboard.

Analyze your functions with Trace Search and Analytics
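Conceptually, the graph described above filters invocations on cold_start:true, groups them by function name, and keeps the top five. A minimal sketch of that aggregation, assuming hypothetical records with functionName and coldStart fields, might look like this:

```javascript
// Counts cold starts per function and returns the functions with the most
// cold starts, highest first. Ties are broken alphabetically by name.
function topColdStartFunctions(invocations, limit = 5) {
  const counts = new Map();
  for (const inv of invocations) {
    if (inv.coldStart) {
      counts.set(inv.functionName, (counts.get(inv.functionName) || 0) + 1);
    }
  }
  return [...counts.entries()]
    .map(([functionName, coldStarts]) => ({ functionName, coldStarts }))
    .sort((a, b) => b.coldStarts - a.coldStarts || a.functionName.localeCompare(b.functionName))
    .slice(0, limit);
}

// Hypothetical invocation records, for illustration only.
console.log(
  topColdStartFunctions([
    { functionName: "checkout", coldStart: true },
    { functionName: "checkout", coldStart: true },
    { functionName: "search", coldStart: true },
    { functionName: "search", coldStart: false },
  ])
);
```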

To effectively troubleshoot serverless applications, you need visibility into not only your Lambda functions, but also all the managed services that interact with them. Datadog provides distributed tracing for services that interact with Python and Node.js-based Lambda functions, including Amazon API Gateway, SQS, SNS, and Kinesis.

You can use the APM trace map to break down the path of your request as it flows through different services and Lambda functions. When your infrastructure is experiencing delays, the trace map gives you deep insight across the entire sequence of events so you can better identify sources of high latency and errors.

Follow the path of your request using the APM trace map.

So far, we've shown you how to collect and analyze data with Datadog's Lambda integration and Lambda Library. Now that all of your function data is flowing into Datadog, we'll explore how you can get more out of your data with Datadog's predictive monitoring and alerts.

Monitor AWS Lambda logs with Datadog

To submit logs via Datadog's Lambda extension, set the DD_LOGS_ENABLED environment variable in your function to true. The extension will submit logs every ten seconds and at the end of each function invocation, enabling you to automatically collect log data without the need for any dedicated log forwarding infrastructure.

Search and analyze your Lambda logs

Datadog enables you to search on, analyze, and easily discover patterns in your logs. You can use identifiers such as the function's log group or name to search for your logs in the Log Explorer, as seen in the example below.

Explore your AWS Lambda logs in the Log Explorer

Lambda functions generate a large volume of logs, making it difficult to pinpoint issues during an incident or simply monitor the current state of your functions. You can use Log Patterns to help you surface interesting trends in your logs.

For example, if you notice a spike in Lambda errors on your dashboard, you can use Log Patterns to quickly search for the most common types of errors. In the example below, you can see a cluster of function logs for an AccessDeniedException permissions error. The logs provide a stack trace so you can troubleshoot further.

Quickly point out patterns in your AWS Lambda logs with Log Patterns

When you select a pattern, you can click on the View All button to pivot to the Log Explorer and inspect individual logs that exhibit that pattern, or you can analyze trends in your logs by clicking on the Graph button. For example, you can view the most invoked functions or a toplist of the most common function errors. You can then export the graph to a Lambda dashboard to monitor it alongside real-time performance data from your functions.

Visualize your logs with Log Analytics and export to a dashboard

Proactively monitor AWS Lambda with alerts

Once you're aggregating all your Lambda metrics, logs, and traces with Datadog, you can automatically detect anomalies and forecast trends in key Lambda metrics. You can also set up alerts to quickly find out about issues.

Forecast trends and detect anomalies in AWS Lambda functions

As mentioned earlier, Datadog generates enhanced metrics from your function code and Lambda logs that help you track data such as errors in near real time, memory usage, and estimated costs. You can apply anomaly detection to metrics like max memory used (e.g., aws.lambda.enhanced.max_memory_used) in order to see any unusual trends in memory usage.

View anomalies in memory usage for your Lambda functions

You can also apply a forecast to the estimated_cost metric to determine if your costs are expected to increase, based on historical data.

Forecast trends in your AWS Lambda functions

Alert on critical AWS Lambda metrics

Monitoring Lambda enables you to visualize trends and identify issues during critical outages, but it's easy to overlook an issue when you are monitoring a large volume of data in complex infrastructures. In order to ensure that you are aware of critical issues affecting your applications, you can create alerts to get notified about key issues detected in your Lambda metrics, logs, or traces.

Datadog provides a list of built-in alerts you can enable from the Serverless view to automatically notify you of critical performance issues with minimal configuration, such as a sudden increase in cold starts or out-of-memory errors.

Use built-in serverless alerts to notify you on critical issues with your functions

There are also several alert types you can use for creating custom alerts that fit your specific use case, so you can be notified about only the issues you care about. For example, you can create an alert to notify you if a function has been throttled frequently over a specific period of time. If you configure the alert to automatically trigger separate notifications per affected function, this saves you from creating duplicate alerts and enables you to get continuous, scalable coverage of your environment, no matter how many functions you're running.

Throttles occur when there is not enough capacity for a function, either because available concurrency is used up or because requests are coming in faster than the function can scale. You can use an alert to notify you if you are reaching the threshold of concurrent executions for your account or per region, as seen below.

View the status of all of your Lambda alerts
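The logic behind a concurrency alert is a comparison between current concurrent executions and the limit that applies to your account in a region. The sketch below shows that comparison with warning and critical thresholds. The 80 and 90 percent thresholds and the limit of 1,000 are assumptions chosen for illustration, so substitute the quota and thresholds that apply to your own account.

```javascript
// Compares concurrent executions against a concurrency limit and returns the
// utilization ratio plus a status. Threshold defaults are illustrative.
function evaluateConcurrency(concurrentExecutions, limit, warnRatio = 0.8, criticalRatio = 0.9) {
  if (!(limit > 0)) throw new Error("limit must be a positive number");
  const utilization = concurrentExecutions / limit;
  let status = "ok";
  if (utilization >= criticalRatio) status = "critical";
  else if (utilization >= warnRatio) status = "warning";
  return { utilization, status };
}

// Assumption: a regional concurrency limit of 1,000 for this example.
console.log(evaluateConcurrency(850, 1000));
```

In Datadog, you would express the same thresholds in a metric monitor rather than in your own code.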

Start monitoring AWS Lambda with Datadog

In this post, we've looked at how to get deep visibility into all your AWS Lambda functions with Datadog. Once you integrate Lambda with Datadog, you can monitor the performance of your serverless applications, and optimize your functions by analyzing concurrency utilization, memory usage, execution costs, and other metrics. And, if you use Lambda@Edge with Amazon CloudFront, Step Functions, or AppSync on top of your Lambda functions, you can automatically pull in monitoring data from those services with Datadog's built-in integrations. Check out our AWS documentation for more information.

If you don’t yet have a Datadog account, sign up for a free 14-day trial to start monitoring your AWS Lambda functions today.