In 2021, we partnered with AWS to develop the Datadog Lambda extension, which provides a simple, cost-effective way for teams to collect traces, logs, custom metrics, and enhanced metrics from Lambda functions and submit them to Datadog. Now, AWS has updated its Lambda Logs API with the release of the Lambda Telemetry API, which expands the volume of observability data available for collection and enables the latest version of the Datadog Lambda extension to provide even deeper visibility into the performance of your Lambda functions.
In this post, we’ll look at how the AWS Lambda Telemetry API expands the capabilities of the Datadog Lambda extension by:
- visualizing the impact of cold starts with cold start trace spans
- providing deeper insight into Lambda function performance with new enhanced metrics
When Lambda functions experience a spike in traffic and scale out or are invoked after a long idle period, they may experience a latency increase referred to as a cold start, which can significantly degrade end-user experience. Datadog already automatically flags functions invoked with a cold start with a cold_start tag and notifies you whenever cold starts occur at an anomalously high rate.
With additional data available through the Lambda Telemetry API, Datadog now also visualizes cold starts as spans within a serverless trace so you can see the impact of cold starts on your Lambda functions. For example, if you are exploring the Serverless View and notice a function invocation is tagged with a cold start insight, you can navigate to that function’s associated trace to view the duration of that cold start and determine its severity and overall impact. You can then decide whether it’s necessary to take remediating steps such as allocating more memory to your functions or enabling provisioned concurrency.
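To make the idea of a cold start span concrete, here is a minimal Python sketch that derives a span-like record from a list of platform events. It is an illustration only, not how the Datadog Lambda extension builds its spans: the event type names (`platform.initStart`, `platform.initRuntimeDone`) and the `time`/`type` fields are modelled on the Lambda Telemetry API event stream, while the helper names, the simplified event shape, and the `cold_start` span name are assumptions made for this example.

```python
from datetime import datetime
from typing import Optional


def parse_time(ts: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as '2022-11-01T12:00:00.100Z'."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def cold_start_span(events: list) -> Optional[dict]:
    """Build a hypothetical cold start span from init-phase events.

    Returns None when no complete init phase is present, which this
    sketch treats as a warm invocation.
    """
    start = end = None
    for event in events:
        if event["type"] == "platform.initStart":
            start = parse_time(event["time"])
        elif event["type"] == "platform.initRuntimeDone":
            end = parse_time(event["time"])
    if start is None or end is None:
        return None
    return {
        "name": "cold_start",  # assumed span name, for illustration only
        "start": start,
        "duration_ms": (end - start).total_seconds() * 1000,
    }


# Simplified, made-up events for a single cold-started invocation.
events = [
    {"time": "2022-11-01T12:00:00.100Z", "type": "platform.initStart"},
    {"time": "2022-11-01T12:00:00.850Z", "type": "platform.initRuntimeDone"},
    {"time": "2022-11-01T12:00:00.860Z", "type": "platform.start"},
]
span = cold_start_span(events)
print(span["name"], span["duration_ms"])
```

Placing a record like this at the front of a trace is what lets you compare the initialization time against the rest of the invocation when deciding whether remediation is worthwhile.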
The Lambda Telemetry API enables the Datadog Lambda extension to provide additional enhanced metrics such as ProducedBytes for further end-to-end visibility into the execution of your Lambda functions. These additional metrics complement our current library of enhanced metrics by giving you a fuller view of the Lambda execution lifecycle. So now, in addition to monitoring and alerting on previously available enhanced metrics, such as aws.lambda.enhanced.errors, you can monitor and alert on the size of a function response (aws.lambda.enhanced.produced_bytes), the time in milliseconds from when an invocation request is received to when the first byte of a response is sent to a client (aws.lambda.response_latency), and the elapsed time in milliseconds between when the first and last bytes of a response are sent to a client.
These new enhanced metrics are available in the Serverless View and the out-of-the-box AWS Lambda (Enhanced Metrics) dashboard and will help you spot and troubleshoot Lambda function performance issues. For example, even if no cold starts are detected, you might see an unusual spike in the time it takes a Lambda function to fully execute its response to a request—prompting you to investigate further and remediate.
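The definitions of these response metrics reduce to simple arithmetic over three timestamps and a payload size. The Python sketch below works through one made-up invocation; the `InvocationTimeline` class, its field names, and the returned dictionary keys are hypothetical names chosen for this example, not part of the Datadog Lambda extension or the Lambda Telemetry API.

```python
from dataclasses import dataclass


@dataclass
class InvocationTimeline:
    """Hypothetical timestamps (in milliseconds) for a single invocation."""
    request_received_ms: float
    first_byte_sent_ms: float
    last_byte_sent_ms: float
    response_body: bytes


def response_metrics(t: InvocationTimeline) -> dict:
    """Compute the three response metrics as described in the text above."""
    return {
        # Request received -> first byte of the response sent.
        "response_latency_ms": t.first_byte_sent_ms - t.request_received_ms,
        # First byte sent -> last byte sent.
        "response_duration_ms": t.last_byte_sent_ms - t.first_byte_sent_ms,
        # Size of the function response in bytes.
        "produced_bytes": len(t.response_body),
    }


timeline = InvocationTimeline(
    request_received_ms=1000.0,
    first_byte_sent_ms=1120.0,
    last_byte_sent_ms=1150.0,
    response_body=b'{"ok": true}',
)
metrics = response_metrics(timeline)
print(metrics)
```

Keeping latency and duration separate is useful because they point at different problems: a high latency suggests slow work before the response starts, while a high duration suggests a large or slowly written response.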
With the release of the Lambda Telemetry API, the Datadog Lambda extension now provides you with an expanded look into the performance of Lambda functions through cold start trace spans and additional enhanced metrics. You can monitor this expanded Lambda performance telemetry alongside the CloudWatch Lambda metrics already available through our AWS integration, as well as logs from other AWS services such as Amazon S3, API Gateway, and DynamoDB via the Datadog Forwarder function.
To update to the latest version of the Datadog Lambda extension, click here.
If you’re not already a Datadog customer, sign up today for a 14-day free trial.