Datadog Serverless Monitoring for Amazon API Gateway, SQS, Kinesis, and More | Datadog

Authors: Alex Cuoci, Mallory Mooney

Published: February 11, 2022

Many organizations leverage AWS to build fully managed, event-driven applications, which break down complex workloads into APIs, event streams, and other decentralized services in order to improve performance and scalability. This type of architecture relies primarily on AWS Lambda functions to process synchronous and asynchronous requests as they move between a workload’s resources, such as Amazon API Gateway and Amazon Kinesis.

Datadog Serverless Monitoring already provides distributed tracing for functions to bring you detailed, real-time insights into your Lambda-based applications. Today, we’re building on our existing serverless monitoring capabilities to bring that same level of visibility to the rest of the AWS managed services that interact with your Python and Node.js Lambda functions.

Review alerts directly from Datadog APM
Create an alert when the age of the oldest message in an SQS queue suddenly increases, directly from Datadog APM.

We’re rolling out support for Amazon API Gateway, SQS, SNS, Kinesis, EventBridge, S3, and DynamoDB, so you can now:

  • detect and alert on increases in latency and errors for managed APIs, queues, and data stores
  • pivot directly from AWS service alerts to associated high-latency or error traces for faster troubleshooting
  • visualize the relationship between each of your fully managed services in the APM trace map
  • correlate latency or error traces with a service’s performance metrics in one place

Collectively, these capabilities give you multiple entry points into any request’s path as it flows across AWS services and Lambda functions, so you have full visibility into your event-driven workloads and can identify and troubleshoot performance issues at their source.

End-to-end visibility for serverless applications

View the trace's flame graph for more context
Visualize a request's full path, from the initial API Gateway endpoint that invokes a function to the downstream SNS topic and SQS queue that pass the associated event to a consuming function for processing.

When your serverless application’s performance starts to decline, you need to know exactly where the breakdown occurs, and which AWS services are involved, in order to resolve the problem before it becomes more serious. You can already pivot from Lambda alerts to a corresponding trace in order to troubleshoot issues with individual functions. Our updates expand on this functionality by allowing you to move from an alert for any managed AWS service to associated high-latency and error traces. Now, you have even more ways to investigate the source of user-facing latency and other issues that can occur anywhere in the path of a synchronous or asynchronous request.

For example, if Datadog detects increased latency in an SQS message queue, you can pivot from the triggered alert directly to related traces for faster troubleshooting.
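As a rough illustration of the kind of alert described here, the sketch below assembles a metric monitor query for the age of the oldest message in a queue. It assumes the AWS integration reports this metric as `aws.sqs.approximate_age_of_oldest_message` with a `queuename` tag; the queue name, threshold, and helper function are hypothetical, so check the names against your own account before relying on them.

```python
def sqs_age_monitor_query(queue_name: str, threshold_seconds: int,
                          window: str = "last_5m") -> str:
    """Build a metric monitor query for the oldest-message age of one queue.

    Assumption: the metric and tag names below match what the AWS SQS
    integration reports in your account.
    """
    if threshold_seconds <= 0:
        raise ValueError("threshold_seconds must be positive")
    metric = "aws.sqs.approximate_age_of_oldest_message"
    # Doubled braces emit literal { } around the tag scope.
    return (f"avg({window}):avg:{metric}"
            f"{{queuename:{queue_name}}} > {threshold_seconds}")


# Hypothetical queue name and threshold (5 minutes), for illustration only.
query = sqs_age_monitor_query("orders-queue", 300)
print(query)
```

The resulting string can be pasted into a metric monitor definition or sent through the monitor API; the threshold should reflect how long your consumers normally take to drain the queue.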

Pivot from an alert to associated high-latency traces

As you’re inspecting a trace, you can use the flame graph and its context-rich spans to break down the time a request spent interacting with each of your AWS services and functions in order to identify the root cause of a triggered alert. Selecting an individual span enables you to view more details about the service’s configuration during the time of the request.
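To make the idea of breaking down a request's time concrete, here is a minimal sketch that computes how much time each service accounts for on its own within a trace. The span fields (`span_id`, `parent_id`, `service`, `duration_ms`) and the sample values are assumptions made for illustration, not the format Datadog uses internally.

```python
from collections import defaultdict


def self_time_by_service(spans):
    """Sum each service's self time: span duration minus its direct children.

    Asynchronous children (for example, a queue span under a short publish
    span) can outlive their parent, so self time is clamped at zero.
    """
    child_total = defaultdict(float)
    for span in spans:
        if span["parent_id"] is not None:
            child_total[span["parent_id"]] += span["duration_ms"]

    totals = defaultdict(float)
    for span in spans:
        self_ms = max(span["duration_ms"] - child_total[span["span_id"]], 0.0)
        totals[span["service"]] += self_ms
    return dict(totals)


# Hypothetical trace: API Gateway -> producer function -> SNS -> SQS -> consumer.
trace = [
    {"span_id": 1, "parent_id": None, "service": "api-gateway", "duration_ms": 120.0},
    {"span_id": 2, "parent_id": 1, "service": "producer-lambda", "duration_ms": 100.0},
    {"span_id": 3, "parent_id": 2, "service": "sns", "duration_ms": 15.0},
    {"span_id": 4, "parent_id": 3, "service": "sqs", "duration_ms": 900.0},
    {"span_id": 5, "parent_id": 4, "service": "consumer-lambda", "duration_ms": 40.0},
]

breakdown = self_time_by_service(trace)
slowest = max(breakdown, key=breakdown.get)
print(slowest, breakdown[slowest])
```

In this made-up trace, most of the time is spent waiting in the queue rather than in either function, which is the kind of pattern a flame graph makes visible at a glance.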

You can also use the new trace map—similar to Datadog’s request flow map—to visualize a trace in its entirety, giving you a better understanding of the lineage of AWS resources processing your requests.

View all services involved in a request with the Trace Map

Quickly resolve request latency and errors

High latency and errors are common performance issues that can occur at any point in a request’s path within an event-driven workload, regardless of whether the request is passing through a Lambda function or another fully managed AWS service. Datadog gives you the ability to correlate traces with key performance metrics from each of your AWS services directly in the trace view, so you can determine if a misconfigured service is driving increased errors or latency.

For example, you can compare a high-latency trace for an Amazon SQS queue to the age of the queue’s oldest message. If the age has also increased, you may need to resolve application errors in the consumer or scale consumers in order to allow a queue to process messages more efficiently.
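If scaling consumers turns out to be the right fix, a back-of-the-envelope estimate can help pick a starting point. The sketch below is not a Datadog feature; it applies the general rule that required concurrency is roughly the message arrival rate multiplied by the average processing time, plus some headroom. The function name, inputs, and headroom value are all hypothetical.

```python
import math


def consumers_needed(arrival_rate_per_s: float, avg_processing_s: float,
                     headroom: float = 0.25) -> int:
    """Estimate how many concurrent consumers keep up with a queue.

    Concurrency ~= arrival rate x average processing time, padded by
    `headroom` so the queue can also drain an existing backlog.
    """
    if arrival_rate_per_s < 0 or avg_processing_s <= 0 or headroom < 0:
        raise ValueError("rates and times must be non-negative, processing time > 0")
    return max(1, math.ceil(arrival_rate_per_s * avg_processing_s * (1 + headroom)))


# Hypothetical workload: 40 messages/s, 0.5 s per message, 25% headroom.
print(consumers_needed(40, 0.5, 0.25))
```

Treat the result as a first guess and confirm it against the queue's oldest-message age after the change.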

View integration metrics directly in the trace view

You can also use Trace Analytics to monitor a service’s performance after you apply a fix, enabling you to verify that a change is working as expected. For example, you can confirm that increasing the number of consumers for an SQS queue improved request latency over time. You can follow these same steps to resolve similar issues with how and where other AWS services process events. For instance, you might:

  • deploy services like Amazon API Gateway and their associated Lambda functions in the same AWS Region
  • publish Amazon SNS, EventBridge, and S3 events in batches
  • modify the batch size and processing window for Amazon Kinesis data streams and SQS queues
  • add new partitions to (or re-partition) DynamoDB streams

These simple configuration changes can significantly improve the performance of your serverless applications and reduce costs.
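For the batch size and processing window change in particular, the setting lives on the Lambda event source mapping that connects a Kinesis stream or SQS queue to a function. The sketch below builds and validates the parameters for boto3's `update_event_source_mapping` call; the mapping UUID and chosen values are hypothetical, and the limits checked here (batch size up to 10,000, window up to 300 seconds) reflect our understanding of the general AWS maximums, which can be lower for some source types such as FIFO queues.

```python
def batching_params(mapping_uuid: str, batch_size: int, window_seconds: int) -> dict:
    """Build keyword arguments for Lambda's update_event_source_mapping."""
    if not 1 <= batch_size <= 10_000:
        raise ValueError("batch_size must be between 1 and 10,000")
    if not 0 <= window_seconds <= 300:
        raise ValueError("window_seconds must be between 0 and 300")
    return {
        "UUID": mapping_uuid,
        "BatchSize": batch_size,
        "MaximumBatchingWindowInSeconds": window_seconds,
    }


def apply_batching(params: dict) -> dict:
    """Send the update to AWS. Requires boto3 and valid credentials."""
    import boto3  # imported lazily so the sketch runs without AWS access

    return boto3.client("lambda").update_event_source_mapping(**params)


# Hypothetical mapping UUID; nothing is sent to AWS here.
params = batching_params("example-mapping-uuid", batch_size=100, window_seconds=5)
print(params)
```

Larger batches and longer windows generally mean fewer invocations at the cost of higher per-message latency, so it is worth comparing traces before and after the change.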

Monitor your serverless workloads and AWS managed services

Datadog provides full visibility into all of the individual components that support a serverless application—from the managed services running event-driven workloads to their associated Lambda functions.

View all of your functions in the serverless view

Our new capabilities build on the insights you already get from Datadog APM and native tracing, enabling you to quickly identify the root cause of a performance issue anywhere in your serverless architecture. If you have already set up AWS serverless tracing, you can upgrade the Datadog Lambda Library to v52 for Python and v69 for Node.js. Check out our documentation to learn more about our AWS integrations and distributed tracing for serverless applications. If you don’t already have a Datadog account, you can sign up for one.