Monitor AWS Batch on Fargate with Datadog

Author Kennon Kwok
Author Jordan Obey

Published: June 7, 2024

AWS Batch on Fargate is an AWS offering that combines the benefits of AWS Fargate—a serverless compute engine for deploying and managing containers—with AWS Batch, a fully managed service for running batch workloads. Leveraging a pay-per-use pricing model and automatic scaling, AWS Batch on Fargate provides you with a cost-effective and scalable solution for running batch computing workloads without needing to worry about managing any underlying infrastructure.

AWS Batch on Fargate enables you to run compute-intensive batch processing tasks on serverless containers, making it ideal for workloads such as machine learning, data processing, and scientific computing, as well as automated job scheduling and serverless workflows.

With AWS Batch support for multi-container jobs now generally available, we’re happy to announce support for running the Datadog Agent in AWS Batch on Fargate. Datadog customers can expect the same level of observability for AWS Batch on Fargate that’s already available for other Fargate workloads, with the Agent container running on the task alongside applications.
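To make the multi-container layout concrete, here is a minimal Python sketch that assembles a job definition with the Agent container next to a single application container. The field names follow the AWS Batch `ecsProperties` shape as we understand it. The job definition name, the `app` container, the resource sizes, and the choice to mark the Agent as non-essential are illustrative assumptions, so check them against the AWS Batch and Datadog documentation before relying on them.

```python
import json


def build_job_definition(app_image: str, execution_role_arn: str, api_key_secret_arn: str) -> dict:
    """Assemble a multi-container job definition for AWS Batch on Fargate (sketch)."""
    agent = {
        "name": "datadog-agent",
        "image": "public.ecr.aws/datadog/agent:latest",
        # Assumption: non-essential, so the job ends when the application exits.
        "essential": False,
        "environment": [{"name": "ECS_FARGATE", "value": "true"}],
        # The API key is read from a secret rather than hard-coded.
        "secrets": [{"name": "DD_API_KEY", "valueFrom": api_key_secret_arn}],
        "resourceRequirements": [
            {"type": "VCPU", "value": "0.25"},
            {"type": "MEMORY", "value": "512"},
        ],
    }
    app = {
        "name": "app",  # hypothetical application container
        "image": app_image,
        "essential": True,
        "resourceRequirements": [
            {"type": "VCPU", "value": "0.25"},
            {"type": "MEMORY", "value": "512"},
        ],
    }
    return {
        "jobDefinitionName": "example-batch-job-with-datadog",  # placeholder name
        "type": "container",
        "platformCapabilities": ["FARGATE"],
        "ecsProperties": {
            "taskProperties": [
                {
                    "containers": [agent, app],
                    "executionRoleArn": execution_role_arn,
                    "networkConfiguration": {"assignPublicIp": "ENABLED"},
                }
            ]
        },
    }


if __name__ == "__main__":
    definition = build_job_definition(
        "example/app:1.0",  # placeholder image
        "arn:aws:iam::123456789012:role/example-execution-role",  # placeholder ARN
        "arn:aws:secretsmanager:us-east-1:123456789012:secret:example-dd-api-key",  # placeholder ARN
    )
    print(json.dumps(definition, indent=2))
```

The printed JSON is intended as a starting point for registering a job definition, for example through the AWS CLI's `--cli-input-json` option.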

In this post, we’ll cover how you can monitor metrics, traces, and live processes from AWS Batch on Fargate to ensure the health and performance of your workloads.

Collect and visualize metrics from AWS Batch on Fargate

Running the Datadog Agent in an AWS Batch job on Fargate enables comprehensive monitoring of your containerized applications and jobs. The Agent collects real-time, high-resolution CPU, memory, disk I/O, and network metrics. For example, if you have configured a CPU reservation for your AWS Batch jobs on Fargate, you can set an alert that notifies you when a job’s CPU utilization crosses a set threshold, so you know when you are overtaxing your containers’ resources.
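As one possible way to express such an alert, the sketch below builds a request body for Datadog's create-monitor API. It assumes `ecs.fargate.cpu.percent` is the metric you want to alert on and that your jobs carry a tag you can filter by. The `job_queue:example` tag, the grouping by `container_name`, the five-minute window, and the monitor name are illustrative assumptions, not values from this post.

```python
import json


def build_cpu_monitor(threshold_percent: int, scope_tag: str) -> dict:
    """Build a metric-alert monitor body for high container CPU usage (sketch)."""
    # Doubled braces produce literal braces in the f-string, as the query syntax requires.
    query = (
        f"avg(last_5m):avg:ecs.fargate.cpu.percent{{{scope_tag}}} "
        f"by {{container_name}} > {threshold_percent}"
    )
    return {
        "name": "High CPU on AWS Batch Fargate containers",  # placeholder name
        "type": "metric alert",
        "query": query,
        "message": "CPU utilization is above the configured threshold.",
        # The critical threshold has to match the value used in the query.
        "options": {"thresholds": {"critical": threshold_percent}},
    }


if __name__ == "__main__":
    # "job_queue:example" is a hypothetical tag; use whatever tags your jobs carry.
    print(json.dumps(build_cpu_monitor(80, "job_queue:example"), indent=2))
```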

Additionally, the Agent container can accept DogStatsD metrics, providing a flexible and scalable method of submitting custom application metrics to Datadog.
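To illustrate what submitting a custom metric involves, here is a small standard-library sketch that formats a DogStatsD datagram and sends it over UDP. It assumes the Agent listens on the default DogStatsD port, 8125, and that your application container reaches it on localhost because both containers run in the same task. The metric and tag names are made up for the example, and in practice you would more likely use one of Datadog's client libraries.

```python
import socket
from typing import Optional, Sequence


def format_metric(
    name: str,
    value: float,
    metric_type: str,
    tags: Optional[Sequence[str]] = None,
) -> str:
    """Render one metric as a DogStatsD datagram: name:value|type|#tag1,tag2."""
    datagram = f"{name}:{value}|{metric_type}"
    if tags:
        datagram += "|#" + ",".join(tags)
    return datagram


def send_metric(datagram: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send the datagram over UDP to the Agent (fire and forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))


if __name__ == "__main__":
    # Hypothetical counter ("c") recording how many records a job processed.
    send_metric(format_metric("batch.records_processed", 42, "c", ["job:nightly-etl"]))
```

Because UDP is connectionless, `send_metric` does not confirm delivery, so a missing or misconfigured Agent will not surface as an error in your job.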

Trace AWS Batch on Fargate jobs

The Datadog Agent, running as a container alongside your application containers, collects the trace data emitted by your instrumented applications. In the Datadog platform, you can visualize the collected traces as flame graphs, which show all service calls that make up a request. This helps you identify latency issues and errors across your distributed application. Additionally, Datadog’s Service Map provides a visual representation of your application’s architecture so you can understand service dependencies and optimize performance.
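For the Agent to receive traces, the tracing library in your application container needs to know where to send them and how to label them. The sketch below builds the environment entries for a hypothetical application container, using Datadog's `DD_SERVICE`, `DD_ENV`, `DD_VERSION`, and `DD_AGENT_HOST` variables. It assumes the Agent is reachable on localhost within the same task, and the service, environment, and version values are placeholders.

```python
def tracing_environment(service: str, env: str, version: str) -> list:
    """Build environment entries that label traces and point the tracer at the Agent (sketch)."""
    return [
        {"name": "DD_SERVICE", "value": service},
        {"name": "DD_ENV", "value": env},
        {"name": "DD_VERSION", "value": version},
        # Assumption: the Agent runs in the same task, so localhost reaches it.
        {"name": "DD_AGENT_HOST", "value": "127.0.0.1"},
    ]


if __name__ == "__main__":
    # Placeholder values for a hypothetical batch job.
    for entry in tracing_environment("nightly-etl", "staging", "1.0.0"):
        print(f'{entry["name"]}={entry["value"]}')
```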

Monitor AWS Batch on Fargate live processes

Monitoring live processes in AWS Batch jobs on Fargate with Datadog provides valuable insights and capabilities for ensuring the health and performance of your serverless containerized applications.

Datadog Live Processes lets you see every process running across all your AWS Batch jobs on Fargate, along with resource metrics such as CPU and memory usage. You can isolate processes causing crashes, latency, or resource contention within your Fargate containers, helping you quickly troubleshoot and resolve performance bottlenecks. In addition, Datadog’s Watchdog feature helps detect and alert on anomalous process behavior, such as unexpected resource consumption or suspicious processes running on your serverless containers.
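Process collection typically has to be switched on explicitly. The sketch below assumes that the settings Datadog documents for ECS on Fargate, a shared PID namespace (`pidMode` set to `task`) plus `DD_PROCESS_AGENT_ENABLED=true` on the Agent container, carry over to the task properties of a Batch job definition. Treat that as an assumption to verify, and note that the container names here are placeholders.

```python
import copy


def enable_process_collection(task_properties: dict, agent_name: str = "datadog-agent") -> dict:
    """Return a copy of the task properties with process collection turned on (sketch)."""
    updated = copy.deepcopy(task_properties)
    # Share the PID namespace so the Agent can see processes in the other containers.
    updated["pidMode"] = "task"
    for container in updated["containers"]:
        if container["name"] == agent_name:
            environment = container.setdefault("environment", [])
            environment.append({"name": "DD_PROCESS_AGENT_ENABLED", "value": "true"})
    return updated


if __name__ == "__main__":
    # Hypothetical, pared-down task properties with two containers.
    properties = {"containers": [{"name": "datadog-agent"}, {"name": "app"}]}
    print(enable_process_collection(properties))
```

The function works on a copy, so the original task properties stay unchanged if you want to compare the two.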

Start monitoring your AWS Batch workloads

Running the Datadog Agent container gives you real-time visibility into your AWS Batch environments, helping you and your team quickly detect and investigate issues that affect application performance and serverless infrastructure. For more information, see our documentation on configuring the Agent for your AWS Batch workloads.

If you’re not using Datadog yet, you can sign up to get started.