Monitor Amazon ECS Anywhere with Datadog

Author Paul Gottschling
Author Yair Cohen

Published: May 27, 2021

Amazon Elastic Container Service (ECS) is a managed compute platform for containers that was designed to be simple to configure, with opinionated defaults to help users get started quickly. ECS customers can run containerized workloads on either Amazon EC2 instances or the serverless Fargate platform without having to maintain a control plane—and can easily integrate ECS with other AWS resources, like Network Load Balancers, to architect their infrastructure.

AWS introduced ECS Anywhere so customers with on-premise data centers can take advantage of the ECS-managed control plane to run containerized environments using their existing infrastructure investments, ensure data security and compliance, and run containerized applications on edge devices. In an ECS Anywhere deployment, an ECS Anywhere Agent can run on any compute infrastructure (such as bare-metal servers or VMs) and communicates with the local Docker daemon as well as an AWS API. This makes it possible for ECS to orchestrate tasks and services using on-premise resources as well as in the cloud on EC2 and Fargate.

Datadog is proud to be a launch partner for ECS Anywhere. Using Datadog, you can get comprehensive visibility into your containerized applications, wherever they are deployed, so you can modernize your on-prem workloads and keep an eye on devices at the edge.

The architecture of a Datadog deployment within ECS Anywhere.

Modernize your on-prem workloads

Since ECS provides a managed control plane, ECS Anywhere is particularly useful for migrating on-premise workloads to an orchestrated, containerized environment. Datadog can help you track this migration to ensure that your newly deployed containers are configured correctly for your underlying resources—and that not-yet-containerized processes continue to run as expected.

Datadog’s APM and Distributed Tracing give you deep insight into the performance of your applications, so you can easily tell when a service deployed using a new compute platform falls short of expectations. Our tracing libraries let you tag your applications by ECS task, task family, service, and cluster, so you can compare performance between your containerized and host-based versions of the same application.

In the example below, we’re using Datadog’s Live Analytics view to compare request durations between applications running in our ECS Anywhere cluster and those running without container-based process isolation. If our containerized applications show a higher-than-expected response latency or error rate, we will know to check for misconfigurations in our ECS setup (such as an underprovisioned task size).

Datadog's Live Analytics view helps you understand trends in the performance of your ECS Anywhere applications.

As you monitor application performance, you’ll also want to track the health and resource utilization of your workloads to ensure they run as expected on your infrastructure as you migrate to containers. Using Datadog’s Live Processes view, you can get down-to-the-second visibility into the health and performance of your running executables, both inside and outside of containers.

In this example, we have created a dashboard to visualize live process data for both containerized (ecs_cluster:scanner-iot-fleet) and non-containerized (ecs_cluster:none) versions of the same application. Since all processes within our infrastructure are using low levels of CPU and memory, we know that we have adequate resources to schedule further ECS Anywhere containers during sudden increases in demand.

Custom dashboard visualizing processes across infrastructure types.

You can also use Datadog’s Live Container view to get a closer look at the health and performance of your ECS Anywhere containers. Datadog pulls metadata from ECS as well as Docker, so you can easily group and filter your data by cluster, task, task family, and service, making it simple to see where availability or resource utilization issues are occurring in your infrastructure. For example, by showing resource utilization in our scanner-iot-fleet cluster, we can see that the scanner-hub container may be at risk of CPU saturation.

The Live Container view filtered by ECS Anywhere tags.

Keep an eye on devices at the edge

ECS Anywhere is well suited to running lightweight, containerized applications on devices at the edge of your on-premise network, such as IoT devices. As long as a device runs the ECS Anywhere Agent and has adequate resources, ECS can schedule your application containers there automatically—and execute locally loaded configurations temporarily during a network outage. Datadog helps ensure that your ECS-based edge applications are running as expected and makes it easier to spot unreliable network connectivity.

Using Datadog’s Network Map, you can get a full overview of the topology of your edge network and identify any devices affected by connectivity losses. You can use tags to have nodes in the Network Map represent ECS tasks, services, or containers, so you can quickly tell if edge applications have lost connectivity upstream—or if the ECS Agent has stopped communicating with the control plane. You can then use the Network Page to track network flows to and from a specific edge device over time, helping you determine when an issue took place and providing crucial context for your investigations. In this example, we are using the Network Page to monitor traffic between each ECS Agent container in our infrastructure and the AWS backend.

The Datadog Network Page showing an ECS Anywhere deployment.

To tell if ECS Anywhere Agents on edge devices are having trouble fetching configurations from the ECS backend, you can use Datadog’s Log Explorer to track your ECS Agent logs over time. The Datadog Agent can fetch logs from each ECS Agent container that runs on your host, meaning that you can quickly identify issues with connecting to the ECS API or scheduling tasks. Here we’re using the patterns aggregation in the Log Explorer to view common logs associated with the ECS Agent.

ECS Agent logs in the Log Explorer.

ECS is anywhere—and so is Datadog

Now that Datadog integrates with ECS Anywhere, you can get full visibility into your ECS clusters, no matter where you have deployed them. And with 450+ integrations, Datadog can help you monitor your entire on-premise or hybrid infrastructure, including your VMware vSphere VMs, Google Anthos applications, and Cisco Meraki devices.

To get started monitoring ECS Anywhere, follow these steps, which will run the Datadog Agent on one host in your infrastructure:

  1. Download the ECS task definition for the Datadog Agent, datadog-agent-ecs.json.

  2. Add the following field to your Datadog Agent task definition, which instructs ECS to deploy the Datadog Agent on your ECS Anywhere infrastructure (rather than on EC2 or Fargate):

datadog-agent-ecs.json

  "requiresCompatibilities": [
    "EXTERNAL"
  ]

  3. Replace <YOUR_DATADOG_API_KEY> with your Datadog API key in datadog-agent-ecs.json.

  4. Run the following commands to register the Datadog Agent task definition, launch the task on ECS Anywhere, and verify that the task is running:

export CLUSTER_NAME="<YOUR_ECS_CLUSTER_NAME>"

# Register the task definition
aws ecs register-task-definition --cli-input-json file://datadog-agent-ecs.json

# Run the task on your ECS Anywhere (EXTERNAL) capacity
aws ecs run-task --cluster "$CLUSTER_NAME" --launch-type EXTERNAL --task-definition datadog-agent-task

# Get the ARN of the Datadog Agent task (filtered by family so that another task's ARN is not picked up)
TEST_TASKID=$(aws ecs list-tasks --cluster "$CLUSTER_NAME" --family datadog-agent-task | jq -r '.taskArns[0]')

# Verify that the task is running
aws ecs describe-tasks --cluster "$CLUSTER_NAME" --tasks "$TEST_TASKID"
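
If you would rather script these steps, the following is a minimal sketch rather than official Datadog tooling. The function names prepare_task_definition and wait_for_task, the destination file argument, and the polling defaults are illustrative assumptions. The sketch assumes that jq, sed, and the AWS CLI are installed, and that your API key contains no characters that are special to sed.

```shell
# prepare_task_definition SRC DEST API_KEY  (hypothetical helper)
# Adds the EXTERNAL compatibility field and fills in the API key placeholder,
# writing the result to DEST so that the downloaded file stays untouched.
prepare_task_definition() {
  local src="$1" dest="$2" api_key="$3"
  jq '. + {requiresCompatibilities: ["EXTERNAL"]}' "$src" \
    | sed "s/<YOUR_DATADOG_API_KEY>/$api_key/" > "$dest"
}

# wait_for_task CLUSTER TASK_ARN [ATTEMPTS] [DELAY_SECONDS]  (hypothetical helper)
# Polls the task's lastStatus until it reports RUNNING. Returns non-zero if
# the task reports STOPPED or if the attempts run out first.
wait_for_task() {
  local cluster="$1" task="$2" attempts="${3:-30}" delay="${4:-5}"
  local i=1 status
  while [ "$i" -le "$attempts" ]; do
    status=$(aws ecs describe-tasks --cluster "$cluster" --tasks "$task" \
      --query 'tasks[0].lastStatus' --output text)
    echo "attempt $i: lastStatus=$status"
    case "$status" in
      RUNNING) return 0 ;;
      STOPPED) return 1 ;;
    esac
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

For example, you could call prepare_task_definition datadog-agent-ecs.json datadog-agent-ecs-anywhere.json "$DD_API_KEY" before registering the generated file, and wait_for_task "$CLUSTER_NAME" "$TEST_TASKID" after running the task. Here DD_API_KEY and the output file name are placeholders of our own choosing.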

You can also add your Datadog Agent task to an ECS daemon service to ensure that it runs on each host in your ECS Anywhere infrastructure. For more information, consult the documentation on our ECS integration.
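
As a rough sketch of what creating such a daemon service might look like with the AWS CLI: the helper name create_agent_daemon_service and the default service name datadog-agent-daemon are our own assumptions, while datadog-agent-task is the task definition name used in the commands above. Check the ECS integration documentation for the recommended settings before relying on this.

```shell
# create_agent_daemon_service CLUSTER [SERVICE_NAME] [TASK_DEFINITION]  (hypothetical helper)
# Creates an ECS service that uses the DAEMON scheduling strategy, so that ECS
# places one copy of the task on each registered EXTERNAL (ECS Anywhere) instance.
create_agent_daemon_service() {
  local cluster="$1"
  local service="${2:-datadog-agent-daemon}"  # assumed service name
  local taskdef="${3:-datadog-agent-task}"
  aws ecs create-service \
    --cluster "$cluster" \
    --service-name "$service" \
    --task-definition "$taskdef" \
    --launch-type EXTERNAL \
    --scheduling-strategy DAEMON
}
```

You could then run create_agent_daemon_service "$CLUSTER_NAME" once per cluster instead of launching the Agent task by hand on each host.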

If you don’t have a Datadog account, you can sign up for one.