Monitor Docker on AWS ECS

Published: September 30, 2015
[Image: Default Docker dashboard]

You have probably heard of Docker: it is a young container technology with a ton of momentum. If you haven’t, you can think of containers as easily configured, lightweight VMs that start up fast, often in under one second. Containers are ideal for microservice architectures and for environments that scale rapidly or release often.

While Docker is young, it is maturing fast, and is on track to become one of the most important computing technologies of the decade. Docker recently took one more step towards maturity with AWS’s release of EC2 Container Service (ECS). ECS is a service that automatically manages your Docker containers for you; it balances load among containers, recovers unhealthy containers, provides scaling automation, and more. You can think of ECS as a competitor to Kubernetes, provided as a service.

Containers are hard to track

Containers come and go rapidly, which is great for scalable or fast-evolving infrastructure. But containers’ short life also makes them quite hard to monitor.

For one thing, your monitoring tools must automatically detect the presence of new containers, and begin collecting their metrics. They also need to report service-level health via flexible aggregations. Finally, since going offline is ordinary for containers, your monitoring tool must not panic when this happens—otherwise the resulting sea of alerts will drown out important notifications.
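To make the idea concrete, here is a minimal sketch of service-level aggregation in Python. It is an illustration only, not how Datadog is implemented: the sample format, the `service` tag, the `cpu_pct` field, and the 80% threshold are all assumptions made for the example. The point is that health is computed per service from whichever containers reported in an interval, so a container that exits or appears changes the inputs without raising an alert by itself.

```python
from collections import defaultdict
from statistics import mean


def service_health(samples):
    """Average a metric per service across whichever containers reported."""
    by_service = defaultdict(list)
    for sample in samples:
        by_service[sample["service"]].append(sample["cpu_pct"])
    return {service: mean(values) for service, values in by_service.items()}


def alerts(samples, threshold=80.0):
    """Alert on service-level averages only; a vanished container is not an alert."""
    health = service_health(samples)
    return sorted(service for service, avg in health.items() if avg > threshold)


# Two reporting intervals (hypothetical data). Container "c2" has exited by
# the second interval and a new container "c4" has taken its place.
interval_1 = [
    {"container": "c1", "service": "web", "cpu_pct": 40.0},
    {"container": "c2", "service": "web", "cpu_pct": 50.0},
    {"container": "c3", "service": "db", "cpu_pct": 90.0},
]
interval_2 = [
    {"container": "c1", "service": "web", "cpu_pct": 45.0},
    {"container": "c4", "service": "web", "cpu_pct": 35.0},
    {"container": "c3", "service": "db", "cpu_pct": 95.0},
]

for interval in (interval_1, interval_2):
    print(service_health(interval), alerts(interval))
```

In both intervals only the `db` service crosses the threshold; the churn inside `web` never produces a notification.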

Because you can’t run a reliable service that you can’t see, these monitoring challenges should be addressed before using Docker in production.

Deep visibility into ECS clusters with Datadog

Datadog is purpose-built to monitor highly dynamic infrastructure, including containers. To deepen our support for containers, Datadog collaborated with AWS engineers to create a tailored ECS integration.

Datadog understands the difference between pets and cattle, so when ECS brings new containers online, Datadog automatically begins tracking their metrics. When the containers go offline, Datadog handles that gracefully too.

But if your Docker deployment has service-level problems, Datadog will notice, and send you alerts. Docker metrics can be viewed individually, in dynamic groups, or correlated with metrics from the rest of your infrastructure.

Not only can Datadog track your containers, but it also can track what’s running inside of them. Datadog has over 200 built-in integrations for standard software and services, and can track your custom applications, too.

Connect Datadog to ECS

If you don’t have a Datadog account, you can create one now.

There are four setup steps, described in detail here.

  1. Create an ECS cluster or reuse one of your existing clusters.
  2. Define an ECS task to install and run the Datadog Agent in one container on each ECS host. The Agent will collect resource usage metrics from other containers running on the host, and the host itself: CPU, memory, I/O, network.
  3. Create an IAM policy to give the Agent permission to collect metrics, and to allow the Agent to be started automatically when the host launches.
  4. Add a user script to your ECS-managed EC2 instances to run the Agent-startup task described in step #2 (above) when the host is launched and after a reboot.
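As a rough illustration of step 2, the Python sketch below builds a task definition for a single Agent container and prints it as JSON. The top-level keys (`family`, `containerDefinitions`, `volumes`, `mountPoints`, `environment`) are standard ECS task definition fields, but everything specific here is an assumption for the example: the image name passed in, the `API_KEY` environment variable name, the CPU and memory reservations, and the container-side mount paths. Check the integration's setup instructions for the exact values before using anything like this.

```python
import json


def agent_task_definition(image, api_key, api_key_env="API_KEY"):
    """Build an ECS task definition that runs one monitoring Agent container.

    `image` and `api_key_env` are placeholders (assumptions); use the values
    given in the integration's setup instructions.
    """
    # Host paths the Agent reads to observe the host and its other containers,
    # paired with the (assumed) paths where they appear inside the container.
    mounts = {
        "docker_sock": ("/var/run/docker.sock", "/var/run/docker.sock"),
        "proc": ("/proc", "/host/proc"),
        "cgroup": ("/sys/fs/cgroup", "/host/sys/fs/cgroup"),
    }
    return {
        "family": "monitoring-agent-task",  # hypothetical task family name
        "containerDefinitions": [
            {
                "name": "monitoring-agent",
                "image": image,
                "cpu": 10,       # illustrative reservation
                "memory": 128,   # illustrative reservation, in MiB
                "essential": True,
                "environment": [{"name": api_key_env, "value": api_key}],
                "mountPoints": [
                    {
                        "sourceVolume": name,
                        "containerPath": container_path,
                        # Only the Docker socket needs to be writable.
                        "readOnly": name != "docker_sock",
                    }
                    for name, (_host_path, container_path) in mounts.items()
                ],
            }
        ],
        "volumes": [
            {"name": name, "host": {"sourcePath": host_path}}
            for name, (host_path, _container_path) in mounts.items()
        ],
    }


task = agent_task_definition("example/monitoring-agent:latest", "YOUR_API_KEY")
print(json.dumps(task, indent=2))
```

The resulting JSON is the kind of document you would register with ECS as a task definition; the user script from step 4 then starts that task on each host at boot.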

See your fleet

Within minutes, you will see your Docker metrics flowing into your default Docker dashboard.

You’ll immediately see your:

  • CPU usage, including when CPU is throttled by Docker
  • Memory metrics, including usage, swap, and faults
  • I/O, including reads and writes
  • Network statistics, including throughput, drops, and errors

Correlate metrics from Docker and other systems

No system is an island, so when investigating problems, it is important to be able to compare and correlate metrics from any part of your infrastructure. With Datadog you can zoom to any point in history, correlate any metrics you want, and slice your infrastructure by different dimensions on the fly, at full granularity.
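In a dashboard, correlation usually means overlaying two graphs and looking for shapes that move together. If you want a number to go with that visual impression, one common choice is the Pearson correlation coefficient. The sketch below computes it in plain Python for two aligned time series; the series names and values are made up for the example, and this is not a description of how Datadog correlates metrics internally.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical samples taken at the same timestamps.
container_cpu_pct = [20, 35, 50, 80, 95]
db_lock_count = [1, 2, 4, 7, 9]

r = pearson(container_cpu_pct, db_lock_count)
print(f"correlation between container CPU and DB locks: {r:.3f}")
```

A value near 1 suggests the two metrics rise and fall together and are worth investigating side by side; it does not say which one causes the other.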

[Image: AWS ECS CPU correlation with DB locks]

Know immediately about any problems

If your Docker infrastructure ever slows down or experiences a high error rate, you probably want to know right away. Datadog lets you set expressive, flexible service-level alerts that can be configured to contact humans with the proper level of urgency for any situation.
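The idea of matching urgency to severity can be sketched in a few lines of Python. This is only an illustration of the concept, not Datadog's alerting API: the function names, the 1% and 5% thresholds, and the `"notify"` and `"page"` labels are all hypothetical.

```python
def service_error_rate(requests, errors):
    """Fraction of failed requests for a whole service, across all its containers."""
    return errors / requests if requests else 0.0


def urgency(error_rate, warn_at=0.01, page_at=0.05):
    """Map a service-level error rate to how loudly a human should be told."""
    if error_rate >= page_at:
        return "page"      # wake someone up
    if error_rate >= warn_at:
        return "notify"    # post to a team channel, no page
    return "ok"


# Hypothetical per-interval totals for one service: (requests, errors).
for requests, errors in [(1000, 5), (1000, 20), (1000, 80)]:
    rate = service_error_rate(requests, errors)
    print(f"{rate:.1%} -> {urgency(rate)}")
```

Keeping two thresholds means a mild degradation reaches the team without paging anyone, while a serious one escalates immediately.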

[Image: ECS alert]

To the future, Marty

No matter how many containers ECS is managing, or how fast it scales up or down, Datadog will track everything running in your dynamic infrastructure. With Datadog you’ll know exactly what’s happening at your service level, with the containers powering the service, and with the software running inside each container. You can start using Datadog now.
