
Docker Dashboard

What is Docker?

Docker is a popular and fast-evolving technology that has become nearly synonymous with “containerization.” A container is a lightweight virtual runtime that provides software isolation, among other benefits. You can think of containers as easily configured, lightweight VMs that start up fast, often in under one second. Docker and other container technologies are ideal for microservice architectures and for environments that scale rapidly or release often.

Docker overview dashboard

When building a dashboard to monitor your Docker containers, you should include key resource metrics pertaining to CPU, memory, I/O, and network. Below is a snapshot of the customizable Docker dashboard in Datadog, which automatically populates with metrics when you integrate Datadog with Docker. Even if you are not a Datadog user, the dashboard’s contents should provide a template for monitoring your Docker containers’ resource utilization and identifying any potential resource constraints in your containerized infrastructure.

[Screenshot: the Docker dashboard in Datadog]

Here’s a widget-by-widget breakdown of the graphs and query values in this dashboard.

Events

Event timeline

The horizontal timeline immediately below the Docker logo shows how many Docker events have occurred over the past day. An event is any discrete occurrence reported by Docker, such as a container’s creation, startup, or destruction. This bar graph is useful for identifying any anomalous spikes in recent Docker activity.

Event stream

The Docker event stream captures more detailed information on each Docker event, such as the name of the host, the name of the Docker image, and the exact nature of the event.
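As a rough illustration of what the timeline and the stream each summarize, the sketch below buckets a list of events by hour and renders one event as a stream-style line. It assumes events shaped like the JSON objects emitted by Docker's events API (`Action`, `Actor.Attributes`, `time`); the helper names, the host name, and the sample data are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def events_per_hour(events):
    """Count events in each hourly bucket, keyed by the bucket's UTC start."""
    buckets = Counter()
    for event in events:
        ts = datetime.fromtimestamp(event["time"], tz=timezone.utc)
        buckets[ts.replace(minute=0, second=0, microsecond=0)] += 1
    return dict(buckets)


def describe(event, host):
    """Render one event as a stream-style line: host, action, name, image."""
    attrs = event.get("Actor", {}).get("Attributes", {})
    name = attrs.get("name", "?")
    image = attrs.get("image", "?")
    return f"{host}: {event['Action']} {name} (image {image})"


# Hypothetical sample events (assumed shape, not real output).
sample_events = [
    {"Type": "container", "Action": "create", "time": 1700000000,
     "Actor": {"Attributes": {"image": "redis:7", "name": "cache-1"}}},
    {"Type": "container", "Action": "start", "time": 1700000005,
     "Actor": {"Attributes": {"image": "redis:7", "name": "cache-1"}}},
    {"Type": "container", "Action": "destroy", "time": 1700004000,
     "Actor": {"Attributes": {"image": "redis:7", "name": "cache-1"}}},
]

hourly_counts = events_per_hour(sample_events)
first_line = describe(sample_events[0], "host-a")
```

A sudden jump in one bucket's count is the kind of spike the timeline is meant to surface, while lines like `first_line` carry the per-event detail.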

Containers

Running container change

This query-value widget shows how many containers are running now, as compared to five minutes ago. It alerts you to sudden changes in the number of running Docker containers: if the container count decreases by more than 20 percent in five minutes, the widget turns yellow; if the container count drops by 50 percent or more, the widget turns red.
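The color thresholds described above can be sketched as a small function. This is only an illustration of the arithmetic, not how the widget is implemented; the function name and the `"ok"` label for the default state are assumptions.

```python
def container_change_status(count_now, count_five_min_ago):
    """Classify the five-minute change in running containers.

    Returns "red" for a drop of 50 percent or more, "yellow" for a drop of
    more than 20 percent, and "ok" otherwise (the "ok" label is hypothetical).
    """
    if count_five_min_ago <= 0:
        # No baseline to compare against, so no drop can be computed.
        return "ok"
    drop = (count_five_min_ago - count_now) / count_five_min_ago
    if drop >= 0.5:
        return "red"
    if drop > 0.2:
        return "yellow"
    return "ok"
```

For example, going from 100 containers to 79 is a 21 percent drop and would be classified as yellow, while going from 100 to 50 would be classified as red.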

Running containers

This query-value widget displays the number of Docker containers currently running. A breakdown of this total is available in the bar graph below or in the toplist to the right of the query widget.

Stopped containers

This query-value widget displays the number of Docker containers that have been stopped but not deleted.

Running containers by image (graph)

This stacked bar graph shows the change in the total number of running containers over the past hour. The graph also breaks down the running containers by image, so that you can see if any particular image is dominating the container count or has experienced dramatic population changes in the past hour.

Running containers by image (list)

This toplist ranks the top 20 images, in terms of the number of containers currently running. This list is useful for identifying any image or service that has an unexpectedly large footprint, which may be due to faulty configuration or container management.
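The container widgets above all derive from the same inventory: a count of running containers, a count of stopped ones, and a per-image breakdown. A minimal sketch of those aggregations follows, assuming each container is a dict with hypothetical `image` and `state` keys and that a stopped-but-not-deleted container reports the state `exited`.

```python
from collections import Counter


def summarize_containers(containers, top_n=20):
    """Return running/stopped counts and the top images by running containers."""
    running = [c for c in containers if c["state"] == "running"]
    stopped = [c for c in containers if c["state"] == "exited"]
    by_image = Counter(c["image"] for c in running)
    return {
        "running": len(running),
        "stopped": len(stopped),
        # most_common() sorts images from most to fewest running containers.
        "top_images": by_image.most_common(top_n),
    }


# Hypothetical inventory for illustration.
sample_containers = [
    {"name": "web-1", "image": "nginx", "state": "running"},
    {"name": "web-2", "image": "nginx", "state": "running"},
    {"name": "web-3", "image": "nginx", "state": "running"},
    {"name": "cache-1", "image": "redis", "state": "running"},
    {"name": "cache-old", "image": "redis", "state": "exited"},
]

summary = summarize_containers(sample_containers)
```

An image that sits unexpectedly high in `top_images` is the kind of oversized footprint the toplist is meant to expose.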

CPU

CPU user by image

If your container is performing slowly, CPU is likely one of the first resources you’ll want to look at. CPU user time is an especially valuable metric, as it represents the percentage of time that the CPU is under the direct control of processes. This timeseries graph shows the average CPU user time for each Docker image over the past hour to help you identify whether any particular image is CPU-bound.

CPU system by image

The CPU system time metric represents the percentage of time that the CPU is executing system calls on behalf of processes.
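User and system time are typically exposed as cumulative counters, so a percentage has to be derived from two samples. The sketch below shows that calculation, assuming counters in nanoseconds with field names modeled on the `cpu_usage` section of Docker's stats API (`usage_in_usermode`, `usage_in_kernelmode`). The function name is hypothetical, and the result is expressed as a percentage of a single core.

```python
NANOSECONDS_PER_SECOND = 1_000_000_000


def cpu_percentages(previous, current, interval_seconds):
    """Convert two cumulative CPU samples into user and system percentages."""
    def pct(field):
        delta_ns = current[field] - previous[field]
        return 100.0 * delta_ns / (interval_seconds * NANOSECONDS_PER_SECOND)

    return {
        "user": pct("usage_in_usermode"),
        "system": pct("usage_in_kernelmode"),
    }


# Hypothetical samples taken two seconds apart.
earlier = {"usage_in_usermode": 1_000_000_000, "usage_in_kernelmode": 500_000_000}
later = {"usage_in_usermode": 1_600_000_000, "usage_in_kernelmode": 600_000_000}

usage = cpu_percentages(earlier, later, interval_seconds=2)
```

Here the container spent 0.6 seconds in user mode and 0.1 seconds in kernel mode during a 2-second window, which works out to 30 percent user and 5 percent system.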

Most CPU-intensive containers

This list shows which containers (as opposed to images) have had the highest peak levels of CPU user time over the past hour. Note that if you have multiple containers with the same name (for example, identical containers on different hosts), this ranking will display the average of the CPU metrics across those containers.

CPU by container

This heatmap shows the evolution of CPU usage over the past hour, broken down by container name. This graph type allows you to see, at a glance, the overall range of your containers’ CPU usage, as well as whether there are any outliers. Note that if you have multiple containers with the same name (for example, identical containers on different hosts), this heatmap will display the average of the CPU metrics across those containers.
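The note about same-named containers matters when reading these widgets, because averaging can mask a busy container on one host. A small sketch of that behavior, under the assumption that samples are first averaged per container name at each timestamp and then ranked by their peak; the tuple layout, function name, and data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def rank_by_peak(samples, top_n=5):
    """Rank container names by the peak of their per-timestamp average.

    samples: iterable of (timestamp, container_name, host, value) tuples.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for ts, name, _host, value in samples:
        grouped[name][ts].append(value)
    peaks = {
        name: max(mean(values) for values in by_ts.values())
        for name, by_ts in grouped.items()
    }
    return sorted(peaks.items(), key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical CPU user percentages: "web" runs on two hosts.
cpu_samples = [
    (0, "web", "host-a", 90), (0, "web", "host-b", 10),
    (1, "web", "host-a", 20), (1, "web", "host-b", 20),
    (0, "worker", "host-a", 60), (1, "worker", "host-a", 40),
]

ranking = rank_by_peak(cpu_samples)
```

In this data the `web` container on `host-a` briefly hits 90 percent, but its average with the idle copy on `host-b` is 50, so `worker` ranks first with a peak of 60.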

Memory

RSS memory by image

RSS (resident set size) is the amount of non-cache data that belongs to a process: stacks, heaps, etc. This timeseries breaks down the RSS memory by image to help identify which images are the most memory-intensive, and which may be memory-constrained.

Swap by image

When a container needs to free up memory, it can swap inactive RSS memory to disk as necessary. Low levels of swapping can be tolerated in some applications, but because reading from disk can be orders of magnitude slower than reading from memory, significant swap usage can cause serious performance problems. This timeseries shows how much swap space is in use, on average, for each Docker image.

Cache memory by image

This timeseries shows the cache memory usage of each Docker image. Cache memory, distinct from RSS memory, reflects data stored on disk that is currently cached in memory. Inactive cache may be reclaimed first when the system needs memory.
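To see where RSS, swap, and cache figures can come from on the host, the sketch below parses text in the layout of a cgroup v1 `memory.stat` file, which lists `rss`, `cache`, and (when swap accounting is enabled) `swap` in bytes. This is an assumption about the environment: cgroup v2 uses different key names. The function names and sample text are hypothetical.

```python
def parse_memory_stat(text):
    """Parse "key value" lines into a dict of integer byte counts."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def memory_breakdown(stats):
    """Pick out the three values graphed on the dashboard (0 if absent)."""
    return {key: stats.get(key, 0) for key in ("rss", "cache", "swap")}


# Hypothetical file contents, assuming the cgroup v1 layout.
sample_memory_stat = """cache 52428800
rss 104857600
swap 0
inactive_file 41943040
"""

breakdown = memory_breakdown(parse_memory_stat(sample_memory_stat))
```

In this sample the container holds 100 MiB of RSS and 50 MiB of cache with no swap in use, so none of its memory is currently being served from disk.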

Most RAM-intensive containers

This list shows which containers (as opposed to images) have had the highest peak levels of RSS memory usage over the past hour. Note that if you have multiple containers with the same name (for example, identical containers on different hosts), this ranking will display the average of the memory metrics across those containers.

Memory by container

This heatmap shows the evolution of RSS memory usage over the past hour, broken down by container name. This graph type allows you to see, at a glance, the overall range of your containers’ RAM usage, as well as whether there are any memory-hungry outliers. Note that if you have multiple containers with the same name (for example, identical containers on different hosts), this heatmap will display the average of the memory metrics across those containers.

Network

Avg. rx bytes by image

This graph shows the inbound network traffic for each Docker image over the past hour. This graph and the one below are useful for identifying containerized services that may be network-limited, or that may be suffering performance problems due to transient network issues.

Avg. tx bytes by image

This graph shows the outbound network traffic for each Docker image over the past hour. This graph and the one above are useful for identifying containerized services that may be network-limited, or that may be suffering performance problems due to transient network issues.
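Network byte counts are usually cumulative per interface, so throughput is the difference between two samples divided by the elapsed time. A minimal sketch, assuming per-interface dicts with `rx_bytes` and `tx_bytes` keys modeled on the `networks` section of Docker's stats API; the function name and sample values are hypothetical.

```python
def network_rates(previous, current, interval_seconds):
    """Return inbound and outbound bytes per second summed over interfaces."""
    rates = {}
    for field in ("rx_bytes", "tx_bytes"):
        prev_total = sum(iface[field] for iface in previous.values())
        cur_total = sum(iface[field] for iface in current.values())
        # Clamp at zero so a counter reset (e.g. a restart) is not
        # reported as negative throughput.
        rates[field] = max(cur_total - prev_total, 0) / interval_seconds
    return rates


# Hypothetical samples taken ten seconds apart.
net_before = {"eth0": {"rx_bytes": 1000, "tx_bytes": 500}}
net_after = {"eth0": {"rx_bytes": 21000, "tx_bytes": 2500}}

rates = network_rates(net_before, net_after, interval_seconds=10)
```

Here the container received 20,000 bytes and sent 2,000 bytes in ten seconds, giving 2,000 bytes per second inbound and 200 bytes per second outbound.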

Most tx-intensive containers

This list shows which containers (as opposed to images) have had the highest peak levels of network transmission over the past hour. Note that if you have multiple containers with the same name (for example, identical containers on different hosts), this ranking will display the average of the network metrics across those containers.

tx by container

This heatmap shows the evolution of outbound network throughput over the past hour, broken down by container name. This graph type allows you to see, at a glance, the overall range of your containers’ network transmission rates, as well as whether there are any network-saturating outliers. Note that if you have multiple containers with the same name (for example, identical containers on different hosts), this heatmap will display the average of the network metrics across those containers.

I/O

Avg. I/O bytes read by image

This timeseries shows how many bytes have been read from disk by each Docker image over the past hour. I/O for each block device is shared, so you should investigate host-level I/O metrics if you notice anomalies at the container level. Your container’s I/O rates will be affected if queue lengths or service times are increasing on a block device that your container uses.

Avg. I/O bytes written by image

This timeseries shows how many bytes have been written to disk by each Docker image over the past hour. I/O for each block device is shared, so you should investigate host-level I/O metrics if you notice anomalies at the container level. Your container’s I/O rates will be affected if queue lengths or service times are increasing on a block device that your container uses.
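Because block I/O is accounted per device, a container's read and write totals are sums across every block device it touches. The sketch below shows that aggregation, assuming entries shaped like the `io_service_bytes_recursive` list in Docker's stats API (`major`, `minor`, `op`, `value`). The function name and sample entries are hypothetical, and operation names are compared case-insensitively because their casing is assumed to vary between cgroup versions.

```python
def io_bytes_by_op(entries):
    """Sum bytes read and written across all block devices."""
    totals = {"read": 0, "write": 0}
    for entry in entries:
        op = entry["op"].lower()
        # Ignore other rows such as "Total" so bytes are not double-counted.
        if op in totals:
            totals[op] += entry["value"]
    return totals


# Hypothetical per-device counters for one container.
sample_blkio = [
    {"major": 8, "minor": 0, "op": "Read", "value": 4096},
    {"major": 8, "minor": 0, "op": "Write", "value": 8192},
    {"major": 8, "minor": 0, "op": "Total", "value": 12288},
    {"major": 8, "minor": 16, "op": "Read", "value": 1024},
    {"major": 8, "minor": 16, "op": "Write", "value": 0},
]

io_totals = io_bytes_by_op(sample_blkio)
```

Keeping the `major` and `minor` numbers around is useful when an anomaly shows up, since they identify which host block device to inspect for rising queue lengths or service times.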

Monitor your containers with the Docker dashboard

If you’d like to see your Docker metrics and events on this dashboard, you can try Datadog for free for 14 days. The dashboard will populate with metrics automatically after you set up the integration with Docker.

For a deep dive on Docker metrics and how to monitor them, check out our four-part How to Monitor Docker series.
