How to monitor Docker resource metrics

K Young

This post is part 2 in a 4-part series about monitoring Docker. Part 1 discusses the novel challenge of monitoring containers instead of hosts, part 3 covers the nuts and bolts of collecting Docker resource metrics, and part 4 describes how the largest TV and radio outlet in the U.S. monitors Docker. This article describes in detail the resource metrics that are available from Docker.

Docker is like a host

As discussed in part 1 of this series, a Docker container can rightly be classified as a type of mini-host. Just like a regular host, it runs work on behalf of resident software, and that work uses CPU, memory, I/O, and network resources. However, Docker containers run inside cgroups, which don't report exactly the same metrics you might expect from a host. This article discusses the resource metrics that are available; the next article in this series covers three different ways to collect them.

Key Docker resource metrics

CPU

Name	Description	Metric type
user CPU	Percent of time that CPU is under direct control of processes	Resource: Utilization
system CPU	Percent of time that CPU is executing system calls on behalf of processes	Resource: Utilization
throttling (count)	Number of CPU throttling enforcements for a container	Resource: Saturation
throttling (time)	Total time that a container's CPU usage was throttled	Resource: Saturation

Standard metrics

Just like a traditional host, Docker containers report system CPU and user CPU usage. It probably goes without saying that if your container is performing slowly, CPU is one of the first resources you'll want to look at.

As with all Docker resource metrics, you will typically collect the metrics differently than you would from an ordinary host. Another key difference with containers: unlike a traditional host, Docker does not report nice, idle, iowait, or irq CPU time.

Throttling

If a container appears to have plenty of CPU capacity but you still suspect that it is compute-bound, you may want to check a container-specific metric: CPU throttling.

If you do not specify any scheduling priority, available CPU time will be split evenly between running containers. If some containers don't need all of their allotted CPU time, the unused time will be made proportionally available to other containers.

You can optionally control the share of CPU time each container should have relative to others using the same CPU(s) by specifying CPU shares.
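Under contention, CPU shares behave as relative weights: each busy container is entitled to its own shares divided by the total shares of all busy containers. The sketch below models that arithmetic, assuming Docker's default weight of 1024 (used when `--cpu-shares` is not set) and a single pool of CPUs that every container competes for. The function and container names are hypothetical, and the kernel scheduler is more nuanced than this simple proportional model.

```python
# Docker's default CPU-shares weight when --cpu-shares is not given.
DEFAULT_CPU_SHARES = 1024


def entitled_fractions(shares_by_container, busy):
    """Return the fraction of contended CPU time each busy container may get.

    shares_by_container maps a container name to its CPU shares, or None if
    the default applies. busy is the set of containers currently demanding
    CPU; idle containers are left out, so their unused share is spread
    proportionally across the busy ones.
    """
    weights = {
        name: DEFAULT_CPU_SHARES if shares is None else shares
        for name, shares in shares_by_container.items()
        if name in busy
    }
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical containers: "web" was given twice the default weight.
shares = {"web": 2048, "worker": None, "batch": 512}
print(entitled_fractions(shares, busy={"web", "worker", "batch"}))
print(entitled_fractions(shares, busy={"web", "worker"}))  # batch is idle
```

When "batch" goes idle, its slice is not wasted: "web" and "worker" split the whole CPU in their 2:1 ratio.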

Going one step further, you can actively throttle a container. In some cases, a container's default or declared number of CPU shares would entitle it to more CPU time than you want it to have. If, in those cases, the container attempts to actually use that CPU time, a CPU quota constraint will tell Docker when to throttle the container's CPU usage. Note that the CPU quota and CPU period are both expressed in microseconds (neither milliseconds nor nanoseconds). So a container with a 100,000-microsecond period and a 50,000-microsecond quota would be throttled if it attempted to use more than 50,000 microseconds of CPU time (the equivalent of half of one CPU) in any 0.1-second period.
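The quota arithmetic can be sketched in a few lines. The values are in microseconds, as Docker's `--cpu-quota` and `--cpu-period` flags expect. Treating a quota of 0 or -1 as "no limit" is an assumption here, meant to cover both Docker's unset value and the way the cgroup v1 `cpu.cfs_quota_us` file shows an unlimited quota; the function names are hypothetical.

```python
def cpu_limit(quota_us, period_us):
    """CPUs' worth of time a container may use, or None if unlimited.

    Both arguments are in microseconds. A quota of 0 or -1 is assumed here
    to mean "no limit".
    """
    if quota_us <= 0:
        return None
    return quota_us / period_us


def would_be_throttled(cpu_time_us, quota_us):
    """True if the CPU time demanded within one period exceeds the quota."""
    if quota_us <= 0:
        return False
    return cpu_time_us > quota_us


# The example from the text: a 50,000 us quota per 100,000 us period.
print(cpu_limit(50_000, 100_000))           # half of one CPU
print(would_be_throttled(60_000, 50_000))   # demand exceeds the quota
print(cpu_limit(200_000, 100_000))          # quota may exceed the period on multi-CPU hosts
```

Because the quota counts CPU time summed across all CPUs, a multithreaded container can hit its quota well before the period's wall-clock time has elapsed.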

Docker can tell you the number of times throttling was enforced for each container, as well as the total time that each container was throttled.
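These throttling counters come from the kernel's `cpu.stat` pseudo-file for the container's cgroup. Its exact location depends on the cgroup version and driver, so the sketch below only parses the file's text. The field names are the kernel's (`nr_periods`, `nr_throttled`, and `throttled_time` in nanoseconds on cgroup v1, or `throttled_usec` on cgroup v2). The function names and sample values are hypothetical.

```python
def parse_cpu_stat(text):
    """Parse the "key value" lines of a cgroup cpu.stat file into ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttling_summary(stats):
    """Summarize throttling count, time, and the share of throttled periods."""
    periods = stats.get("nr_periods", 0)
    throttled = stats.get("nr_throttled", 0)
    if "throttled_time" in stats:   # cgroup v1 reports nanoseconds
        seconds = stats["throttled_time"] / 1e9
    else:                           # cgroup v2 reports throttled_usec
        seconds = stats.get("throttled_usec", 0) / 1e6
    return {
        "throttled_count": throttled,
        "throttled_seconds": seconds,
        "throttled_ratio": throttled / periods if periods else 0.0,
    }


# Illustrative cgroup v1 contents (made-up numbers, not real output).
sample = """nr_periods 1000
nr_throttled 250
throttled_time 12000000000
"""
print(throttling_summary(parse_cpu_stat(sample)))
```

The ratio of throttled periods to total periods is often easier to alert on than the raw count, since the count grows with uptime.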

As discussed in the next article, CPU metrics can be collected from pseudo-files, the stats command (basic CPU usage metrics), or from the API.
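When collecting from the API, each stats sample carries both the current reading (`cpu_stats`) and the previous one (`precpu_stats`), and the Docker Engine API documentation derives CPU percentage from the difference between them, scaled by the number of online CPUs. The sketch below applies that formula. Extending it to the `usage_in_usermode` and `usage_in_kernelmode` counters to approximate user and system CPU is an assumption made here, and the sample numbers are invented.

```python
def cpu_percentages(stats):
    """CPU usage percentages from one Docker Engine API stats sample.

    Uses the delta between cpu_stats and precpu_stats, divided by the delta
    in host CPU time and scaled by the number of online CPUs.
    """
    cpu, pre = stats["cpu_stats"], stats["precpu_stats"]
    system_delta = cpu.get("system_cpu_usage", 0) - pre.get("system_cpu_usage", 0)
    online = (cpu.get("online_cpus")
              or len(cpu["cpu_usage"].get("percpu_usage") or [])
              or 1)

    def percent(field):
        delta = cpu["cpu_usage"].get(field, 0) - pre["cpu_usage"].get(field, 0)
        if system_delta <= 0 or delta < 0:
            return 0.0
        return delta / system_delta * online * 100.0

    return {
        "total": percent("total_usage"),
        "user": percent("usage_in_usermode"),        # assumption: same formula
        "system": percent("usage_in_kernelmode"),    # assumption: same formula
    }


# Trimmed, made-up sample shaped like the API's JSON (nanosecond counters).
sample = {
    "cpu_stats": {
        "cpu_usage": {"total_usage": 400_000_000,
                      "usage_in_usermode": 300_000_000,
                      "usage_in_kernelmode": 100_000_000},
        "system_cpu_usage": 10_000_000_000,
        "online_cpus": 4,
    },
    "precpu_stats": {
        "cpu_usage": {"total_usage": 200_000_000,
                      "usage_in_usermode": 150_000_000,
                      "usage_in_kernelmode": 50_000_000},
        "system_cpu_usage": 8_000_000_000,
    },
}
print(cpu_percentages(sample))
```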

Memory

Just as you would expect, Docker can report on the amount of memory available to it, and the amount of memory it is using.

Name	Description	Metric type
Memory	Memory usage of a container	Resource: Utilization
RSS	Non-cache memory for a process (stacks, heaps, etc.)	Resource: Utilization
Cache memory	Data from disk cached in memory	Resource: Utilization
Swap	Amount of swap space in use	Resource: Saturation

Used memory can be decomposed into:

RSS (resident set size) is data that belongs to a process: stacks, heaps, etc. RSS itself can be further decomposed into active and inactive memory (active_anon and inactive_anon). Inactive RSS memory is swapped to disk first when memory runs short.
Cache memory reflects data stored on disk that is currently cached in memory. Cache can be further decomposed into active and inactive memory (active_file and inactive_file). Inactive memory may be reclaimed first when the system needs memory.
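On cgroup v1, these counters come from the container's `memory.stat` pseudo-file, one `key value` pair per line, with values in bytes. A minimal sketch of reading that text and grouping it into the RSS and cache decomposition follows. The field names are the kernel's (cgroup v2 reports the RSS and cache totals as `anon` and `file` instead); the function names and sample values are hypothetical.

```python
def parse_memory_stat(text):
    """Parse a cgroup v1 memory.stat file ("key value" per line) into ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def memory_breakdown(stats):
    """Group memory.stat counters (bytes) into RSS and cache, each split
    into active and inactive memory."""
    return {
        "rss": {"total": stats.get("rss", 0),
                "active": stats.get("active_anon", 0),
                "inactive": stats.get("inactive_anon", 0)},
        "cache": {"total": stats.get("cache", 0),
                  "active": stats.get("active_file", 0),
                  "inactive": stats.get("inactive_file", 0)},
    }


# Illustrative values only (bytes).
sample = """cache 104857600
rss 209715200
active_anon 157286400
inactive_anon 52428800
active_file 62914560
inactive_file 41943040
"""
print(memory_breakdown(parse_memory_stat(sample)))
```

On a real system, don't expect the active and inactive figures to sum exactly to the totals; the kernel's accounting differs in details such as shared memory.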

Docker also reports on the amount of swap currently in use.

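Additional metrics that may be valuable in investigating performance or stability issues include page faults (pgfault) and major page faults (pgmajfault). A page fault occurs when a process accesses part of its virtual memory that is nonexistent or protected; if the access is invalid, the process receives a segmentation fault. A major fault is one the kernel can only resolve by fetching data from disk, for example from swap or from a memory-mapped file.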
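Because pgfault and pgmajfault are running totals, they are most useful as rates. A sketch, assuming you already have two parsed `memory.stat` readings as dicts and the number of seconds between them (the function and variable names are hypothetical):

```python
def fault_rates(before, after, interval_s):
    """Per-second page-fault rates between two memory.stat samples.

    pgfault and pgmajfault are cumulative counters, so a rate is the
    difference between two readings divided by the time between them.
    """
    return {
        "pgfault_per_s": (after["pgfault"] - before["pgfault"]) / interval_s,
        "pgmajfault_per_s": (after["pgmajfault"] - before["pgmajfault"]) / interval_s,
    }


# Two hypothetical readings taken 10 seconds apart.
earlier = {"pgfault": 1_000, "pgmajfault": 10}
later = {"pgfault": 1_600, "pgmajfault": 40}
print(fault_rates(earlier, later, interval_s=10))
```

A sustained rise in the major-fault rate is usually the more telling signal, since each major fault means a trip to disk.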

Docker's runtime metrics documentation describes these memory metrics in more depth.

As with a traditional host, when you have performance problems, some of the first metrics you'll want to look at include memory availability and swap usage.

As discussed in the next article, memory metrics can be collected from pseudo-files, the stats command (basic memory usage metrics), or from the API.
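The API exposes the same counters under `memory_stats`, alongside the current `usage` and the container's `limit`. A sketch of turning one sample into a utilization percentage: it subtracts page cache from usage because cached file data can be reclaimed, mirroring the usage-minus-cache formula the Docker Engine API documentation describes for cgroup v1. The exact counter Docker subtracts has varied between versions and cgroup versions, so treat the result as an approximation; the function name and sample values are hypothetical.

```python
def memory_usage(memory_stats):
    """Memory used (excluding page cache) as a percentage of the limit.

    Approximation: usage minus the cgroup v1 "cache" counter.
    """
    usage = memory_stats["usage"]
    limit = memory_stats["limit"]
    counters = memory_stats.get("stats", {})
    used = usage - counters.get("cache", 0)
    return {
        "used_bytes": used,
        "used_percent": used / limit * 100.0,
        # "swap" is typically present only when swap accounting is enabled.
        "swap_bytes": counters.get("swap", 0),
    }


# Made-up sample shaped like the API's memory_stats object (bytes).
sample = {
    "usage": 629_145_600,        # 600 MiB
    "limit": 1_048_576_000,      # 1000 MiB
    "stats": {"cache": 104_857_600, "swap": 0},
}
print(memory_usage(sample))
```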

I/O

For each block device, Docker reports the following two metrics, decomposed into four counters: by reads versus writes, and by synchronous versus asynchronous I/O.

Name	Description	Metric type
I/O serviced	Count of I/O operations performed, regardless of size	Resource: Utilization
I/O service bytes	Bytes read or written by the cgroup	Resource: Utilization
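On cgroup v1, these counters live in blkio pseudo-files such as `blkio.throttle.io_service_bytes` and `blkio.throttle.io_serviced`, where each line is `major:minor Operation value` and a final `Total value` line sums all devices. A minimal parser for that layout follows; the function name and sample values are hypothetical.

```python
def parse_blkio(text):
    """Parse a cgroup v1 blkio file into {device: {operation: value}}.

    Each line is "major:minor Operation value"; the final "Total value" line
    (the sum over all devices) has only two fields and is skipped.
    """
    per_device = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue
        device, operation, value = parts
        per_device.setdefault(device, {})[operation] = int(value)
    return per_device


# Illustrative io_service_bytes contents for one device (made-up numbers).
sample = """8:0 Read 4096
8:0 Write 8192
8:0 Sync 10240
8:0 Async 2048
8:0 Total 12288
Total 12288
"""
disk = parse_blkio(sample)["8:0"]
print(disk["Read"], disk["Write"], disk["Total"])
```

The two decompositions are independent views of the same traffic: Read plus Write and Sync plus Async each add up to the device's Total.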

Block I/O is shared, so it is a good idea to track the host's queue and service times in addition to the container-specific I/O metrics called out above. If queue lengths or service times are increasing on a block device that your container uses, your container's I/O will be affected.

As discussed in the next article, I/O metrics can be collected from pseudo-files, the stats command (bytes read and written), or from the API.

Network

Just like an ordinary host, Docker can report several different network metrics, each of them divided into separate metrics for inbound and outbound network traffic:

Name	Description	Metric type
Bytes	Network traffic volume (send/receive)	Resource: Utilization
Packets	Network packet count (send/receive)	Resource: Utilization
Errors (receive)	Packets received with errors	Resource: Error
Errors (transmit)	Errors in packet transmission	Resource: Error
Dropped	Packets dropped (send/receive)	Resource: Error
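In the API, these counters appear per interface under `networks`, with `rx_` and `tx_` prefixes (for example `rx_bytes`, `tx_errors`, `rx_dropped`). Because they are cumulative, a sketch like the one below turns two samples into per-second rates; the helper names and sample values are hypothetical.

```python
FIELDS = ("bytes", "packets", "errors", "dropped")


def network_rates(before, after, interval_s):
    """Per-second receive (rx) and transmit (tx) rates for each interface.

    before and after are "networks" objects from two stats samples;
    interfaces missing from the earlier sample are skipped.
    """
    rates = {}
    for iface, now in after.items():
        prev = before.get(iface)
        if prev is None:
            continue
        rates[iface] = {
            f"{d}_{f}_per_s": (now[f"{d}_{f}"] - prev[f"{d}_{f}"]) / interval_s
            for d in ("rx", "tx")
            for f in FIELDS
        }
    return rates


# Two hypothetical samples for eth0, taken 5 seconds apart.
zero = {f"{d}_{f}": 0 for d in ("rx", "tx") for f in FIELDS}
earlier = {"eth0": dict(zero)}
later = {"eth0": {**zero, "rx_bytes": 5_000, "rx_packets": 50,
                  "tx_bytes": 2_500, "tx_packets": 25, "rx_dropped": 1}}
print(network_rates(earlier, later, interval_s=5))
```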

As discussed in the next article, network metrics can be collected from pseudo-files, the stats command (bytes sent and received), or from the API.

Up next

Docker can report all the basic resource metrics you'd expect from a traditional host: CPU, memory, I/O, and network. However, some specific metrics you might expect (such as nice, idle, iowait, or irq CPU time) are not available, and other metrics are unique to containers, such as CPU throttling.

The commands used to collect resource metrics from Docker are different from the commands used on a traditional host, so the next article in this series covers the three main approaches to Docker resource metrics collection. Read on...
