This post is part 2 in a 4-part series about monitoring Docker. Part 1 discusses the novel challenge of monitoring containers instead of hosts, part 3 covers the nuts and bolts of collecting Docker resource metrics, and part 4 describes how the largest TV and radio outlet in the U.S. monitors Docker. This article describes in detail the resource metrics that are available from Docker.
Docker is like a host
As discussed in part 1 of this series, a Docker container can rightly be classified as a type of mini-host. Just like a regular host, it runs work on behalf of resident software, and that work uses CPU, memory, I/O, and network resources. However, Docker containers run inside cgroups, which don’t report exactly the same metrics you might expect from a host. This article discusses the resource metrics that are available. The next article in this series covers three different ways to collect Docker resource metrics.
Key Docker resource metrics
CPU
|Name||Description||Metric type|
|user CPU||Percent of time that CPU is under direct control of processes||Resource: Utilization|
|system CPU||Percent of time that CPU is executing system calls on behalf of processes||Resource: Utilization|
|throttling (count)||Number of CPU throttling enforcements for a container||Resource: Saturation|
|throttling (time)||Total time that a container's CPU usage was throttled||Resource: Saturation|
Just like a traditional host, Docker containers report system CPU and user CPU usage. It probably goes without saying that if your container is performing slowly, CPU is one of the first resources you’ll want to look at.
As with all Docker resource metrics, you will typically collect the metrics differently than you would from an ordinary host. Another key difference with containers: unlike a traditional host, Docker does not report nice, idle, iowait, or irq CPU time.
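To make the user and system CPU metrics concrete, here is a minimal sketch of how cumulative CPU counters can be turned into the percentages described above. It assumes the counters arrive as text in the format of a cgroup v1 `cpuacct.stat` file, with values in clock ticks and 100 ticks per second. The sample values and the function names are hypothetical, not taken from a real container.

```python
# Hypothetical sample in the format of a cgroup v1 cpuacct.stat file.
# Values are cumulative clock ticks since the container started.
SAMPLE_CPUACCT_STAT = """\
user 46011
system 2200
"""

# Assumption: 100 ticks per second. On a real host, check
# os.sysconf("SC_CLK_TCK") instead of hard-coding this value.
TICKS_PER_SECOND = 100


def parse_cpuacct_stat(text, ticks_per_second=TICKS_PER_SECOND):
    """Return cumulative user and system CPU time in seconds."""
    seconds = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, ticks = line.split()
        seconds[name] = int(ticks) / ticks_per_second
    return seconds


def cpu_percent(prev, curr, interval_seconds):
    """Percent of one CPU used between two cumulative samples."""
    return {
        name: 100.0 * (curr[name] - prev[name]) / interval_seconds
        for name in curr
    }


print(parse_cpuacct_stat(SAMPLE_CPUACCT_STAT))
```

Because the counters are cumulative, a percentage only makes sense as the difference between two samples divided by the time between them, which is what `cpu_percent` does.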
If Docker has plenty of CPU capacity, but you still suspect that it is compute-bound, you may want to check a container-specific metric: CPU throttling.
If you do not specify any scheduling priority, then available CPU time will be split evenly between running containers. If some containers don’t need all of their allotted CPU time, then it will be made proportionally available to other containers.
You can optionally control the share of CPU time each container should have relative to others using the same CPU(s) by specifying CPU shares.
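The arithmetic behind CPU shares can be sketched in a few lines. The container names and share values below are hypothetical. The calculation assumes every container is competing for the same CPU(s) at the same time, since shares only matter under contention.

```python
def cpu_share_fractions(shares):
    """Fraction of contended CPU time each container is entitled to.

    `shares` maps a container name to its CPU shares value.
    """
    total = sum(shares.values())
    return {name: value / total for name, value in shares.items()}


# Hypothetical containers; 1024 is the default share value in Docker.
containers = {"web": 1024, "worker": 512, "batch": 512}
print(cpu_share_fractions(containers))
# → {'web': 0.5, 'worker': 0.25, 'batch': 0.25}
```

If `batch` were idle, its quarter would be split between the other two in proportion to their shares, so `web` would get two thirds and `worker` one third.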
Going one step further, you can actively throttle a container. In some cases, a container’s default or declared number of CPU shares would entitle it to more CPU time than you want it to have. If, in those cases, the container attempts to actually use that CPU time, a CPU quota constraint will tell Docker when to throttle the container’s CPU usage. Note that the CPU quota and CPU period are both expressed in microseconds (not milliseconds or nanoseconds). So a container with a 100,000 microsecond period and a 50,000 microsecond quota would be throttled if it attempted to use more than half of the CPU time during any 0.1-second period.
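The quota and period arithmetic from the example above can be written out as a short sketch. The function names are hypothetical. Treating a quota of zero or less as "no limit" is an assumption made here for illustration.

```python
def cpu_limit_fraction(quota_us, period_us):
    """Maximum CPU time per period, as a fraction of one CPU.

    Both arguments are in microseconds. A quota of zero or less is
    treated as "no limit" (an assumption made for this sketch).
    """
    if quota_us <= 0:
        return None
    return quota_us / period_us


def exceeds_quota(requested_us, quota_us):
    """True if the CPU time requested in one period is above the quota."""
    return quota_us > 0 and requested_us > quota_us


# The example from the text: 100,000 us period, 50,000 us quota.
print(cpu_limit_fraction(50_000, 100_000))  # → 0.5
print(exceeds_quota(60_000, 50_000))        # → True
```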
Docker can tell you the number of times throttling was enforced for each container, as well as the total time that each container was throttled.
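As a rough illustration of the two throttling metrics, the sketch below summarizes counters in the format of a cgroup v1 `cpu.stat` file, where `throttled_time` is reported in nanoseconds. The sample values are hypothetical. The ratio of throttled periods to total periods is an extra convenience figure, not something the article says Docker reports.

```python
# Hypothetical sample in the format of a cgroup v1 cpu.stat file.
SAMPLE_CPU_STAT = """\
nr_periods 1200
nr_throttled 300
throttled_time 45000000000
"""


def parse_cpu_stat(text):
    """Parse "name value" lines into a dict of integers."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttling_summary(stats):
    """Throttling count, throttled time in seconds, and throttled ratio."""
    periods = stats.get("nr_periods", 0)
    throttled = stats.get("nr_throttled", 0)
    return {
        "throttled_count": throttled,
        # throttled_time is in nanoseconds in cgroup v1.
        "throttled_seconds": stats.get("throttled_time", 0) / 1e9,
        "throttled_ratio": throttled / periods if periods else 0.0,
    }


print(throttling_summary(parse_cpu_stat(SAMPLE_CPU_STAT)))
```

A rising throttled ratio suggests the container regularly wants more CPU than its quota allows, even when the host itself has capacity to spare.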
Memory
Just as you would expect, Docker can report on the amount of memory available to it, and the amount of memory it is using.
|Name||Description||Metric type|
|Memory||Memory usage of a container||Resource: Utilization|
|RSS||Non-cache memory for a process (stacks, heaps, etc.)||Resource: Utilization|
|Cache memory||Data from disk cached in memory||Resource: Utilization|
|Swap||Amount of swap space in use||Resource: Saturation|
Used memory can be decomposed into:
- RSS (resident set size) is data that belongs to a process: stacks, heaps, etc. RSS itself can be further decomposed into active and inactive memory (active_anon and inactive_anon). Inactive RSS memory is swapped to disk when necessary.
- Cache memory reflects data stored on disk that is currently cached in memory. Cache can be further decomposed into active and inactive memory (active_file and inactive_file). Inactive memory may be reclaimed first when the system needs memory.
Docker also reports on the amount of swap currently in use.
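The decomposition of used memory described above can be sketched as follows. The input is assumed to be text in the format of a cgroup v1 `memory.stat` file, with values in bytes. The sample values are hypothetical. The `swap` field is read defensively, on the assumption that it may be absent on some systems.

```python
MIB = 1024 * 1024

# Hypothetical sample in the format of a cgroup v1 memory.stat file (bytes).
SAMPLE_MEMORY_STAT = """\
cache 104857600
rss 209715200
swap 0
active_anon 157286400
inactive_anon 52428800
active_file 62914560
inactive_file 41943040
"""


def parse_memory_stat(text):
    """Parse "name value" lines into a dict of integers."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def memory_breakdown(stats):
    """Split used memory into RSS and cache, reported in MiB."""
    rss, cache = stats["rss"], stats["cache"]
    return {
        "used_mib": (rss + cache) / MIB,
        "rss_mib": rss / MIB,
        "cache_mib": cache / MIB,
        "inactive_anon_mib": stats.get("inactive_anon", 0) / MIB,
        "inactive_file_mib": stats.get("inactive_file", 0) / MIB,
        "swap_mib": stats.get("swap", 0) / MIB,
    }


print(memory_breakdown(parse_memory_stat(SAMPLE_MEMORY_STAT)))
```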
Additional metrics that may be valuable in investigating performance or stability issues include page faults, which can represent either segmentation faults or fetching data from disk instead of memory.
Deeper documentation of memory metrics is here.
As with a traditional host, when you have performance problems, some of the first metrics you’ll want to look at include memory availability and swap usage.
I/O
For each block device, Docker reports the following two metrics, decomposed into four counters: by reads versus writes, and by synchronous versus asynchronous I/O.
|Name||Description||Metric type|
|I/O serviced||Count of I/O operations performed, regardless of size||Resource: Utilization|
|I/O service bytes||Bytes read or written by the cgroup||Resource: Utilization|
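To show what "two metrics, decomposed into four counters" looks like in practice, here is a minimal sketch that parses per-device counters. It assumes text in the format of a cgroup v1 `blkio.throttle.io_service_bytes` file, where each line carries a device number, an operation type, and a value. The sample values are hypothetical.

```python
# Hypothetical sample in the format of a cgroup v1
# blkio.throttle.io_service_bytes file (device, operation, bytes).
SAMPLE_IO_SERVICE_BYTES = """\
8:0 Read 4096000
8:0 Write 1024000
8:0 Sync 3072000
8:0 Async 2048000
8:0 Total 5120000
Total 5120000
"""


def parse_blkio(text):
    """Return {device: {operation: value}} from blkio counter text."""
    devices = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip the trailing grand-total line and blank lines
        device, operation, value = parts
        devices.setdefault(device, {})[operation] = int(value)
    return devices


print(parse_blkio(SAMPLE_IO_SERVICE_BYTES))
```

Reads plus writes and synchronous plus asynchronous are two different ways of slicing the same total, so each pair should add up to the device's `Total` value.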
Block I/O is shared, so it is a good idea to track the host’s queue and service times in addition to the container-specific I/O metrics called out above. If queue lengths or service times are increasing on a block device that your container uses, your container’s I/O will be affected.
Network
Just like an ordinary host, Docker can report several different network metrics, each of them divided into separate metrics for inbound and outbound network traffic:
|Name||Description||Metric type|
|Bytes||Network traffic volume (send/receive)||Resource: Utilization|
|Packets||Network packet count (send/receive)||Resource: Utilization|
|Errors (receive)||Packets received with errors||Resource: Error|
|Errors (transmit)||Errors in packet transmission||Resource: Error|
|Dropped||Packets dropped (send/receive)||Resource: Error|
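The network metrics above map onto the per-interface counters that Linux exposes in `/proc/net/dev`. The sketch below parses text in that format. It assumes the text was read from inside the container's network namespace, which is one possible source and not necessarily how any particular tool collects it. The sample values are hypothetical.

```python
# Hypothetical sample in the format of /proc/net/dev.
SAMPLE_NET_DEV = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
  eth0: 1296000    9000    3    7    0     0          0         0   648000    4500    1    0    0     0       0          0
    lo:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
"""

FIELDS = ("bytes", "packets", "errors", "dropped")


def parse_net_dev(text):
    """Return {interface: {rx_*/tx_* counter: value}}."""
    interfaces = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        values = [int(v) for v in data.split()]
        # Receive counters are columns 0-3, transmit counters columns 8-11.
        rx, tx = values[0:4], values[8:12]
        counters = {}
        for field, rx_value, tx_value in zip(FIELDS, rx, tx):
            counters["rx_" + field] = rx_value
            counters["tx_" + field] = tx_value
        interfaces[name.strip()] = counters
    return interfaces


print(parse_net_dev(SAMPLE_NET_DEV)["eth0"])
```

Like the CPU counters, these values are cumulative, so rates such as bytes per second come from the difference between two samples.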
Docker can report all the basic resource metrics you’d expect from a traditional host: CPU, memory, I/O, and network. However, some specific metrics you might expect (such as nice, idle, iowait, or irq CPU time) are not available, and other metrics, such as CPU throttling, are unique to containers.
The commands used to collect resource metrics from Docker are different from the commands used on a traditional host, so the next article in this series covers the three main approaches to Docker resource metrics collection. Read on…