How to Monitor Docker Resource Metrics | Datadog

How to monitor Docker resource metrics

Author K Young

Published: November 11, 2015

This post is part 2 in a 4-part series about monitoring Docker. Part 1 discusses the novel challenge of monitoring containers instead of hosts, part 3 covers the nuts and bolts of collecting Docker resource metrics, and part 4 describes how the largest TV and radio outlet in the U.S. monitors Docker. This article describes in detail the resource metrics that are available from Docker.

Docker is like a host

As discussed in part 1 of this series, a Docker container can rightly be classified as a type of mini-host. Just like a regular host, it runs work on behalf of resident software, and that work uses CPU, memory, I/O, and network resources. However, Docker containers run inside cgroups, which don’t report exactly the same metrics you might expect from a host. This article discusses the resource metrics that are available. The next article in this series covers three different ways to collect Docker resource metrics.

Key Docker resource metrics

CPU

Name | Description | Metric type
user CPU | Percent of time that CPU is under direct control of processes | Resource: Utilization
system CPU | Percent of time that CPU is executing system calls on behalf of processes | Resource: Utilization
throttling (count) | Number of CPU throttling enforcements for a container | Resource: Saturation
throttling (time) | Total time that a container's CPU usage was throttled | Resource: Saturation

Standard metrics

Just like a traditional host, Docker containers report system CPU and user CPU usage. It probably goes without saying that if your container is performing slowly, CPU is one of the first resources you’ll want to look at.

As with all Docker resource metrics, you will typically collect the metrics differently than you would from an ordinary host. Another key difference with containers: unlike a traditional host, Docker does not report nice, idle, iowait, or irq CPU time.
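As a rough illustration of how user and system CPU percentages can be derived, the sketch below turns two readings of cumulative CPU counters into percentages. It assumes the cgroup v1 `cpuacct.stat` layout (`user` and `system` lines counted in clock ticks) and a tick rate of 100 per second; the sample values and function names are invented for the example.

```python
# Sketch only: assumes cgroup v1 cpuacct.stat text ("user N" / "system N")
# and 100 clock ticks per second. Sample readings are made up.

def parse_cpuacct_stat(text):
    """Parse 'name value' lines into a dict of cumulative tick counts."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def cpu_percent(before, after, interval_s, ticks_per_s=100):
    """Percent of one CPU spent in user and system mode over the interval."""
    return {
        mode: 100.0 * (after[mode] - before[mode]) / (ticks_per_s * interval_s)
        for mode in ("user", "system")
    }


# Two hypothetical readings taken one second apart.
READING_T0 = "user 46011\nsystem 22304\n"
READING_T1 = "user 46061\nsystem 22314\n"

cpu_usage = cpu_percent(
    parse_cpuacct_stat(READING_T0),
    parse_cpuacct_stat(READING_T1),
    interval_s=1.0,
)
print(cpu_usage)
```

Because the counters are cumulative, a single reading says little on its own; the difference between two readings over a known interval is what yields a utilization figure.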

Throttling

If the host has plenty of CPU capacity but you still suspect that a container is compute-bound, you may want to check a container-specific metric: CPU throttling.

If you do not specify any scheduling priority, then available CPU time will be split evenly between running containers. If some containers don’t need all of their allotted CPU time, then it will be made proportionally available to other containers.

You can optionally control the share of CPU time each container should have relative to others using the same CPU(s) by specifying CPU shares.

Going one step further, you can actively throttle a container. In some cases, a container’s default or declared number of CPU shares would entitle it to more CPU time than you want it to have. If, in those cases, the container attempts to actually use that CPU time, a CPU quota constraint will tell Docker when to throttle the container’s CPU usage. Note that the CPU quota and CPU period are both expressed in microseconds (not milliseconds or nanoseconds). So a container with a 100,000 microsecond period and a 50,000 microsecond quota would be throttled if it attempted to use more than half of the CPU time during each 0.1-second period.
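The quota arithmetic in the example above can be checked with a few lines of Python. This is a minimal sketch: the function name is hypothetical, and treating a non-positive quota as "no limit" is an assumption based on the cgroup v1 convention of using -1 for an unlimited quota.

```python
from typing import Optional

# Sketch only: function name is hypothetical; both arguments are in
# microseconds, matching the units described in the text.

def cpu_limit_fraction(quota_us: int, period_us: int) -> Optional[float]:
    """Return the share of CPU time a container may use per period."""
    if period_us <= 0:
        raise ValueError("period must be positive")
    if quota_us <= 0:
        # Assumption: a non-positive quota means no throttling limit is set.
        return None
    return quota_us / period_us


half_cpu = cpu_limit_fraction(quota_us=50_000, period_us=100_000)
print(half_cpu)  # 0.5: throttled after 50 ms of CPU time in each 100 ms period
```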

Docker can tell you the number of times throttling was enforced for each container, as well as the total time that each container was throttled.
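To make the two throttling metrics concrete, here is a small sketch that summarizes them from counter text. It assumes the cgroup v1 `cpu.stat` layout, with `nr_periods`, `nr_throttled`, and a `throttled_time` value in nanoseconds; the sample numbers and function names are invented.

```python
# Sketch only: assumes cgroup v1 cpu.stat keys and nanosecond throttled_time.
# Sample values are made up for illustration.

SAMPLE_CPU_STAT = """\
nr_periods 1200
nr_throttled 300
throttled_time 4500000000
"""


def parse_cpu_stat(text):
    """Parse 'name value' lines into a dict of integer counters."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def summarize_throttling(stats):
    """Return the throttling count, the share of periods throttled, and total time."""
    periods = stats["nr_periods"]
    return {
        "throttled_count": stats["nr_throttled"],
        "throttled_ratio": stats["nr_throttled"] / periods if periods else 0.0,
        "throttled_seconds": stats["throttled_time"] / 1e9,
    }


throttling = summarize_throttling(parse_cpu_stat(SAMPLE_CPU_STAT))
print(throttling)
```

A rising throttled ratio suggests the container regularly wants more CPU time than its quota allows, even when the host itself is not saturated.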

As discussed in the next article, CPU metrics can be collected from pseudo-files, the stats command (basic CPU usage metrics), or from the API.

Memory

Just as you would expect, Docker can report on the amount of memory available to it, and the amount of memory it is using.

Name | Description | Metric type
Memory | Memory usage of a container | Resource: Utilization
RSS | Non-cache memory for a process (stacks, heaps, etc.) | Resource: Utilization
Cache memory | Data from disk cached in memory | Resource: Utilization
Swap | Amount of swap space in use | Resource: Saturation

Used memory can be decomposed into:

  • RSS (resident set size) is data that belongs to a process: stacks, heaps, etc. RSS itself can be further decomposed into active and inactive memory (active_anon and inactive_anon). Inactive RSS memory is swapped to disk when necessary.
  • cache memory reflects data stored on disk that is currently cached in memory. Cache can be further decomposed into active and inactive memory (active_file, inactive_file). Inactive memory may be reclaimed first when the system needs memory.

Docker also reports on the amount of swap currently in use.

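Additional metrics that may be valuable in investigating performance or stability issues include the page fault counters. pgfault counts all page faults, which occur whenever a process accesses memory that is not currently mapped (including invalid accesses that end in a segmentation fault), while pgmajfault counts major faults, where the data had to be fetched from disk instead of memory.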
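The decomposition described above can be sketched in a few lines. The example assumes the cgroup v1 `memory.stat` layout of one `name value` pair per line with sizes in bytes; the `cache`, `rss`, and `swap` key names are taken from that format, and the sample values and function names are invented.

```python
# Sketch only: assumes cgroup v1 memory.stat text with byte values.
# Sample values are made up for illustration.

MIB = 1024 * 1024

SAMPLE_MEMORY_STAT = """\
cache 104857600
rss 52428800
swap 0
pgfault 120000
pgmajfault 35
inactive_anon 10485760
active_anon 41943040
inactive_file 62914560
active_file 41943040
"""


def parse_memory_stat(text):
    """Parse 'name value' lines into a dict of integers."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def summarize_memory(stats):
    """Report the main memory components in MiB plus the fault counters."""
    return {
        "rss_mib": stats["rss"] / MIB,
        "cache_mib": stats["cache"] / MIB,
        "swap_mib": stats["swap"] / MIB,
        "inactive_file_mib": stats["inactive_file"] / MIB,
        "page_faults": stats["pgfault"],
        "major_page_faults": stats["pgmajfault"],
    }


memory_summary = summarize_memory(parse_memory_stat(SAMPLE_MEMORY_STAT))
print(memory_summary)
```

Separating RSS from cache matters when reading the total: a container with a large cache and a small RSS is usually under far less memory pressure than the raw usage number suggests.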

Deeper documentation of memory metrics is here.

As with a traditional host, when you have performance problems, some of the first metrics you’ll want to look at include memory availability and swap usage.

As discussed in the next article, memory metrics can be collected from pseudo-files, the stats command (basic memory usage metrics), or from the API.

I/O

For each block device, Docker reports the following two metrics, decomposed into four counters: by reads versus writes, and by synchronous versus asynchronous I/O.

Name | Description | Metric type
I/O serviced | Count of I/O operations performed, regardless of size | Resource: Utilization
I/O service bytes | Bytes read or written by the cgroup | Resource: Utilization
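To show how the per-device breakdown into reads, writes, synchronous, and asynchronous I/O might look in practice, the sketch below parses counter text into a nested dictionary. It assumes the cgroup v1 blkio layout of `major:minor Operation value` lines followed by a grand-total line; the sample values and the function name are invented.

```python
# Sketch only: assumes cgroup v1 blkio-style text, for example the contents
# of an io_service_bytes file. Sample values are made up for illustration.

SAMPLE_IO_SERVICE_BYTES = """\
8:0 Read 4096000
8:0 Write 1024000
8:0 Sync 3072000
8:0 Async 2048000
8:0 Total 5120000
Total 5120000
"""


def parse_blkio(text):
    """Return {device: {operation: value}} from 'device op value' lines."""
    devices = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skips blank lines and the trailing grand-total line
        device, operation, value = parts
        devices.setdefault(device, {})[operation.lower()] = int(value)
    return devices


io_bytes = parse_blkio(SAMPLE_IO_SERVICE_BYTES)
print(io_bytes["8:0"])
```

The same parser would apply to the operation-count metric, since under this assumed layout only the meaning of the values differs.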

Block I/O is shared, so it is a good idea to track the host’s queue and service times in addition to the container-specific I/O metrics called out above. If queue lengths or service times are increasing on a block device that your container uses, your container’s I/O will be affected.

As discussed in the next article, I/O metrics can be collected from pseudo-files, the stats command (bytes read and written), or from the API.

Network

Just like an ordinary host, Docker can report several different network metrics, each of them divided into separate metrics for inbound and outbound network traffic:

Name | Description | Metric type
Bytes | Network traffic volume (send/receive) | Resource: Utilization
Packets | Network packet count (send/receive) | Resource: Utilization
Errors (receive) | Packets received with errors | Resource: Error
Errors (transmit) | Errors in packet transmission | Resource: Error
Dropped | Packets dropped (send/receive) | Resource: Error
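The metrics in the table map onto per-interface counters. As a hedged sketch, the code below extracts them from text in the Linux `/proc/net/dev` format, assuming the usual column order of eight receive fields followed by eight transmit fields; the sample values and the function name are invented.

```python
# Sketch only: assumes the Linux /proc/net/dev column layout.
# Sample values are made up for illustration.

SAMPLE_NET_DEV = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
  eth0:    1296      16    0    0    0     0          0         0      816       9    0    0    0     0       0          0
    lo:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
"""


def parse_net_dev(text):
    """Return bytes, packets, errors, and drops per interface and direction."""
    interfaces = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, counters = line.split(":", 1)
        fields = [int(value) for value in counters.split()]
        interfaces[name.strip()] = {
            "rx_bytes": fields[0],
            "rx_packets": fields[1],
            "rx_errors": fields[2],
            "rx_dropped": fields[3],
            "tx_bytes": fields[8],
            "tx_packets": fields[9],
            "tx_errors": fields[10],
            "tx_dropped": fields[11],
        }
    return interfaces


network = parse_net_dev(SAMPLE_NET_DEV)
print(network["eth0"])
```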

As discussed in the next article, network metrics can be collected from pseudo-files, the stats command (bytes sent and received), or from the API.

Up next

Docker can report all the basic resource metrics you’d expect from a traditional host: CPU, memory, I/O, and network. However, some specific metrics you might expect (such as nice, idle, iowait, or irq CPU time) are not available, and other metrics, such as CPU throttling, are unique to containers.

The commands used to collect resource metrics from Docker are different from the commands used on a traditional host, so the next article in this series covers the three main approaches to Docker resource metrics collection. Read on…


Source Markdown for this post is available on GitHub. Questions, corrections, additions, etc.? Please let us know.