Monitoring Kubernetes performance metrics
This post is Part 2 of a 4-part series about Kubernetes monitoring. Part 1 discusses how Kubernetes changes your monitoring strategies; this post breaks down the key metrics to monitor; Part 3 covers the different ways to collect that data; and Part 4 details how to monitor Kubernetes performance with Datadog.
As explained in Part 1, using Kubernetes for container orchestration requires a rethinking of your monitoring strategy. But if you use the proper tools, know which metrics to track, and know how to interpret performance data, you will have good visibility into your containerized infrastructure and its orchestration. This part of the series digs into the different metrics you should monitor.
Where metrics come from
Heapster: Kubernetes’ own metrics collector
We cannot talk about Kubernetes metrics without introducing Heapster: it is for now the go-to source for basic resource utilization metrics and events from your Kubernetes clusters. On each node, cAdvisor collects data about running containers that Heapster then queries through the kubelet of the node. Part 3 of this series, which describes the different solutions to collect Kubernetes metrics, will give you more details on how Heapster works and how to configure it for that purpose.
Heapster vs. native container metrics
It’s important to understand that metrics reported by your container engine (Docker or rkt) can have different values than the equivalent metrics from Kubernetes. As mentioned above, Kubernetes relies on Heapster to report metrics rather than reading the cgroup files directly. One of Heapster’s limitations is that it collects Kubernetes metrics at a different frequency (the “housekeeping interval”) than cAdvisor, which makes the overall collection frequency for Heapster-reported metrics tricky to evaluate. This can lead to inaccuracies due to mismatched sampling intervals, especially for metrics where sampling is crucial to the value of the metric, such as counts of CPU time. That’s why you should consider tracking metrics from your containers instead of from Kubernetes. Throughout this post, we’ll highlight the metrics that you should monitor. Even when you are using Docker metrics, however, you should still aggregate them using the labels from Kubernetes.
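To make the sampling problem concrete, here is a minimal sketch. It assumes a cumulative CPU-seconds counter and a collector that divides by its nominal interval instead of the real time between samples. The function name and the sample values are hypothetical, and this is not Heapster's actual logic.

```python
def cpu_rate(prev_value, curr_value, elapsed_seconds):
    """Average number of CPU cores used between two samples of a cumulative counter."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (curr_value - prev_value) / elapsed_seconds


# Hypothetical samples: (timestamp in seconds, cumulative CPU seconds).
prev_sample = (0.0, 100.0)
curr_sample = (15.0, 107.5)  # the second sample really arrived 15 s later

# Rate computed with the real elapsed time.
true_rate = cpu_rate(prev_sample[1], curr_sample[1], curr_sample[0] - prev_sample[0])

# Rate computed by a collector that assumes a nominal 10 s interval.
assumed_rate = cpu_rate(prev_sample[1], curr_sample[1], 10.0)

print(f"true: {true_rate:.2f} cores, assumed: {assumed_rate:.2f} cores")
```

The same counter delta produces two different usage figures, which is why counter-based metrics are the most sensitive to interval mismatches.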
Now that we’ve made this clear, let’s dig into the metrics you should monitor.
Key performance metrics to monitor
Since Kubernetes plays a central role in your infrastructure, it has to be closely monitored. You’ll want to be sure that pods are healthy and correctly deployed, and that resource utilization is optimized.
In order to make sure Kubernetes does its job properly, you want to be able to check the health of pod deployments.
During a deployment rollout, Kubernetes first determines the number of desired pods required to run your application(s). Then it deploys the needed pods; the newly created pods are up and counted as current. But current pods are not necessarily available immediately for their intended use.
$ kubectl get deployments
NAME               DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
nginx-deployment   3         3         3            3           18s
Indeed, for some types of deployments, you might want to enforce a waiting period before making them available. Let’s say you have a Jenkins cluster where slaves are pods in Kubernetes. They need some time to start, so you want to leave them unavailable during that initialization time and not have them handle any incoming requests. You can specify a delay in your Deployment spec using .spec.minReadySeconds, which will temporarily prevent your pods from becoming available. Note that readiness checks can be a better solution in some cases to make sure your pods are healthy before they receive requests (see the section about health checks below).
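As an illustration, the sketch below builds a Deployment manifest as a Python dictionary and prints it as JSON, which kubectl accepts alongside YAML. It assumes the apps/v1 API version; the deployment name, image, and 30-second delay are hypothetical placeholders. Note that minReadySeconds sits at the Deployment's spec level, next to replicas, not inside the pod template.

```python
import json


def deployment_manifest(name, image, replicas, min_ready_seconds):
    """Build a minimal Deployment manifest (assumes the apps/v1 API version)."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            # New pods must stay ready this long before counting as available.
            "minReadySeconds": min_ready_seconds,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


# Hypothetical name and image, for illustration only.
manifest = deployment_manifest("jenkins-agent", "example/jenkins-agent:latest", 3, 30)
print(json.dumps(manifest, indent=2))
```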
During a rolling update, you can also specify .spec.strategy.rollingUpdate.maxUnavailable in the Deployment spec to make sure you always have at least a certain number (or percentage) of pods available throughout the process. You can also use .spec.strategy.rollingUpdate.maxSurge to specify a cap on the number (or percentage) of extra pods that can be created beyond the desired pods.
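The arithmetic behind these two settings can be sketched as follows. The helper names are hypothetical. The 25% defaults and the rounding directions (percentages round down for maxUnavailable and up for maxSurge) follow the Kubernetes Deployment documentation.

```python
def resolve(value, desired, round_up):
    """Resolve an absolute count or a percentage string such as "25%"."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = int(value[:-1]) * desired
        # Integer arithmetic avoids floating-point surprises when rounding.
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def rollout_bounds(desired, max_unavailable="25%", max_surge="25%"):
    """Return (minimum available pods, maximum total pods) during a rollout."""
    unavailable = resolve(max_unavailable, desired, round_up=False)
    surge = resolve(max_surge, desired, round_up=True)
    return desired - unavailable, desired + surge


min_available, max_total = rollout_bounds(10)
print(f"at least {min_available} available, at most {max_total} pods in total")
```

With 10 desired pods and the defaults, 25% resolves to 2 unavailable pods and 3 surge pods, so the rollout keeps at least 8 pods available and never runs more than 13.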
|Metric||Metric name in kube-state-metrics||Description||Metric type|
|Desired pods||kube_deployment_spec_replicas||Number of pods desired when the deployment started||Other|
|Available pods||kube_deployment_status_replicas_available||Number of pods currently available||Other|
|Unavailable pods||kube_deployment_status_replicas_unavailable||Number of pods currently existing but not available||Other|
You should make sure the number of available pods always matches the desired number of pods outside of expected deployment transition phases.
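A check like this can be sketched against the Prometheus-format text that kube-state-metrics exposes. The sample text below is made up for illustration, and the label names (deployment, namespace) are assumptions that may differ between kube-state-metrics versions.

```python
import re

# Hypothetical scrape output, for illustration only.
SAMPLE = '''\
# HELP kube_deployment_spec_replicas Number of desired pods for a deployment.
kube_deployment_spec_replicas{deployment="nginx-deployment",namespace="default"} 3
kube_deployment_status_replicas_available{deployment="nginx-deployment",namespace="default"} 2
'''

LINE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse(text):
    """Group metric values by (namespace, deployment)."""
    values = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue  # skip comments, blank lines, and unlabeled metrics
        name, raw_labels, value = match.groups()
        labels = dict(LABEL.findall(raw_labels))
        key = (labels.get("namespace"), labels.get("deployment"))
        values.setdefault(key, {})[name] = float(value)
    return values


def under_replicated(text):
    """List deployments with fewer available pods than desired."""
    result = []
    for key, metrics in parse(text).items():
        desired = metrics.get("kube_deployment_spec_replicas")
        available = metrics.get("kube_deployment_status_replicas_available")
        if desired is not None and available is not None and available < desired:
            result.append((key, int(available), int(desired)))
    return result


print(under_replicated(SAMPLE))
```

In practice you would only alert when the mismatch persists beyond the length of a normal rollout, since a brief gap is expected during deployment transitions.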
|Metric||Description||Metric type|
|Current pods||Number of pods currently running||Resource: Utilization|
Keeping an eye on the number of pods currently running (by node or replica set, for example) will give you an overview of the evolution of your dynamic infrastructure.
To understand how the number of running pods impacts resource usage (CPU, memory, etc.) in your cluster, you should correlate this metric with the resource metrics described in the next section.
Monitoring system resources helps ensure that your clusters and applications remain healthy.
|Metric||Metric name in kube-state-metrics||Description||Metric type|
|CPU usage||-||Percentage of allocated CPU currently in use||Resource: Utilization|
|Node CPU capacity||kube_node_status_capacity_cpu_cores||Total CPU capacity of your cluster’s nodes||Resource: Utilization|
|Memory usage||-||Percentage of total memory in use||Resource: Utilization|
|Node Memory capacity||kube_node_status_capacity_memory_bytes||Total memory capacity of your cluster’s nodes||Resource: Utilization|
|Requests||-||Minimum amount of a given resource required for containers to run (should be summed over a node)||Resource: Utilization|
|Limits||-||Maximum amount of a given resource allowed to containers (should be summed over a node)||Resource: Utilization|
|Filesystem usage||-||Volume of disk being used (bytes)||Resource: Utilization|
|Disk I/O||-||Bytes read from or written to disk||Resource: Utilization|
CPU and memory
It probably goes without saying that when performance issues arise, CPU and memory usage are likely the first resource metrics you will want to review.
However, as explained in the first section of this post, to track memory and CPU usage you should favor the metrics reported by your container technology, such as Docker, rather than the Kubernetes statistics reported by Heapster.
To access your nodes’ CPU and memory capacity, kube-state-metrics (presented in Part 3) exposes these two metrics: kube_node_status_capacity_cpu_cores and kube_node_status_capacity_memory_bytes.
kube-state-metrics also reports kube_node_status_allocatable_cpu_cores and kube_node_status_allocatable_memory_bytes, which track respectively the CPU and memory resources of each node that are available for scheduling. Note that these metrics don’t track actual reservation and are not impacted by current scheduling operations. They are equal to the resources remaining in the node capacity once you remove the amount dedicated to system processes (journald, sshd, kubelet, kube-proxy, etc…).
Requests vs. limits
For pod scheduling, Kubernetes allows you to specify how much CPU and memory each container can consume through two types of thresholds:
- Request represents the minimum amount of CPU or memory the container needs to run, which needs to be guaranteed by the system.
- Limit is the maximum amount of the resource that the container will be allowed to consume. It’s unbounded by default.
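Requests and limits are written as Kubernetes quantity strings such as "250m" (a quarter of a CPU core) or "128Mi". The sketch below shows a hypothetical container spec fragment and a simplified parser for the most common suffixes. It is not a full implementation of the Kubernetes quantity format.

```python
from decimal import Decimal

# Binary suffixes are listed first so that "Mi" is matched before "M".
SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_quantity(text):
    """Convert a quantity string to a Decimal number of cores or bytes."""
    if text.endswith("m"):  # milli-units, e.g. "250m" is 0.25 CPU core
        return Decimal(text[:-1]) / 1000
    for suffix, factor in SUFFIXES.items():
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * factor
    return Decimal(text)


# Hypothetical container spec fragment, for illustration only.
container = {
    "name": "web",
    "image": "nginx",
    "resources": {
        "requests": {"cpu": "250m", "memory": "64Mi"},
        "limits": {"cpu": "500m", "memory": "128Mi"},
    },
}

for kind, values in container["resources"].items():
    parsed = {resource: parse_quantity(q) for resource, q in values.items()}
    print(kind, parsed)
```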
Beware of the trap
With other technologies, you are probably used to monitoring actual resource consumption and comparing that with your node capacity. With Kubernetes, if the sum of container limits on a node is strictly greater than the sum of requests (minimum resources required), the node can be oversubscribed and containers might use more resources than they actually need, which is fine. Even if they use 100 percent of the available CPU resources on a node, for example, Kubernetes can still make room to schedule another pod on the node. Kubernetes would simply lower the CPU available to existing pods to free up resources for the new one, as long as all containers have enough resources to meet their request. That’s why monitoring the sum of requests on the node and making sure it never exceeds your node’s capacity is much more important than monitoring simple CPU or memory usage. If you don’t have enough capacity to meet the minimum resource requirements of all your containers, you should scale up your nodes’ capacity or add more nodes to distribute the workload.
Having some oversubscription on your nodes can be good in many cases since it can help reduce the number of nodes in your Kubernetes cluster. You can tune the request/limit ratio by monitoring it over time and tracking how it impacts your container resource usage.
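A minimal sketch of this check, assuming you have already collected each container's CPU request and limit in millicores along with the node's capacity. The function name, field names, and numbers are hypothetical.

```python
def node_cpu_summary(capacity_millicores, containers):
    """Compare summed CPU requests and limits with a node's capacity."""
    requests = sum(c["request"] for c in containers)
    limits = sum(c["limit"] for c in containers)
    return {
        # The key condition: every container's minimum must fit on the node.
        "requests_fit": requests <= capacity_millicores,
        "request_share": requests / capacity_millicores,
        # A ratio above 1 means containers may burst beyond their requests.
        "limit_to_request_ratio": limits / requests if requests else None,
        "limits_exceed_capacity": limits > capacity_millicores,
    }


# Hypothetical node with 4 cores and three containers.
containers = [
    {"request": 500, "limit": 1000},
    {"request": 1000, "limit": 2000},
    {"request": 1500, "limit": 2500},
]
summary = node_cpu_summary(4000, containers)
print(summary)
```

Here the requests use 75 percent of the node, so there is still room to schedule, even though the limits add up to more than the node could deliver if every container burst at once.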
Note that since version 1.3, Kubernetes offers auto-scaling capabilities for Google Compute Engine and Google Container Engine (AWS support should come soon). So on those platforms Kubernetes can now adjust the number of pods in a deployment, replica set, or replication controller based on CPU utilization (support for other auto-scaling triggers is in alpha).
Container resource metrics
As explained in the section about container metrics, some statistics reported by Docker should also be monitored, as they provide deeper (and more accurate) insights. The CPU throttling metric is a great example, as it represents the number of times a container hit its specified limit.
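Throttling counters ultimately come from the container's cgroup cpu.stat file, which exposes nr_periods and nr_throttled. The sketch below parses such a file's contents and derives the share of scheduling periods in which the container was throttled. The sample values are made up, and the location of the file depends on your cgroup setup.

```python
def throttled_ratio(cpu_stat_text):
    """Fraction of CPU scheduling periods in which the container was throttled."""
    stats = {}
    for line in cpu_stat_text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    periods = stats.get("nr_periods", 0)
    return stats.get("nr_throttled", 0) / periods if periods else 0.0


# Hypothetical contents of a container's cpu.stat file.
SAMPLE = """nr_periods 1200
nr_throttled 300
throttled_time 4500000000
"""

print(f"throttled in {throttled_ratio(SAMPLE):.0%} of periods")
```

A ratio that stays high suggests the CPU limit is too low for the workload.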
Disk usage and I/O
The percentage of disk in use is generally more useful than the volume of disk usage, since the thresholds of concern won’t depend on the size of your clusters. You should graph its evolution over time and trigger an alert if it exceeds a threshold such as 80%.
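As a rough sketch, a percentage-based check could look like the following. The function names are hypothetical, and the last lines read the local root filesystem only to show usage. In practice the values would come from your metrics pipeline.

```python
import shutil


def disk_percent_used(used_bytes, total_bytes):
    """Percentage of the disk currently in use."""
    return 100.0 * used_bytes / total_bytes


def should_alert(used_bytes, total_bytes, threshold_percent=80.0):
    """True when usage is strictly above the alert threshold."""
    return disk_percent_used(used_bytes, total_bytes) > threshold_percent


# Usage example against the local root filesystem.
usage = shutil.disk_usage("/")
print(f"{disk_percent_used(usage.used, usage.total):.1f}% used, "
      f"alert: {should_alert(usage.used, usage.total)}")
```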
Graphing the number of bytes read from or written to disk provides critical context for higher-level metrics. For example, you can quickly check whether a latency spike is due to increased I/O activity.
Just as with ordinary hosts, you should monitor network metrics from your pods and containers.
|Metric||Description||Metric type|
|Network in||Bytes per second received through network||Resource: Utilization|
|Network out||Bytes per second sent through network||Resource: Utilization|
|Network errors||Number of network errors per second||Resource: Error|
Network metrics can shed light on traffic load. You should investigate if you see an increasing number of network errors per second, which could indicate a low-level issue or a networking misconfiguration.
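One way to detect an increasing error rate is sketched below, assuming you sample a cumulative error counter at known timestamps. The helper names, the window size, and the sample values are hypothetical.

```python
def rates(samples):
    """Per-second rates between consecutive (timestamp, counter) samples."""
    result = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        # Skip intervals where the clock did not advance or the counter reset.
        if t1 > t0 and v1 >= v0:
            result.append((v1 - v0) / (t1 - t0))
    return result


def is_rising(values, window=3):
    """True when the last `window` values are strictly increasing."""
    recent = values[-window:]
    return len(recent) == window and all(a < b for a, b in zip(recent, recent[1:]))


# Hypothetical samples: (timestamp in seconds, cumulative network errors).
samples = [(0, 0), (10, 10), (20, 30), (30, 70)]
error_rates = rates(samples)
print(error_rates, "rising:", is_rising(error_rates))
```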
Container health checks
In addition to standard resource metrics, Kubernetes also provides configurable health checks. You can configure, via the PodSpec, checks to detect:
- When running applications enter a broken state (liveness probe fails), in which case the kubelet will kill the container.
- When applications are temporarily unable to properly address requests (readiness probe fails), in which case the Kubernetes endpoint controller will remove the pod’s IP address from the endpoints of all services that match the pod, so that no traffic is sent to the affected containers.
The kubelet can run diagnostic liveness and readiness probes against containers through an HTTP check (the most common choice), an exec check, or a TCP check. The Kubernetes documentation provides more details about container probes and tips on when you should use them.
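For illustration, here is a container spec fragment with both probe types, written as a Python dictionary that can be serialized to JSON. The probe field names come from the Kubernetes API. The /healthz path, port, and timing values are hypothetical, and the helper simply checks that each probe declares exactly one of the three check types mentioned above.

```python
import json

HANDLERS = ("httpGet", "exec", "tcpSocket")


def probe_handler(probe):
    """Return the single check type a probe declares, or raise if ambiguous."""
    found = [handler for handler in HANDLERS if handler in probe]
    if len(found) != 1:
        raise ValueError(f"expected exactly one handler, found {found}")
    return found[0]


# Hypothetical container spec fragment, for illustration only.
container = {
    "name": "web",
    "image": "nginx",
    "ports": [{"containerPort": 80}],
    "livenessProbe": {
        "httpGet": {"path": "/healthz", "port": 80},
        "initialDelaySeconds": 15,  # wait before the first check
        "periodSeconds": 10,        # check every 10 seconds
        "failureThreshold": 3,      # restart after 3 consecutive failures
    },
    "readinessProbe": {
        "tcpSocket": {"port": 80},
        "initialDelaySeconds": 5,
        "periodSeconds": 5,
    },
}

for name in ("livenessProbe", "readinessProbe"):
    print(name, "uses", probe_handler(container[name]))
print(json.dumps(container, indent=2))
```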
Monitoring containers using native metrics
As we said, container metrics should usually be preferred to Kubernetes metrics. Containers can rightly be seen as mini-hosts. Just like virtual machines, they run on behalf of resident software, which consumes CPU, memory, I/O, and network resources.
If you are using Docker, check out our Docker monitoring guide, which discusses all the resource metrics available from Docker that you should collect and monitor.
Using Docker metrics within the framework provided by Kubernetes labels will give you insights about your containers’ health and performance. Kubernetes labels are already applied to Docker metrics. You could track, for example, the number of running containers by pod, or find the most RAM-intensive pods by graphing the RSS non-cache memory broken down by pod name.
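The aggregation itself is straightforward, as the sketch below shows. It assumes each container record carries its Docker labels and an RSS value in bytes; the kubelet sets the io.kubernetes.pod.name label on the containers it starts. The record layout and the sample data are hypothetical.

```python
from collections import defaultdict

POD_LABEL = "io.kubernetes.pod.name"


def rss_by_pod(containers):
    """Sum container RSS by pod name, largest consumers first."""
    totals = defaultdict(int)
    for container in containers:
        pod = container["labels"].get(POD_LABEL, "<no pod>")
        totals[pod] += container["rss_bytes"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical container records, for illustration only.
containers = [
    {"labels": {POD_LABEL: "web-1"}, "rss_bytes": 200},
    {"labels": {POD_LABEL: "web-2"}, "rss_bytes": 500},
    {"labels": {POD_LABEL: "web-1"}, "rss_bytes": 100},
    {"labels": {}, "rss_bytes": 50},
]

for pod, rss in rss_by_pod(containers):
    print(pod, rss)
```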
In order to properly monitor your containerized infrastructure, you should collect Kubernetes data along with Docker container resource metrics, and correlate them with the health and performance of the different applications running on top of them. Each image comes with its specificities, and the types of metrics you should track and alert on will vary from one to another. However, throughput, latency, and errors are usually the most important metrics.
By default, Heapster is not designed to collect metrics from the applications running in your containers. If you want deeper context than just system metrics, you have to instrument your applications in order to collect metrics from them as well.
Since Kubernetes 1.2, a new feature (still in alpha) allows cAdvisor to collect custom metrics from applications running in containers, provided these metrics are natively exposed in the Prometheus format, which is the case for only a few applications today. These custom metrics can be used to trigger horizontal pod auto-scaling (HPA) when a metric exceeds a specified threshold. Note that Heapster re-exposes these custom metrics through its Model API, which is not an official Kubernetes API.
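To give a sense of what exposing metrics in the Prometheus format means, the sketch below renders a counter in the Prometheus text exposition format. The helper and the metric are hypothetical, and a real application would normally rely on a Prometheus client library and serve this text over HTTP.

```python
def render_counter(name, help_text, samples):
    """Render a counter in the Prometheus text exposition format.

    `samples` is a list of (labels dict, value) pairs. Label values are not
    escaped here, so this sketch assumes they contain no quotes or backslashes.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        label_text = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
        lines.append(f"{name}{{{label_text}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical application metric.
text = render_counter(
    "http_requests_total",
    "Total HTTP requests.",
    [({"code": "200"}, 1027), ({"code": "500"}, 3)],
)
print(text, end="")
```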
Correlate with events
Collecting events from Docker and Kubernetes allows you to see how pod creation, destruction, starting, or stopping impacts the performance of your infrastructure (and also the inverse).
While Docker events trace container lifecycles, Kubernetes events report on pod lifecycles and deployments. Tracking pod failures, for example, can reveal a misconfiguration or resource saturation. That’s why you should correlate events with resource metrics for easier investigations.
Pod scheduling events
You can make sure pod scheduling works properly by tracking Kubernetes events. If scheduling fails repeatedly, you should investigate. Insufficient resources in your cluster such as CPU or memory can be the root cause of scheduling issues, in which case you should consider adding more nodes to the cluster, or deleting unused pods to make room for pending ones.
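One way to spot these failures is to filter the output of `kubectl get events -o json` for events whose reason is FailedScheduling. In the sketch below, the sample JSON and its messages are made up for illustration, and the function name is hypothetical.

```python
import json


def failed_scheduling(events_json):
    """Return (pod name, message) for every FailedScheduling event."""
    events = json.loads(events_json).get("items", [])
    return [
        (event["involvedObject"]["name"], event.get("message", ""))
        for event in events
        if event.get("reason") == "FailedScheduling"
    ]


# Hypothetical, trimmed-down event list.
SAMPLE = json.dumps({
    "kind": "List",
    "items": [
        {
            "reason": "Scheduled",
            "involvedObject": {"kind": "Pod", "name": "web-1"},
            "message": "Successfully assigned web-1 to node-a",
        },
        {
            "reason": "FailedScheduling",
            "involvedObject": {"kind": "Pod", "name": "web-2"},
            "message": "No nodes are available: insufficient cpu",
        },
    ],
})

for pod, message in failed_scheduling(SAMPLE):
    print(pod, "->", message)
```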
Node ports can also be a cause of scheduling contention. If NodePort is used to assign specific port numbers, then Kubernetes won’t be able to schedule a pod to a node where that port is already taken. This can lead to scheduling issues due to:
- Poor configuration, for example if two conflicting pods try to claim the same port.
- Resource saturation, for example if the NodePort is set but the replica set requires more pod replicas than there are nodes. In that case you should scale up the number of nodes or use a Kubernetes service so multiple pods behind it can live in one node.
Since your pods are constantly moving, alerts on the metrics they report (CPU, memory, I/O, network…) have to follow. That’s why they should be set up using what remains stable as pods come and go: custom labels, service names, and names of replication controllers or replica sets.
A concrete use case
As discussed in Part 1, monitoring orchestrated, containerized infrastructure means collecting metrics from every layer of your stack: from Docker and Kubernetes as well as from your hosts and containerized applications. Let’s see how the different data from all the components of your infrastructure can be used to investigate a performance issue.
Let’s say we are running NGINX for our web app in Docker containers, which are orchestrated by Kubernetes.
1. Application metric showing performance issue
We receive an alert triggered after the number of NGINX 5xx errors suddenly skyrocketed over a set threshold.
2. Corresponding Kubernetes labels and events
If we look at which pods our web app was running on, we can see that the Kubernetes label attached to them, which defines the replication controller involved, is rc-nginx. And when looking at Kubernetes events, a rolling update deployment happened on those pods exactly at the moment that the web app started returning 5xx errors.
Let’s investigate the containers impacted by this rolling update to understand what happened.
3. What happened at the container level
The first place to look is usually resource metrics. Remember that Docker metrics should be preferred to Kubernetes metrics for time-sampled data. So let’s graph the CPU utilization by Docker containers, broken down by pod (or container) and filtered to retain only the pods with the label rc-nginx.
Interesting! It looks like CPU usage in some pods drastically increased at the moment that the 5xx errors peaked. Is it possible that the underlying hosts running these pod replicas saturated their CPU capacity?
4. Host metrics to confirm the hypothesis
By graphing the CPU usage broken down by host, we can see that indeed three hosts maxed out their CPU at that moment.
Resolving the issue and postmortem
A short-term solution can be to roll back the update to our web app code if we think that an update led to this issue. Scaling up our hosts’ CPU capacity can also help support higher resource consumption.
If appropriate, we could also make use of the underlying mechanism in Kubernetes that imposes restrictions on the resources (CPU and memory) a single pod can consume. In this case, we should consider lowering the CPU limit for a given pod.
Here we have combined data from across our container infrastructure to find the root cause of a performance issue:
- Application metrics for alerting
- Kubernetes labels to identify affected pods
- Kubernetes events to look for potential causes
- Docker metrics aggregated by Kubernetes labels to investigate hypothesized cause
- Host-level metrics to confirm resource constraint
Watching the conductor and the orchestra
Kubernetes makes working with containers much easier. However, it requires you to completely rethink how you monitor your infrastructure and applications. For example, having a smart labeling strategy is now essential, as is combining data from Kubernetes, your container technology, and your applications for full observability.
The methods and tools used to collect resource metrics from Kubernetes are different from the commands used on a traditional host. Part 3 of this series covers how to collect the performance metrics you need to properly monitor your containerized apps and infrastructure, as well as their orchestration by Kubernetes. Read on…
Many thanks to Lachlan Evenson from Deis, Charles Butler from Canonical, Mike Kaplinsky from Ladder, Rudi Chiarito from Clarifai, and the Kubernetes Slack communities for reviewing this publication and suggesting improvements.