Autoscale Kubernetes workloads with any Datadog metric
With the release of the Datadog Cluster Agent, which was detailed in a companion post, we’re pleased to announce that you can now use any metric collected by Datadog to autoscale your applications running in Kubernetes.
Horizontal Pod Autoscaling in Kubernetes
The Horizontal Pod Autoscaling (HPA) feature, which was introduced in Kubernetes v1.2, allows users to autoscale their applications off of basic metrics like CPU, accessed from a resource called metrics-server. With Kubernetes v1.6, it became possible to autoscale off of user-defined custom metrics collected from within the cluster. Kubernetes v1.10 then introduced support for external metrics, which lets users autoscale off of any metric from outside the cluster, including any metric you’re monitoring with Datadog.
This post demonstrates how to autoscale Kubernetes with Datadog metrics, by walking through an example of how you can scale a workload based on metrics reported by NGINX.
Before getting started, ensure that your Kubernetes cluster is running v1.10+ (in order to be able to register the External Metrics Provider resource against the Kubernetes API server). You will also need to enable the aggregation layer; refer to the Kubernetes documentation to learn how.
If you’d like to follow along, make sure that:
- You have a Datadog account (if not, here’s a free trial)
- You have node-based Datadog Agents running (ideally from a DaemonSet) with Autodiscovery enabled
- Your node-based Agents are configured to securely communicate with the Cluster Agent (see the documentation for details)
The third point is not mandatory, but it enables Datadog to enrich Kubernetes metrics with the metadata collected by the node-based Agents. You can find the manifests used in this walkthrough on GitHub.
Spin up the Datadog Cluster Agent
The Datadog Cluster Agent serves as a proxy between the API Server and each node-based Agent. Therefore, you’ll need to create the appropriate role-based access control (RBAC) rules to give the Cluster Agent access to some cluster-level resources:
kubectl apply -f manifests/cluster-agent/rbac/rbac-cluster-agent.yaml
You should see output similar to the following:
clusterrole.rbac.authorization.k8s.io "dca" created
clusterrolebinding.rbac.authorization.k8s.io "dca" created
serviceaccount "dca" created
Now we are ready to create the Datadog Cluster Agent and its services. In the Cluster Agent’s deployment manifest (cluster-agent.yaml), add your Datadog <APP_KEY> (accessible here in your account), and set the DD_EXTERNAL_METRICS_PROVIDER_ENABLED variable to true. Finally, spin up the resources:
# deploy the Datadog Cluster Agent
kubectl apply -f manifests/cluster-agent/cluster-agent.yaml
# deploy the necessary services
kubectl apply -f manifests/cluster-agent/datadog-cluster-agent_service.yaml
kubectl apply -f manifests/cluster-agent/hpa-example/cluster-agent-hpa-svc.yaml
Note that the first service is used for the communication between the node-based Agents and the Cluster Agent, while the second is used by Kubernetes to register the External Metrics Provider.
At this point you should see that the Cluster Agent is running, along with these two services:
kubectl get pods,svc -l app=datadog-cluster-agent

PODS:
NAMESPACE   NAME                                     READY   STATUS    RESTARTS   AGE
default     datadog-cluster-agent-7b7f6d5547-cmdtc   1/1     Running   0          28m

SVCS:
NAMESPACE   NAME                            TYPE        CLUSTER-IP        EXTERNAL-IP   PORT(S)    AGE
default     datadog-custom-metrics-server   ClusterIP   192.168.254.87    <none>        443/TCP    28m
default     datadog-cluster-agent           ClusterIP   192.168.254.197   <none>        5005/TCP   28m
Register the External Metrics Provider
Once the Datadog Cluster Agent is up and running, register it as an External Metrics Provider, through the service that exposes port 443, by applying the following RBAC rules:
kubectl apply -f manifests/cluster-agent/hpa-example/rbac-hpa.yaml
You should see something similar to the following output:
clusterrolebinding.rbac.authorization.k8s.io "system:auth-delegator" created
rolebinding.rbac.authorization.k8s.io "dca" created
apiservice.apiregistration.k8s.io "v1beta1.external.metrics.k8s.io" created
clusterrole.rbac.authorization.k8s.io "external-metrics-reader" created
clusterrolebinding.rbac.authorization.k8s.io "external-metrics-reader" created
Once the Datadog Cluster Agent is running and the service has been registered, you should see the following when you list the running pods and services:
kubectl get pods,svc

PODS
NAMESPACE   NAME                                     READY   STATUS    RESTARTS   AGE
default     datadog-agent-4c5pp                      1/1     Running   0          14m
default     datadog-agent-ww2da                      1/1     Running   0          14m
default     datadog-agent-2qqd3                      1/1     Running   0          14m
[...]
default     datadog-cluster-agent-7b7f6d5547-cmdtc   1/1     Running   0          16m
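If you want to double-check that the registration took effect, you can query the aggregation layer directly. The sketch below wraps two standard kubectl calls in a helper function; the function name is hypothetical (it is not part of the Datadog manifests), and it assumes kubectl is configured for your cluster and that the APIService is named as in the output above.

```shell
# check_external_metrics_api: hypothetical helper for this walkthrough.
check_external_metrics_api() {
  # The AVAILABLE column should read True once the Cluster Agent is serving the API.
  kubectl get apiservice v1beta1.external.metrics.k8s.io || return 1
  # Fetch the discovery document for the external metrics API group.
  kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1"
}

# Usage (requires access to the cluster):
#   check_external_metrics_api
```

If the first command reports that the APIService is not available, the service exposing port 443 and the RBAC rules are the first things to re-check.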
Create the Horizontal Pod Autoscaler
Now it’s time to create a Horizontal Pod Autoscaler manifest that lets the Datadog Cluster Agent pull metrics from Datadog. If you take a look at the provided hpa-manifest.yaml example file, you should see:
- The HPA is configured to autoscale the nginx deployment
- The maximum number of replicas created is 3 and the minimum is 1
- The HPA will autoscale off of the metric nginx.net.request_per_s, over the scope kube_container_name: nginx. Note that this format corresponds to the name of the metric in Datadog
Every 30 seconds, Kubernetes queries the Datadog Cluster Agent for the value of the NGINX request-per-second metric and autoscales the nginx deployment if necessary. For advanced use cases, it is possible to autoscale Kubernetes based on several metrics; in that case, the autoscaler will choose the metric that creates the largest number of replicas. You can also configure the frequency at which Kubernetes checks the value of the external metrics.
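For reference, a manifest matching the description above might look roughly like the following sketch. It is reconstructed from the details in this post (the nginxext name, the replica bounds, the metric and its scope, and the target of 9), not copied from hpa-manifest.yaml. The scratch filename and both apiVersion values are assumptions: autoscaling/v2beta1 is the API version that accepts the metricName and targetAverageValue fields, and it is no longer served by newer Kubernetes releases, where autoscaling/v2 uses a different field layout.

```shell
# Write a sketch of the HPA manifest to a scratch file.
# hpa-sketch.yaml is a hypothetical filename; the provided file is hpa-manifest.yaml.
cat > hpa-sketch.yaml <<'EOF'
apiVersion: autoscaling/v2beta1   # assumption: version that accepts the fields below
kind: HorizontalPodAutoscaler
metadata:
  name: nginxext
spec:
  minReplicas: 1
  maxReplicas: 3
  scaleTargetRef:
    apiVersion: apps/v1           # assumption: depends on how the deployment is defined
    kind: Deployment
    name: nginx
  metrics:
  - type: External
    external:
      metricName: nginx.net.request_per_s
      metricSelector:
        matchLabels:
          kube_container_name: nginx
      targetAverageValue: 9
EOF

# Review the file first, then apply it only if it matches your cluster:
#   kubectl apply -f hpa-sketch.yaml
```

The metricSelector labels play the role of the scope in a Datadog query, which is why the tag name and value are written exactly as they appear in Datadog.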
Create an autoscaling Kubernetes deployment
Now, let’s create the NGINX deployment that Kubernetes will autoscale for us:
kubectl apply -f manifests/cluster-agent/hpa-example/nginx.yaml
Then, apply the HPA manifest:
kubectl apply -f manifests/cluster-agent/hpa-example/hpa-manifest.yaml
You should see your NGINX pod running, along with the corresponding service:
kubectl get pods,svc,hpa

POD:
default     nginx-6757dd8769-5xzp2   1/1     Running   0          3m

SVC:
NAMESPACE   NAME    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
default     nginx   ClusterIP   192.168.251.36   <none>        8090/TCP   3m

HPAS:
NAMESPACE   NAME       REFERENCE          TARGETS     MINPODS   MAXPODS   REPLICAS   AGE
default     nginxext   Deployment/nginx   0/9 (avg)   1         3         1          3m
Make a note of the CLUSTER-IP of your NGINX service; you’ll need it in the next step.
Stress your service to see Kubernetes autoscaling in action
At this point, we’re ready to stress the setup and see how Kubernetes autoscales the NGINX pods based on external metrics from the Datadog Cluster Agent.
Send a cURL request to the IP of the NGINX service (replacing <NGINX_SVC> with the CLUSTER-IP from the previous step):
curl <NGINX_SVC>:8090/nginx_status
You should receive a simple response, reporting some statistics about the NGINX server:
Active connections: 1
server accepts handled requests
 1 1 1
Reading: 0 Writing: 1 Waiting: 0
Behind the scenes, the number of NGINX requests per second also increased. Thanks to Autodiscovery, the node-based Agent has already detected NGINX running in a pod and used the pod’s annotations to configure the Agent check, so it is already collecting NGINX metrics.
Now that you’ve stressed the pod, you should see the uptick in the rate of NGINX requests per second in your Datadog account. Because you referenced this metric in your HPA manifest (hpa-manifest.yaml), and registered the Datadog Cluster Agent as an External Metrics Provider, Kubernetes will regularly query the Cluster Agent to get the value of the nginx.net.request_per_s metric. If it notices that the average value has exceeded the targetAverageValue threshold in your HPA manifest, it will autoscale your NGINX pods accordingly.
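To make the threshold logic concrete, here is a simplified sketch of the arithmetic for a per-pod average target: the desired replica count is roughly the total metric value divided by the target, rounded up and then kept within the minimum and maximum. The function name and the sample totals are hypothetical, the math is integer-only, and the sketch ignores details of the real controller such as its tolerance band around the target.

```shell
# desired_replicas TOTAL TARGET MIN MAX
# Simplified illustration only; not the controller's actual implementation.
desired_replicas() {
  dr_total=$1
  dr_target=$2
  dr_min=$3
  dr_max=$4
  # Integer ceiling of total / target.
  dr_desired=$(( (dr_total + dr_target - 1) / dr_target ))
  # Keep the result within the HPA's replica bounds.
  if [ "$dr_desired" -lt "$dr_min" ]; then dr_desired=$dr_min; fi
  if [ "$dr_desired" -gt "$dr_max" ]; then dr_desired=$dr_max; fi
  echo "$dr_desired"
}

desired_replicas 5 9 1 3     # prints 1: below the target, stays at the minimum
desired_replicas 18 9 1 3    # prints 2: two pods bring the average down to 9
desired_replicas 90 9 1 3    # prints 3: ten would be needed, capped at the maximum
```

With a target of 9 and a maximum of 3 replicas, any sustained total above 18 requests per second is enough to push the deployment to its cap.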
Let’s see it in action!
Run the following command:
while true; do curl <NGINX_SVC>:8090/nginx_status; sleep 0.1; done
In your Datadog account, you should soon see the number of NGINX requests per second spiking, and eventually rising above 9, the threshold listed in your HPA manifest. When Kubernetes detects that this metric has exceeded the threshold, it should begin autoscaling your NGINX pods. And indeed, you should be able to see new NGINX pods being created:
kubectl get pods,svc,hpa

PODS:
NAMESPACE   NAME                                     READY   STATUS    RESTARTS   AGE
default     datadog-cluster-agent-7b7f6d5547-cmdtc   1/1     Running   0          9m
default     nginx-6757dd8769-5xzp2                   1/1     Running   0          2m
default     nginx-6757dd8769-k6h6x                   1/1     Running   0          2m
default     nginx-6757dd8769-vzd5b                   1/1     Running   0          29m

HPAS:
NAMESPACE   NAME       REFERENCE          TARGETS      MINPODS   MAXPODS   REPLICAS   AGE
default     nginxext   Deployment/nginx   30/9 (avg)   1         3         3          29m
Voilà. You can use Datadog dashboards and alerts to track Kubernetes autoscaling activity in real time, and to ensure that you’ve configured thresholds that appropriately reflect your workloads. Below, you can see that after the average rate of NGINX requests per second increased above the autoscaling threshold, Kubernetes scaled the number of pods to match the desired number of replicas from our HPA manifest (a maximum of 3).
Autoscaling Kubernetes with Datadog
We’ve shown you how the Datadog Cluster Agent can help you easily autoscale Kubernetes applications in response to real-time workloads. The possibilities are endless—not only can you scale based on metrics from anywhere in your cluster, but you can also use metrics from your cloud services (such as AWS RDS or ELB) to autoscale databases, caches, or load balancers.
If you’re already monitoring Kubernetes with Datadog, you can immediately deploy the Cluster Agent (by following the instructions here) to autoscale your applications based on any metric available in your Datadog account. If you’re new to Datadog, get started with a 14-day free trial.