Autoscale your Kubernetes workloads with any Datadog metric

Published: October 18, 2018

With the release of the Datadog Cluster Agent, which was detailed in a companion post, we’re pleased to announce that you can now use any metric collected by Datadog to autoscale your applications running in Kubernetes.

Horizontal Pod Autoscaling in Kubernetes

The Horizontal Pod Autoscaling (HPA) feature, introduced in Kubernetes v1.2, allows users to autoscale their applications off of basic metrics like CPU utilization, collected by a cluster add-on such as metrics-server. With Kubernetes v1.6, it became possible to autoscale off of user-defined custom metrics collected from within the cluster. Support for external metrics arrived in Kubernetes v1.10, allowing users to autoscale off of any metric from outside the cluster, which now includes any metric you're monitoring with Datadog.
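
For context, a resource-based HPA (the original v1.2-era behavior) can be expressed in just a few lines. The sketch below is illustrative rather than part of this walkthrough; the deployment name and thresholds are placeholders:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: web                  # hypothetical deployment, for illustration only
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 60   # scale out when average CPU exceeds 60%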

This post demonstrates how autoscaling with Datadog metrics works by walking through an example of scaling a Kubernetes workload based on metrics reported by NGINX.

Prerequisites

Before getting started, ensure that your Kubernetes cluster is running v1.10+ (in order to be able to register the External Metrics Provider resource against the Kubernetes API server). You will also need to enable the aggregation layer; refer to the Kubernetes documentation to learn how.
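
A quick way to confirm the server version before proceeding (assuming kubectl is already pointed at your cluster):

# the Server Version line should report v1.10 or later
kubectl version --short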

If you’d like to follow along, make sure that:

  • You have a Datadog account (if not, you can sign up for one)
  • You have node-based Datadog Agents running (ideally deployed via a DaemonSet) with Autodiscovery enabled
  • Your node-based Agents are configured to securely communicate with the Cluster Agent (see the documentation for details)

The third point is not mandatory, but it enables Datadog to enrich Kubernetes metrics with the metadata collected by the node-based Agents. You can find the manifests used in this walkthrough on GitHub.
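
For reference, wiring the node-based Agents to the Cluster Agent typically comes down to a couple of environment variables in the Agent DaemonSet. This is only a sketch based on the Datadog documentation; the Secret name below is hypothetical:

# excerpt from the node-based Agent's DaemonSet container spec
env:
  - name: DD_CLUSTER_AGENT_ENABLED
    value: "true"
  - name: DD_CLUSTER_AGENT_AUTH_TOKEN       # shared token for Agent <-> Cluster Agent auth
    valueFrom:
      secretKeyRef:
        name: datadog-auth-token            # hypothetical Secret holding the token
        key: token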

Spin up the Datadog Cluster Agent

The Datadog Cluster Agent serves as a proxy between the API Server and each node-based Agent. Therefore, you’ll need to create the appropriate role-based access control (RBAC) rules to give the Cluster Agent access to some cluster-level resources:

kubectl apply -f manifests/cluster-agent/rbac/rbac-cluster-agent.yaml

You should see output similar to the following:

clusterrole.rbac.authorization.k8s.io "dca" created
clusterrolebinding.rbac.authorization.k8s.io "dca" created
serviceaccount "dca" created

Now we are ready to create the Datadog Cluster Agent and its services. In the Cluster Agent’s deployment manifest (cluster-agent.yaml), add your Datadog <API_KEY> and <APP_KEY> (both available in your Datadog account), and set the DD_EXTERNAL_METRICS_PROVIDER_ENABLED variable to true. Finally, spin up the resources:

# deploy the Datadog Cluster Agent
kubectl apply -f manifests/cluster-agent/cluster-agent.yaml

# deploy the necessary services
kubectl apply -f manifests/cluster-agent/datadog-cluster-agent_service.yaml
kubectl apply -f manifests/cluster-agent/hpa-example/cluster-agent-hpa-svc.yaml

Note that the first service is used for communication between the node-based Agents and the Cluster Agent, while the second is used by Kubernetes to register the External Metrics Provider.
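
To recap the manifest edits from the previous step, the relevant portion of the Cluster Agent container spec looks roughly like this (a sketch; only the keys discussed above are shown):

# excerpt from cluster-agent.yaml
env:
  - name: DD_API_KEY
    value: "<API_KEY>"
  - name: DD_APP_KEY
    value: "<APP_KEY>"
  - name: DD_EXTERNAL_METRICS_PROVIDER_ENABLED   # enables the External Metrics Provider endpoint
    value: "true"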

At this point you should see that the Cluster Agent is running, along with these two services:

kubectl get pods,svc -l app=datadog-cluster-agent

PODS:

NAMESPACE     NAME                                     READY     STATUS    RESTARTS   AGE
default       datadog-cluster-agent-7b7f6d5547-cmdtc   1/1       Running   0          28m

SVCS:

NAMESPACE     NAME                            TYPE        CLUSTER-IP        EXTERNAL-IP   PORT(S)         AGE
default       datadog-custom-metrics-server   ClusterIP   192.168.254.87    <none>        443/TCP         28m
default       datadog-cluster-agent           ClusterIP   192.168.254.197   <none>        5005/TCP        28m

Register the External Metrics Provider

Once the Datadog Cluster Agent is up and running, register it as an External Metrics Provider via the service exposing port 443 by applying the following RBAC rules:

kubectl apply -f manifests/cluster-agent/hpa-example/rbac-hpa.yaml

You should see something similar to the following output:

clusterrolebinding.rbac.authorization.k8s.io "system:auth-delegator" created
rolebinding.rbac.authorization.k8s.io "dca" created
apiservice.apiregistration.k8s.io "v1beta1.external.metrics.k8s.io" created
clusterrole.rbac.authorization.k8s.io "external-metrics-reader" created
clusterrolebinding.rbac.authorization.k8s.io "external-metrics-reader" created
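
The key piece in that manifest is the APIService object, which tells the Kubernetes aggregation layer to route external metrics queries to the Cluster Agent’s service on port 443. A rough sketch follows (field values mirror the services created earlier; the actual manifest may differ slightly):

apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.external.metrics.k8s.io
spec:
  group: external.metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
  service:
    name: datadog-custom-metrics-server   # the 443/TCP service created above
    namespace: default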

Once the Datadog Cluster Agent is running and the service has been registered, you should see the following when you list the running pods and services:

kubectl get pods,svc

PODS

NAMESPACE     NAME                                     READY     STATUS    RESTARTS   AGE
default       datadog-agent-4c5pp                      1/1       Running   0          14m
default       datadog-agent-ww2da                      1/1       Running   0          14m
default       datadog-agent-2qqd3                      1/1       Running   0          14m
[...]
default       datadog-cluster-agent-7b7f6d5547-cmdtc   1/1       Running   0          16m
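
As an optional sanity check, you can confirm that the external metrics API is now being served by the Cluster Agent:

# check that the APIService has been registered
kubectl get apiservice v1beta1.external.metrics.k8s.io

# querying the aggregated API directly should return a JSON APIResourceList
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1"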

Create the Horizontal Pod Autoscaler

Now it’s time to create a Horizontal Pod Autoscaler manifest that lets the Datadog Cluster Agent pull metrics from Datadog. If you take a look at the provided hpa-manifest.yaml example file, you should see:

  • The HPA is configured to autoscale the nginx deployment
  • The maximum number of replicas created is 3 and the minimum is 1
  • The HPA will autoscale off of the metric nginx.net.request_per_s, scoped to kube_container_name: nginx. Note that this is the name of the metric as it appears in Datadog (see the manifest sketch after this list)
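
Putting those pieces together, the HPA manifest looks roughly like the sketch below, which uses the autoscaling/v2beta1 API available in Kubernetes v1.10; the target value of 9 requests per second matches the thresholds shown later in this post, but check hpa-manifest.yaml in the example repo for the authoritative version:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: nginxext
spec:
  minReplicas: 1
  maxReplicas: 3
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  metrics:
    - type: External
      external:
        metricName: nginx.net.request_per_s      # Datadog metric name
        metricSelector:
          matchLabels:
            kube_container_name: nginx           # Datadog tag used as the scope
        targetAverageValue: 9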

Every 30 seconds, Kubernetes queries the Datadog Cluster Agent for the value of the NGINX request-per-second metric and autoscales the nginx deployment if necessary. For advanced use cases, it is possible to autoscale based on several metrics—in that case, the autoscaler will choose the metric that creates the largest number of replicas. You can also configure the frequency at which Kubernetes checks the value of the external metrics.
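
That polling interval is a control plane setting rather than something in the HPA manifest; if you run your own kube-controller-manager, the flag below governs it (shown purely as an illustration):

# evaluate HPAs (and thus query external metrics) every 30 seconds
kube-controller-manager --horizontal-pod-autoscaler-sync-period=30s [other flags...]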

Create an autoscaling deployment

Now, let’s create the NGINX deployment that Kubernetes will autoscale for us:

kubectl apply -f manifests/cluster-agent/hpa-example/nginx.yaml

Then, apply the HPA manifest:

kubectl apply -f manifests/cluster-agent/hpa-example/hpa-manifest.yaml

You should see your NGINX pod running, along with the corresponding service:

kubectl get pods,svc,hpa

POD:

default       nginx-6757dd8769-5xzp2                   1/1       Running   0          3m

SVC:

NAMESPACE     NAME                  TYPE        CLUSTER-IP        EXTERNAL-IP   PORT(S)         AGE
default       nginx                 ClusterIP   192.168.251.36    <none>        8090/TCP        3m


HPAS:

NAMESPACE   NAME       REFERENCE          TARGETS       MINPODS   MAXPODS   REPLICAS   AGE
default     nginxext   Deployment/nginx   0/9 (avg)       1         3         1        3m

Make a note of the CLUSTER-IP of your NGINX service; you’ll need it in the next step.

Stress your service to see autoscaling in action

At this point, we’re ready to stress the setup and see how Kubernetes autoscales the NGINX pods based on external metrics from the Datadog Cluster Agent.

Send a cURL request to the IP of the NGINX service (replacing <NGINX_SVC> with the CLUSTER-IP from the previous step):

curl <NGINX_SVC>:8090/nginx_status

You should receive a simple response, reporting some statistics about the NGINX server:

Active connections: 1 
server accepts handled requests
 1 1 1 
Reading: 0 Writing: 1 Waiting: 0 

Behind the scenes, the number of NGINX requests per second also increased. Thanks to Autodiscovery, the node-based Agent already detected NGINX running in a pod, and used the pod’s annotations to configure the Agent check to start collecting NGINX metrics.
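
Those annotations look something like the snippet below; this is an illustrative sketch of the pod template in nginx.yaml, and the exact values in the example repo may differ:

# Autodiscovery annotations on the NGINX pod template
annotations:
  ad.datadoghq.com/nginx.check_names: '["nginx"]'
  ad.datadoghq.com/nginx.init_configs: '[{}]'
  ad.datadoghq.com/nginx.instances: '[{"nginx_status_url": "http://%%host%%:8090/nginx_status"}]'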

Now that you’ve stressed the pod, you should see the uptick in the rate of NGINX requests per second in your Datadog account. Because you referenced this metric in your HPA manifest (hpa-manifest.yaml), and registered the Datadog Cluster Agent as an External Metrics Provider, Kubernetes will regularly query the Cluster Agent to get the value of the nginx.net.request_per_s metric. If it notices that the average value has exceeded the targetAverageValue threshold in your HPA manifest, it will autoscale your NGINX pods accordingly. Let’s see it in action!

Run the following command:

while true; do curl <NGINX_SVC>:8090/nginx_status; sleep 0.1; done
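
While that loop runs, you can optionally watch the autoscaler’s view of the metric and the replica count from a second terminal:

kubectl get hpa nginxext --watch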

In your Datadog account, you should soon see the number of NGINX requests per second spiking, and eventually rising above 9, the threshold listed in your HPA manifest. When Kubernetes detects that this metric has exceeded the threshold, it should begin autoscaling your NGINX pods. And indeed, you should be able to see new NGINX pods being created:

kubectl get pods,hpa

PODS:

NAMESPACE     NAME                                     READY     STATUS    RESTARTS   AGE
default       datadog-cluster-agent-7b7f6d5547-cmdtc   1/1       Running   0          9m
default       nginx-6757dd8769-5xzp2                   1/1       Running   0          2m
default       nginx-6757dd8769-k6h6x                   1/1       Running   0          2m
default       nginx-6757dd8769-vzd5b                   1/1       Running   0          29m

HPAS:

NAMESPACE   NAME       REFERENCE          TARGETS       MINPODS   MAXPODS   REPLICAS   AGE
default     nginxext   Deployment/nginx   30/9 (avg)     1         3         3         29m

Voilà. You can use Datadog dashboards and alerts to track Kubernetes autoscaling activity in real time, and to ensure that you’ve configured thresholds that appropriately reflect your workloads. Below, you can see that after the average rate of NGINX requests per second increased above the autoscaling threshold, Kubernetes scaled the number of pods to match the desired number of replicas from our HPA manifest (maxReplicas: 3).

[Dashboard screenshot: autoscaling Kubernetes with the Datadog Cluster Agent, Horizontal Pod Autoscaling, and the External Metrics Provider]

Autoscaling Kubernetes with Datadog

We’ve shown you how the Datadog Cluster Agent can help you easily autoscale your Kubernetes applications in response to real-time workloads. The possibilities are endless—not only can you scale based on metrics from anywhere in your cluster, but you can also use metrics from your cloud services (such as AWS RDS or ELB) to autoscale databases, caches, or load balancers.

If you’re already monitoring Kubernetes with Datadog, you can immediately deploy the Cluster Agent (by following the instructions here) to autoscale your applications based on any metric available in your Datadog account. If you’re new to Datadog, you can sign up for a free trial to get started.