How to monitor Kubernetes + Docker with Datadog

How to monitor Kubernetes + Docker with Datadog

/ / /
Published: September 11, 2017

Since Kubernetes was open sourced by Google in 2014, it has steadily grown in popularity to become nearly synonymous with Docker orchestration. Kubernetes is being widely adopted by forward-thinking organizations such as Box and GitHub for a number of reasons: its active community, rapid development, and of course its ability to schedule, automate, and manage distributed applications on dynamic container infrastructure.

Kubernetes + Datadog

In this guide, we’ll walk through setting up monitoring for a containerized application that is orchestrated by Kubernetes. We’ll use the guestbook-go example application from the Kubernetes project. Using this one example, we’ll step through several different layers of monitoring:

Collect Kubernetes and Docker metrics

The containerized Datadog Agent can be deployed as a Kubernetes daemonset, a construct that ensures that a particular pod (the fundamental building block of Kubernetes applications) runs on every node in the cluster, or on some subset of nodes. This deployment method ensures that every worker node in your Kubernetes cluster will run the Agent, and that key resource metrics and events are collected from both Kubernetes and Docker for monitoring in Datadog.

To deploy the Agent, copy this manifest from your Datadog account, save it to a Kubernetes master node as dd-agent.yaml, and run:

$ kubectl create -f dd-agent.yaml

Now you can verify that the Agent is running and is collecting Docker and Kubernetes metrics by running the Agent’s info command. To do that, you first need to get the list of running pods so you can run the command on one of the Datadog Agent pods:

# Get the list of running pods
$ kubectl get pods
NAME             READY     STATUS    RESTARTS   AGE
dd-agent-krrmd   1/1       Running   0          17d

# Use the pod name returned above to run the Agent's 'info' commmand
$ kubectl exec dd-agent-krrmd service datadog-agent info

In the output you should see sections resembling the following, indicating that Kubernetes and Docker metrics are being collected:

    kubernetes (5.14.1)
    -------------------
      - instance #0 [OK]
      - Collected 414 metrics, 0 events & 3 service checks
...
    docker_daemon (5.14.1)
    ----------------------
      - instance #0 [OK]
      - Collected 108 metrics, 0 events & 1 service check

Now you can glance at your built-in Datadog dashboards for Kubernetes and Docker to see what those metrics look like.

Template Kubernetes monitoring dashboard in Datadog
Datadog's out-of-the-box dashboard for Kubernetes monitoring.

Add more Kubernetes metrics with kube-state-metrics

By default, the Kubernetes Agent check reports a handful of basic system metrics to Datadog, covering CPU, network, disk, and memory usage. You can easily expand on the data collected from Kubernetes by deploying the kube-state-metrics add-on to your cluster, which provides much more detailed metrics on the state of the cluster itself.

kube-state-metrics listens to the Kubernetes API and generates metrics about the state of Kubernetes logical objects: node status, node capacity (CPU and memory), number of desired/available/unavailable/updated replicas per deployment, pod status (e.g. waiting, running, ready), and so on. You can see the full list of metrics that Datadog collects from kube-state-metrics here.

To deploy kube-state-metrics as a Kubernetes service, copy the manifest here, paste it into a kube-state-metrics.yaml file, and deploy the service to your cluster:

kubectl create -f kube-state-metrics.yaml

Within minutes, you should see metrics with the prefix kubernetes_state. streaming into your Datadog account.

Collect metrics using Autodiscovery

The Datadog Agent can automatically track which services are running where, thanks to its Autodiscovery feature. Autodiscovery lets you define configuration templates for Agent checks and specify which containers each check should apply to. The Agent enables, disables, and regenerates static check configurations from the templates as containers come and go.

Out of the box, the Agent can use Autodiscovery to connect to a number of common containerized services, such as Redis and Apache (httpd), which have standardized configuration patterns. In this section, we’ll show how Autodiscovery allows the Datadog Agent to connect to the Redis master containers in our guestbook application stack, without any manual configuration.

The guestbook app

Deploying the guestbook application according to the step-by-step Kubernetes documentation is a great way to learn about the various pieces of a Kubernetes application. For this guide, we have modified the Go code for the guestbook app to add instrumentation (which we’ll cover below) and condensed the various deployment manifests for the app into one, so you can deploy a fully functional guestbook app with one kubectl command.

Moving pieces

Simple guestbook app interface

The guestbook app is a simple web application that allows you to enter names (or other strings) into a field in a web page. The app then stores those names in the Redis-backed “guestbook” and displays them on the page.

There are three main components to the guestbook application: the Redis master pod, two Redis slave pods, and three “guestbook” pods running the Go web service. Each component has its own Kubernetes service for routing traffic to replicated pods. The guestbook service is of the type LoadBalancer, making the web application accessible via a public IP address.

Deploying the guestbook app

Copy the contents of the manifest to your Kubernetes master as guestbook-deployment-full.yaml and run:

$ kubectl apply -f guestbook-deployment-full.yaml

Verify that Redis metrics are being collected

When you deployed the Datadog Agent earlier in this guide, you already enabled Autodiscovery (by providing docker as the value of the SD_BACKEND). So the Datadog Agent should already be pulling metrics from Redis containers in the backend of your guestbook app, regardless of which nodes those containers are running on. But for reasons we’ll explain shortly, only the Redis master will be monitored by default.

To verify that Autodiscovery worked as expected, run the info command again:

# Get the list of running pods
$ kubectl get pods
NAME             READY     STATUS    RESTARTS   AGE
dd-agent-krrmd   1/1       Running   0          17d

# Use the pod name returned above to run the Agent's 'info' commmand
$ kubectl exec dd-agent-krrmd service datadog-agent info

Look for a redis section in the output, like this:

    redisdb (5.14.1)
    ----------------
      - instance #0 [OK]
      - Collected 35 metrics, 0 events & 2 service checks
      - Dependencies:
          - redis: 2.10.5

Note that the guestbook application only has a single Redis master instance, so if you’re running this exercise on a multi-node cluster you may need to run the info command on each dd-agent pod to find the particular Agent that’s monitoring the Redis instance.

Now you can open up your out-of-the-box Redis dashboard in Datadog, which will immediately begin populating with metrics from your Redis master.

Template Redis monitoring dashboard in Datadog
The out-of-the-box Datadog dashboard for Redis cache, populating with metrics from a Kubernetes cluster.

Add custom config templates for Autodiscovery

Add custom monitoring configs with pod annotations

From the output in the previous step, we can see that Datadog is monitoring our Redis master node, but not our slave nodes. That’s because Autodiscovery uses container names to match monitoring configuration templates to individual containers. In the case of Redis, the Agent looks to apply its standard Redis check to all Docker images named redis. And while the Redis master is indeed built from the redis image, the slaves are using a different image, named kubernetes/redis-slave.

Using Kubernetes pod annotations, we can attach configuration parameters to any container, so that the Datadog Agent can connect to those containers and collect monitoring data. In this case, we want to apply the standard Redis configuration template to our containers that are based on the image kubernetes/redis-slave.

In the guestbook manifest, we need to add a simple set of pod annotations that does three things:

  1. Tells the Datadog Agent to apply the Redis Agent check (redisdb) to containers running the redis-slave image
  2. Supplies an empty set of init_configs for the check (this is the default for Datadog’s Redis Agent check)
  3. Supplies instances configuration for the Redis check, using template variables instead of a static host and port

To enable monitoring of all the Redis containers running in the guestbook app, add the annotations below to the guestbook-deployment-full.yaml manifest, in the spec for the redis-slave ReplicationController:

spec:
  replicas: 2
  selector:
    app: redis
    role: slave
  template:
    metadata:
      labels:
        app: redis
        role: slave
      annotations:
        service-discovery.datadoghq.com/redis-slave.check_names: '["redisdb"]'
        service-discovery.datadoghq.com/redis-slave.init_configs: '[{}]'
        service-discovery.datadoghq.com/redis-slave.instances: '[{"host": "%%host%%", "port": "%%port%%"}]'

To unpack those annotations a bit: service-discovery.datadoghq.com is the string that the Agent looks for to identify configuration parameters for an Agent check, and redis-slave is the name of the containers to which the Agent will apply the check.

Now apply the change:

$ kubectl apply -f guestbook-deployment-full.yaml

Verify that all Redis containers are being monitored

The configuration change should allow Datadog to pick up metrics from Redis containers, whether they run the redis image or kubernetes/redis-slave. To verify that you’re collecting metrics from all your containers, you can view your Redis metrics in Datadog, broken down by image_name.

Redis metrics by image name, graphed in Datadog
Redis metrics in Datadog, broken down by image name.

Monitor a container listening on multiple ports

The example above works well for relatively simple use cases where the Agent can connect to a container that’s been assigned a single port, but what if the connection parameters are a bit more complex? In this section we’ll show how you can use indexes on template variables to help the Datadog Agent choose one port from many.

Using an expvar interface to expose metrics

Since our guestbook app is written in Go, we can use the expvar library to collect extensive memory-usage metrics from our app, almost for free. In our application’s main.go, we import the expvar package using an underscore to indicate that we only need the “side effects” of the package—exposing basic memory stats over HTTP:

import _ "expvar"

In the guestbook app’s Kubernetes manifest, we have assigned port 2999 to the expvar server:

        ports:
        - name: expvar-server
          containerPort: 2999
          protocol: TCP

Direct the Agent to use the correct port

The Datadog Agent has an expvar integration, so all you need to do is provide a Kubernetes pod annotation to properly configure the Agent to gather expvar metrics. To do that, you can use the expvar monitoring template for the Agent check and convert the essential components from YAML to JSON to construct your pod annotations. Once again, our annotations will cause the Datadog Agent to:

  1. Apply the go_expvar Agent check to the guestbook containers
  2. Supply an empty set of init_configs for the check (this is the default for Datadog’s expvar Agent check)
  3. Dynamically generate the correct URL for the expvar interface using template variables for the host and port

In this case, however, our app is using two ports (port 2999 for expvar and 3000 for the main HTTP service), so the challenge is making sure that Datadog selects the correct port to look for expvar metrics. To do that, we’ll make use of template variable indexing, which enables you to direct Autodiscovery to select the correct host or port from a list of available options. When the Agent inspects a container, Autodiscovery sorts the IP addresses and ports in ascending order, allowing you to address them by index. In this case we want to select the first (smaller) value of the two ports exposed on the container, so we use %%port_0%%. Add the annotation lines below to the guestbook Deployment in the guestbook-deployment-full.yaml to set up the expvar Agent check:

spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: guestbook
      annotations:
        service-discovery.datadoghq.com/guestbook.check_names: '["go_expvar"]'
        service-discovery.datadoghq.com/guestbook.init_configs: '[{}]'
        service-discovery.datadoghq.com/guestbook.instances: '[{"expvar_url": "http://%%host%%:%%port_0%%"}]'

Now save the file and apply the change:

$ kubectl apply -f guestbook-deployment-full.yaml

Confirm that expvar metrics are being collected

Run the Agent’s info command to ensure that expvar metrics are being picked up by the Agent:

# Get the list of running pods
$ kubectl get pods
NAME             READY     STATUS    RESTARTS   AGE
dd-agent-krrmd   1/1       Running   0          17d

# Use the pod name returned above to run the Agent's 'info' commmand
$ kubectl exec dd-agent-krrmd service datadog-agent info

In the output, look for an expvar section in the list of checks:

    go_expvar (5.16.0)
    ------------------
      - instance #0 [OK]
      - Collected 13 metrics, 0 events & 0 service checks

Send custom metrics to DogStatsD

All of the above steps involve using out-of-the-box monitoring functionality or modifying Datadog’s natively supported integrations to meet your needs. Sometimes, though, you need to monitor metrics that are truly unique to your application.

Bind the DogStatsD port to a host port

To enable the collection of custom metrics, the Datadog Agent ships with a lightweight DogStatsD server for metric collection and aggregation. To send metrics to a containerized DogStatsD, you can bind the container’s port to the host port and address DogStatsD using the node’s IP address. To do that, add a hostPort to your dd-agent.yaml file:

        ports:
          - containerPort: 8125
            hostPort: 8125
            name: dogstatsdport
            protocol: UDP

This enables your applications to send metrics via DogStatsD on port 8125 on whichever node they happen to be running. To deploy the service, apply your change:

$ kubectl apply -f dd-agent.yaml

Pass the node’s IP address to your app

Since we’ve made it possible to send metrics via a known port on the host, now we need a reliable way for the application to determine the IP address of its host. This is made much simpler in Kubernetes 1.7, which expands the set of attributes you can pass to your pods as environment variables. In versions 1.7 and above, you can pass the host IP to any pod by adding an environment variable to the PodSpec. For instance, in our guestbook manifest, we’ve added:

        env:
        - name: DOGSTATSD_HOST_IP
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP

Now any pod running the guestbook will be able to send DogStatsD metrics via port 8125 on $DOGSTATSD_HOST_IP.

Instrument your code to send metrics to DogStatsD

Now that we have an easy way to send metrics via DogStatsD on each node, we can instrument our application code to submit custom metrics. Since the guestbook example app is written in Go, we’ll import Datadog’s Go library, which provides a DogStatsD client library:

import "github.com/DataDog/datadog-go/statsd"

Before we can add custom counters, gauges, and more, we must initialize the StatsD client with the location of the DogStatsD service: $DOGSTATSD_HOST_IP.

func main() {

    // other main() code omitted for brevity

    var err error
    // use host IP and port to define endpoint
    dogstatsd, err = statsd.New(os.Getenv("DOGSTATSD_HOST_IP") + ":8125")
    if err != nil {
        log.Printf("Cannot get a DogStatsD client.")
    } else {
        // prefix every metric and event with the app name
        dogstatsd.Namespace = "guestbook."

        // post an event to Datadog at app startup
        dogstatsd.Event(&statsd.Event{
            Title: "Guestbook application started.",
            Text:  "Guestbook application started.",
        })
    }

We can also increment a custom metric for each of our handler functions. For example, every time the InfoHandler function is called, it will increment the guestbook.request_count metric by 1, while applying the tag endpoint:info to that data point:

func InfoHandler(rw http.ResponseWriter, req *http.Request) {
    dogstatsd.Incr("request_count", []string{"endpoint:info"}, 1)
    info := HandleError(masterPool.Get(0).Do("INFO")).([]byte)
    rw.Write(info)
}

Verify that custom metrics and events are being collected

If you kill one of your guestbook pods, Kubernetes will create a new one right away. As the new pod’s Go service starts, it will—if you’ve correctly configured its StatsD client—send a new “Guestbook application started” event to Datadog.

# Get the list of running pods
$ kubectl get pods
NAME                        READY     STATUS    RESTARTS   AGE
guestbook-231891302-2qw3m   1/1       Running   0          1d
guestbook-231891302-d9hkr   1/1       Running   0          1d
guestbook-231891302-wcmrj   1/1       Running   0          1d

# Kill one of those pods
$ kubectl delete pod guestbook-231891302-2qw3m
pod "guestbook-231891302-2qw3m" deleted

# Confirm that the deleted pod has been replaced
$ kubectl get pods
NAME                        READY     STATUS    RESTARTS   AGE
guestbook-231891302-d9hkr   1/1       Running   0          1d
guestbook-231891302-qhfgs   1/1       Running   0          48s
guestbook-231891302-wcmrj   1/1       Running   0          1d

Now you can view the Datadog event stream to see your custom application event, in context with the Docker events from your cluster.

Custom event from the guestbook app in the Datadog event stream
A custom app event in the Datadog event stream.

Generating metrics from your app

To cause your application to emit some metrics, visit its web interface and enter a few names into the guestbook. First you’ll need the public IP of your app, which is the external IP assigned to the guestbook service. You can find that EXTERNAL-IP by running:

kubectl get services

Now you can load the web app in a browser and add some names. Each request increments the request_count counter.

Web interface for a simple guestbook application written in Go

Now you should be able to view and graph the metrics from your guestbook app in Datadog. Open up a Datadog notebook and type in the custom metric name (guestbook.request_count) to start exploring. Success! You’re now successfully monitoring custom metrics from a containerized application.

Watching the orchestrator

In this guide we have stepped through several common techniques for setting up monitoring in a Kubernetes cluster:

  • Monitoring resource metrics and cluster status from Docker and Kubernetes
  • Using Autodiscovery to collect metrics from services with simple configs
  • Creating pod annotations to configure Autodiscovery for more complex use cases
  • Collecting custom metrics from containerized applications via DogStatsD

Stay tuned for forthcoming posts on container monitoring using other orchestration techniques and technologies.

If you’re ready to start monitoring your own Kubernetes cluster, you can sign up for and get started today.

Acknowledgments

Many thanks to Datadog technical writer Kent Shultz for his technical contributions and advice throughout the development of this article.


Want to write articles like this one? Our team is hiring!