StatsD, what it is and how it can help you
In less than 3 years since it was first introduced, StatsD has emerged as one of the most popular—and useful—parts of the modern devops toolchain. Here’s why…
What is StatsD exactly?
StatsD was originally a simple daemon developed and released by Etsy to aggregate and summarize application metrics. With StatsD, developers instrument their applications using language-specific client libraries. These libraries communicate with the StatsD daemon using its dead-simple protocol, and the daemon then generates aggregate metrics and relays them to virtually any graphing or monitoring backend.
The rest, as they say, is history. StatsD quickly grew in popularity, to the point where it became a unifying protocol for application metrics collection, of which the Etsy daemon was only a reference implementation.
How does StatsD work?
- It all starts in your own application code. You—the developer—instrument it with one of the many StatsD libraries corresponding to your app language. StatsD allows you to capture different types of metrics depending on your needs: today those are Gauges, Counters, Timing Summary Statistics, and Sets. This can be as simple as adding a decorator to methods you want to time, or a one-liner to track a gauge value.
- The StatsD client library then sends each individual call to the StatsD server in a UDP datagram. Since UDP is a connectionless protocol in which the recipient of a datagram doesn’t send any acknowledgement to the sender, the library doesn’t need to block when submitting data as it would with TCP or HTTP-based protocols. The library also doesn’t buffer any data between calls, which keeps it very simple. It does let you optionally sample the events sent to the server if you happen to instrument very high-throughput operations.
- The StatsD daemon then listens to the UDP traffic from all application libraries, aggregates data over time and “flushes” it at the desired interval to the backend of your choice. For example, individual function call timings may be aggregated every 10 seconds into a set of summary metrics describing their minimum, maximum, median, 90th and 95th percentiles over the 10-second interval. The protocol used between the StatsD daemon and the backend will vary depending on the backend used (most are HTTP-based).
- The monitoring backend will turn your metrics from a stream of numbers on the wire into usable charts and alert you when needed. Examples of backends include tools like Graphite as well as yours truly.
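To make the first two steps concrete, here is a minimal client sketch in Python. It is an illustration under assumptions, not any particular library's API: the helper names (format_metric, send, increment, gauge, unique, timed) are made up for this example, and the daemon is assumed to listen on localhost port 8125, the conventional StatsD default. The wire format is the plain-text StatsD line, name:value|type, with an optional |@rate suffix when sampling.

```python
import random
import socket
import time
from functools import wraps

# Assumed daemon address; 8125 is the conventional StatsD port.
STATSD_ADDR = ("127.0.0.1", 8125)

# One UDP socket reused for every call; nothing is buffered.
_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)


def format_metric(name, value, metric_type, sample_rate=1.0):
    """Build one StatsD line, e.g. 'page.views:1|c' or 'db.query:12|ms|@0.1'."""
    line = f"{name}:{value}|{metric_type}"
    if sample_rate < 1.0:
        line += f"|@{sample_rate}"
    return line


def send(name, value, metric_type, sample_rate=1.0):
    """Fire-and-forget: one datagram per call, no acknowledgement expected."""
    if sample_rate < 1.0 and random.random() >= sample_rate:
        return  # sampled out, nothing goes on the wire
    payload = format_metric(name, value, metric_type, sample_rate)
    try:
        _sock.sendto(payload.encode("ascii"), STATSD_ADDR)
    except OSError:
        pass  # a metrics failure must never break the application


def increment(name, sample_rate=1.0):
    send(name, 1, "c", sample_rate)  # counter


def gauge(name, value):
    send(name, value, "g")  # gauge


def unique(name, value):
    send(name, value, "s")  # set member


def timed(name):
    """Decorator that reports the wrapped function's duration in milliseconds."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed_ms = int((time.perf_counter() - start) * 1000)
                send(name, elapsed_ms, "ms")
        return wrapper
    return decorator
```

With helpers like these, timing a function is a matter of adding @timed("checkout.render") above it, and tracking a gauge is a one-liner such as gauge("queue.depth", 42). The metric names here are placeholders.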
What sets StatsD apart from the rest?
There have been, and still are, many alternative methods for capturing metrics. One of the most popular today for Java applications is Coda Hale’s excellent Metrics library.
Here’s what sets StatsD apart today:
- Simplicity: Not only is it very easy to instrument your app, the StatsD protocol is text-based and straightforward to write and read. The original Etsy server code was a mere 127 lines long.
- Decoupling the application from its instrumentation: Because the daemon runs outside the app and UDP is a fire-and-forget protocol, there’s no upstream dependency between metrics collection and the app itself. StatsD can’t crash your app, and doesn’t need to be written in the same language or even run on the same machine.
- Tiny footprint: StatsD clients are extremely thin, carry no state, need no threads and add negligible overhead. StatsD also supports sampling your calls to arbitrarily reduce network utilization.
- Ubiquity and ecosystem: There are StatsD clients for Ruby, Python, Java, Erlang, Node, Scala, Go, Haskell, and virtually every other language. Many developers wrote alternative servers to fit special needs or maximize throughput. And, there’s a plethora of backends supporting it, both open source and commercial. This means no vendor lock-in.
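To give a feel for how little the server side needs to do, here is a rough Python sketch of parsing and aggregation. It is not the Etsy daemon's code: the Aggregator class, the output metric names, and the nearest-rank percentile definition are assumptions made for illustration, and real servers may compute summaries differently. Gauge deltas (values sent with a leading + or -) are not handled. Note how a counter received with a sample rate is scaled back up by dividing by that rate.

```python
import math
import statistics
from collections import defaultdict


def parse_line(line):
    """Split 'name:value|type[|@rate]' into (name, value, type, rate)."""
    name, _, rest = line.partition(":")
    fields = rest.split("|")
    value, metric_type = fields[0], fields[1]
    rate = 1.0
    for extra in fields[2:]:
        if extra.startswith("@"):
            rate = float(extra[1:])
    return name, value, metric_type, rate


def percentile(ordered, p):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


class Aggregator:
    """Hypothetical in-memory aggregator, flushed once per interval."""

    def __init__(self):
        self.counters = defaultdict(float)
        self.gauges = {}
        self.timers = defaultdict(list)
        self.sets = defaultdict(set)

    def record(self, line):
        try:
            name, value, metric_type, rate = parse_line(line)
            if metric_type == "c":
                # Scale sampled counters back up to an estimated total.
                self.counters[name] += float(value) / rate
            elif metric_type == "g":
                self.gauges[name] = float(value)  # last value wins
            elif metric_type == "ms":
                self.timers[name].append(float(value))
            elif metric_type == "s":
                self.sets[name].add(value)
        except (ValueError, IndexError, ZeroDivisionError):
            return  # silently drop malformed lines

    def flush(self):
        """Return summary metrics and reset everything except gauges."""
        out = {}
        for name, total in self.counters.items():
            out[f"{name}.count"] = total
        for name, value in self.gauges.items():
            out[name] = value
        for name, values in self.timers.items():
            ordered = sorted(values)
            out[f"{name}.min"] = ordered[0]
            out[f"{name}.max"] = ordered[-1]
            out[f"{name}.median"] = statistics.median(ordered)
            out[f"{name}.p90"] = percentile(ordered, 90)
            out[f"{name}.p95"] = percentile(ordered, 95)
        for name, members in self.sets.items():
            out[f"{name}.unique"] = len(members)
        self.counters.clear()
        self.timers.clear()
        self.sets.clear()
        return out
```

A real daemon would call record() for every line received on its UDP socket and call flush() on a timer (every 10 seconds, say), forwarding the result to the backend.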
What problem does StatsD solve?
Beyond the technical problem it solves—getting data from point A to point B efficiently—StatsD’s biggest contributions are organizational in nature. It allows for a culture where developers don’t have to ask anyone’s permission to instrument their application, where metrics are captured before applications are deployed in production, and where abstract performance or resource utilization metrics can be directly linked to application or product metrics that are directly relevant to the business.
We often get asked how one should “implement devops.” It’s usually a long answer, but a lot of it has to do with Dev and Ops teams sharing ownership of their application’s availability and performance, and StatsD enables just that.
StatsD & Datadog
You may have guessed by now that we’re big fans of StatsD and use it extensively internally. We also wanted to make it really easy for our customers to submit metrics from StatsD into Datadog for graphing, alerting, event correlation, and team collaboration:
- We embedded our own StatsD daemon within the Datadog Agent, to make the setup as simple as possible while keeping it a drop-in StatsD replacement (see the source)
- We extended the StatsD protocol to support tagging, one of Datadog’s killer features. This lets you add dimensions to your metrics, such as the application version or the type of customer a specific call relates to. But we’ll come back to this in another post.
- We made it very easy to discover StatsD metrics in the Datadog UI. Every host will automatically advertise its metrics, so you don’t have to look for them.
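As a small illustration of the tagging extension, the Python sketch below builds a tagged metric line. It assumes the documented DogStatsD datagram layout, in which tags are appended as a final section starting with |# and separated by commas. The function name, metric name, and tag values are hypothetical.

```python
def format_tagged_metric(name, value, metric_type, tags=None, sample_rate=1.0):
    """Build a StatsD line with an optional '|#tag1,tag2' section appended."""
    line = f"{name}:{value}|{metric_type}"
    if sample_rate < 1.0:
        line += f"|@{sample_rate}"
    if tags:
        # Assumed layout: tags come last, comma-separated, prefixed by '|#'.
        line += "|#" + ",".join(tags)
    return line


# Hypothetical example: a timing tagged with app version and customer type.
line = format_tagged_metric(
    "checkout.time", 320, "ms", tags=["version:1.4", "customer:premium"]
)
```

Without tags the function produces a plain StatsD line, so the same call sites keep working against a daemon that does not understand the extension.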