
Jean-Mathieu Saponaro
This post is part 2 of a 3-part series on Varnish monitoring. Part 1 explores the key Varnish metrics available, and Part 3 details how Datadog can help you to monitor Varnish metrics.
How to get the Varnish metrics you need
Varnish Cache ships with very useful and precise monitoring and logging tools. As explained in the first post of this series, for monitoring purposes, the most useful of the available tools is varnishstat, which gives you a detailed snapshot of Varnish’s current performance. It provides access to in-memory statistics such as cache hits and misses, resource consumption, threads created, and more.
varnishstat
If you run varnishstat from the command line, you will see a list of all available Varnish metrics, with values changing in real time. If you add the -1 flag, varnishstat will exit after printing the list one time. Example output below:
$ varnishstat
MAIN.uptime               Child process uptime
MAIN.sess_conn            Sessions accepted
MAIN.sess_drop            Sessions dropped
MAIN.sess_fail            Session accept failures
MAIN.sess_pipe_overflow   Session pipe overflow
MAIN.client_req           Good client requests received
MAIN.cache_hit            Cache hits
MAIN.cache_hitpass        Cache hits for pass
MAIN.cache_miss           Cache misses
MAIN.backend_conn         Backend conn. success
MAIN.backend_unhealthy    Backend conn. not attempted
MAIN.backend_busy         Backend conn. too many
MAIN.backend_fail         Backend conn. failures
MAIN.backend_reuse        Backend conn. reuses
MAIN.backend_toolate      Backend conn. was closed
MAIN.backend_recycle      Backend conn. recycles
MAIN.backend_retry        Backend conn. retry
MAIN.pools                Number of thread pools
MAIN.threads              Total number of threads
MAIN.threads_limited      Threads hit max
MAIN.threads_created      Threads created
MAIN.threads_destroyed    Threads destroyed
MAIN.threads_failed       Thread creation failed
MAIN.thread_queue_len     Length of session queue
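Because the -1 flag prints a single snapshot and exits, its output is easy to feed into a script. The following is a minimal sketch, not an official recipe: it assumes each line of the one-shot output starts with the counter name followed by its current value, and the function name hit_rate is made up for this example. It derives the cache hit rate from the MAIN.cache_hit and MAIN.cache_miss counters.

```shell
# Compute the cache hit rate (as a percentage) from `varnishstat -1`
# output read on stdin.
# Assumption: column 1 is the counter name, column 2 is its value.
hit_rate() {
  awk '
    $1 == "MAIN.cache_hit"  { hit  = $2 }
    $1 == "MAIN.cache_miss" { miss = $2 }
    END {
      total = hit + miss
      # Avoid dividing by zero when no lookups have been recorded
      if (total == 0) { print "n/a"; exit }
      printf "%.2f\n", 100 * hit / total
    }
  '
}

# Usage on a host running Varnish:
#   varnishstat -1 | hit_rate
```

These counters accumulate from the moment the Varnish child process starts, so the result is the hit rate over the whole uptime rather than over the last few seconds.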
To list specific values, pass them with the -f flag, separated by commas (and followed by -1 if needed). For instance, to display the number of threads currently being used, run:
$ varnishstat -f MAIN.threads
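When a script only needs the number itself, the one-shot output for a single field can be reduced to a bare value. This is a rough sketch under the same assumption about the output layout (counter name first, value second); stat_value is a hypothetical helper name, not part of Varnish.

```shell
# Print only the value of one counter from `varnishstat -1` output on stdin.
# $1: counter name, e.g. MAIN.threads
# Exits with a non-zero status if the counter is not present in the input.
stat_value() {
  awk -v name="$1" '
    $1 == name { print $2; found = 1 }
    END        { exit !found }
  '
}

# Usage on a host running Varnish:
#   varnishstat -1 -f MAIN.threads | stat_value MAIN.threads
```

The non-zero exit status lets a calling script tell "counter missing" apart from "counter is zero".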

As a standalone tool, varnishstat is useful if you need to spot-check the health of your cache. However, if Varnish is an important part of your software service, you will almost certainly want to graph its performance over time, correlate it with other metrics from across your infrastructure, and be alerted about any problems that may arise. To do this you will probably want to integrate the metrics that varnishstat reports with a dedicated monitoring service.
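To give a sense of what such an integration involves at its simplest, the sketch below turns a one-shot snapshot into timestamped "name value timestamp" lines that could be appended to a file or forwarded to a collector. The function name to_timeseries, the line format, and the log path in the usage comment are all assumptions made for illustration; a real monitoring agent handles this for you.

```shell
# Convert `varnishstat -1` output on stdin into "name value timestamp" lines.
# $1: Unix timestamp to attach to every sample (defaults to the current time)
# Assumption: column 1 is the counter name, column 2 is its value.
to_timeseries() {
  ts="${1:-$(date +%s)}"
  awk -v ts="$ts" 'NF >= 2 { print $1, $2, ts }'
}

# Usage on a host running Varnish (hypothetical path and interval):
#   while true; do
#     varnishstat -1 | to_timeseries >> /tmp/varnish_metrics.log
#     sleep 10
#   done
```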
varnishlog
If you need to debug your system or tune configuration, varnishlog can be a useful tool, as it provides detailed information about each individual request.
Here is an edited example of varnishlog output generated by a single request (a full example would be several times longer):
$ varnishlog
3727 RxRequest    c GET
3727 RxProtocol   c HTTP/1.1
3727 RxHeader     c Content-Type: application/x-www-form-urlencoded;
3727 RxHeader     c Accept-Encoding: gzip,deflate,sdch
3727 RxHeader     c Accept-Language: en-US,en;q=0.8
3727 VCL_return   c hit
3727 ObjProtocol  c HTTP/1.1
3727 TxProtocol   c HTTP/1.1
3727 TxStatus     c 200
3727 Length       c 316
[…]
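The 4 columns represent:
1. The request ID
2. The type of data (a type starting with “Rx” means Varnish is receiving data; “Tx” means Varnish is sending data)
3. Whether the entry records communication between Varnish and the client (“c”) or the backend (“b”)
4. The data itself, or details about the entry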
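Since each log entry follows the same column layout, simple text tools can summarize it. The sketch below counts the response status codes sent to clients. It assumes one entry per line in the four-column layout of the sample output above and uses the TxStatus tag from that sample; tag names and output layout can differ between Varnish versions, so treat the tag as something to adjust. The function name status_counts is made up for this example.

```shell
# Count client-facing response status codes in varnishlog output on stdin.
# Assumptions: column 2 is the tag, column 3 is "c" (client) or "b" (backend),
# column 4 is the data, and TxStatus is the tag carrying the status code.
status_counts() {
  awk '
    $2 == "TxStatus" && $3 == "c" { count[$4]++ }
    END { for (s in count) print s, count[s] }
  ' | sort
}

# Usage against a saved capture (hypothetical file name):
#   status_counts < varnishlog_capture.txt
```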
varnishlog’s children
You can display a subset of varnishlog’s information via three specialized tools built on top of varnishlog:
varnishtop exposes the log entries that occur most often. You can filter to show the most frequently requested documents, the most common clients or user agents, or other data.
varnishhist returns a histogram of latency for recent requests.
varnishsizes returns a histogram of request size for recent requests.
Conclusion
Which metrics you monitor will depend on your use case, the tools available to you, and whether the insight provided by a given metric justifies the overhead of monitoring it.
At Datadog, we have built an integration with Varnish so that you can begin collecting and monitoring its metrics with a minimum of setup. Learn how Datadog can help you to monitor Varnish in the next and final part of this series of articles.
Source Markdown for this post is available on GitHub. Questions, corrections, additions, etc.? Please let us know.