Varnish Cache is a web application accelerator designed specifically for content-rich, dynamic websites and heavily-used APIs. The strategy it uses for acceleration is known as a “caching HTTP reverse proxying”. Let’s unpack these terms.
As a reverse proxy, Varnish is server-side, as opposed as a client-side forward proxy. It acts as an invisible conduit between a client and a backend, intermediating all communications between the two. As a cache, it stores often-used assets (such as files, images, css) for faster retrieval and response without hitting the backend. Unlike other caching reverse proxies, which may support FTP, SMTP, or other network protocols, Varnish is exclusively focused on HTTP. As a caching HTTP proxy, Varnish also differs from browser-based HTTP proxies in that it can cache reusable assets between different clients, and cached objects can be invalidated everywhere simultaneously.
Varnish is a mature technology, and is in use at many high-traffic websites such as The New York Times, Wikipedia, Tumblr, Twitter, Vimeo, and Facebook.
When running well, Varnish Cache can speed up information delivery by a factor of several hundred. However, if Varnish is not tuned and working properly, it can slow down or even halt responses from your website. The best way to ensure the proper operation and performance of Varnish is by monitoring its key performance metrics in the following areas:
- Client metrics: client connections and requests
- Cache performance: cache hits, evictions
- Thread metrics: thread creation, failures, queues
- Backend metrics: success, failure, and health of backend connections
This article references metric terminology introduced in our Monitoring 101 series, which provides a framework for metric collection and alerting.
NOTE: All the metrics discussed here can be collected from the varnishstat command line, and use the metric names from the latest version, Varnish 4.0.
Client metrics measure volume and success of client connections and requests. Below we discuss some of the most important.
|Cumulative number of accepted client connections by Varnish Cache
|Cumulative number of received client requests. Increments after a request is received, but before Varnish responds
|Number of connections dropped due to a full queue
|Work: Error (due to resource saturation)
client_req: Regularly sampling the number of requests per second allows you to calculate the number of requests you’re receiving per unit of time—typically minutes or seconds. Monitoring this metric can alert you to spikes in incoming web traffic, whether legitimate or nefarious, or sudden drops, which are usually indicative of problems. A drastic change in requests per second can alert you to problems brewing somewhere in your environment, even if it cannot immediately identify the cause of those problems. Note that all requests are counted the same, regardless of their URLs.
sess_dropped: Once Varnish is out of worker threads, each new request is queued up and
sess_queuedis incremented. When this queue fills up, additional incoming requests will simply be dropped without being answered by Varnish and
sess_droppedwill be incremented. If this metric is not equal to zero, then either Varnish is overloaded, or the thread pool is too small in which case you should try gradually increasing
thread_pool_maxand see if it fixes the issue without causing higher latency or other problems.
Note that, for historical reasons, there is a
sess_drop metric present in Varnish which is not the same as
sess_dropped, discussed above. In new versions of Varnish,
sess_drop is never incremented so it does not need to be monitored.
Varnish is a cache, so by measuring cache performance you can see instantly how well Varnish is doing its work.
The diagram below illustrates how Varnish routes requests, and when each of its cache hit metrics is incremented.
|Cumulative number of times a file was served from Varnish’s cache
|Cumulative number of times a file was requested but was not in the cache, and was therefore requested from the backend
|Cumulative number of hits for a “pass” file
Some objects cannot be cached. When one of them is requested for the first time, the response is served by the backend and
cache_hitpass is incremented. This “uncachable” object is recorded so that subsequent requests for it go directly to “pass” without being counted as misses.
The cache hit rate is the ratio of cache hits to total cache lookups:
cache_hit / (cache_hit + cache_miss). This derived metric provides visibility into the effectiveness of the cache. The higher the ratio, the better. If the hit rate is consistently high, above 0.7 (70 percent) for instance, then the majority of requests are successfully expedited through caching. If the cache is not answering a sufficient percentage of the read requests, consider increasing its memory, which can be a low-overhead tactic for improving read latency.
If after increasing the amount of memory available to your cache, your hit rate is still too low, you might also want to look at which objects are not being cached and why. For this you’ll need to use Varnishlog and then optimize your VCL (Varnish Configuration Language) tuning to improve the hit/miss ratio.
|Cumulative number of expired objects for example due to TTL
|Least Recently Used Nuked Objects: Cumulative number of cached objects that Varnish has evicted from the cache because of a lack of space
The LRU (Least Recently Used) Nuked Objects counter,
n_lru_nuked, should be closely watched. If the eviction rate is increasing, that means your cache is evicting objects faster and faster due to a lack of space. In this case you may want to consider increasing the cache size.
Metrics related to worker threads tell you if your thread pools are healthy and functioning well.
|Number of threads in all pools
|Number of times a thread has been created
|Number of times that Varnish unsuccessfully tried to create a thread
|Number of times a thread needed to be created but couldn't because varnishd maxed out its configured capacity for new threads
|Current queue length: number of requests waiting on worker thread to become available
|Number of times Varnish has been out of threads and had to queue up a request
Keep an eye on the metric
thread_queue_len which should not be too high. If it’s not equal to zero, that means Varnish is saturated and responses are slowed.
These metrics should always be equal to 0:
threads_failed: otherwise you have likely exceeded your server limits, or attempted to create threads too rapidly. The latter case usually occurs right after Varnish is started, and can be corrected by increasing the
threads_limited: otherwise you should consider increasing the value of
Keeping an eye on the state of your connections with backend web servers is also crucial to understand how well Varnish is able to do its work.
|Cumulative number of successful TCP connections to the backend
|Cumulative number of current backend connections which were put back to a pool of keep-alive connections and have not yet been used
|Cumulative number of connections that were reused from the keep-alive pool
|Cumulative number of backend connections that have been closed because they were idle for too long
|Cumulative number of failed connections to the backend
|Work: Error (due to resource error)
|Cumulative number of backend connections which were not attempted because the backend has been marked as unhealthy
|Cumulative number of times the maximum amount of connections to the backend has been reached
|Number of requests to the backend
If your backend has keep-alive set, Varnish will use a pool of connections. You can get some insight into the effectiveness of the connection pool by looking at
By default when
backend_busy is incremented, that means the client receives a 5xx error response. However by using VCL, you can configure Varnish to recover from a busy backend by using a different backend, or by serving an outdated or synthetic response.
backend_req, should be monitored to detect network or cache performance issues.
backend_fail(backend connection failures) should be 0 or very close to 0. Backend connection failures can have several root causes:
- Initial (TCP) connection timeout: usually results from network issues, but could also be due to an overloaded or unresponsive backend
- Time to first byte: when a request is sent to the backend and it does not start responding within a certain amount of time
- Time in between bytes: when the backend started streaming a response but stopped sending data without closing the connection
backend_unhealthy: Varnish periodically pings the backend to make sure it is still up and responsive. If it doesn’t receive a 200 response quickly enough, the backend is marked as unhealthy and every new request to it increments this counter until the backend recovers and sends a timely 200 response.
If you are using Edge Side Includes,
esi_warnings will give you insight into the validity of your ESI syntax. If these metrics are increasing, you should inspect what is being returned by your backend and fix the errors you find.
In this post we’ve explored the most important metrics you should monitor to keep tabs on your Varnish cache. If you are just getting started with Varnish, monitoring the metrics listed below will give you great insight into your cache’s health and performance. Most importantly it will help you identify areas where tuning could provide significant benefits.
- Requests per second
- Dropped client connections
- Cache hit rate
- LRU Nuked objects
- Some worker thread related metrics
- Backend connection failures or unhealthy backend
Eventually you will recognize additional, more specialized metrics that are particularly relevant to your own environment and use cases.
Of course, what you monitor will depend on the tools you have and the metrics available.
Part 2 of this post provides step-by-step instructions for collecting these metrics from Varnish.