What is Redis?
Redis is an open source in-memory key/value data store. Due to its performance and simple onboarding, Redis has become a popular solution for multiple industries in use cases such as:
- Database: Focus on speed over durability with rich data primitives
- Message queue: A good backend for message broker service because of blocking list commands and low latency
- Memory cache: Configurable key eviction policies make Redis a great choice as a cache server
Redis dashboard overview
When you’re setting up a Redis monitoring dashboard, there are two key problems you are trying to catch—resource issues with Redis itself and problems arising elsewhere in your supporting infrastructure. There are many necessary Redis metrics to track and it’s easiest to think of them in five separate categories: performance metrics, memory metrics, basic activity metrics, persistence metrics, and error metrics.
Refer to the image below for an example of a customizable Redis dashboard in Datadog with the critical metrics you’ll want to monitor. Whether you’re a Datadog user or not, this layout can serve as a useful template when assembling a comprehensive Redis dashboard.
The following is a widget-by-widget breakdown of the graphs and query values in the Redis dashboard, separated into the five metric categories.
Redis Performance metrics
Commands per second
When you experience high latency, the commands per second graph provides insight that helps with diagnostics. If commands per second is stable, you know that the latency issues are not caused by computationally intensive commands. However, if one or more slow commands are causing latency issues, the number of commands per second would drop or stall completely on the graph.
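As a rough sketch of where a commands-per-second value can come from, the Python snippet below derives a rate from two samples of the `total_commands_processed` counter that Redis reports in its `INFO` output. The function name, the sample values, and the restart handling are illustrative assumptions, not part of any Redis or Datadog API.

```python
def commands_per_second(prev_total, curr_total, interval_seconds):
    """Rate of commands between two samples of total_commands_processed."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = curr_total - prev_total
    if delta < 0:
        # The counter went backwards, so the server was probably restarted
        # or its stats were reset. Assumption: count only the new total.
        delta = curr_total
    return delta / interval_seconds


# Two hypothetical samples taken 10 seconds apart.
rate = commands_per_second(prev_total=120_000, curr_total=150_000, interval_seconds=10)
print(rate)  # 3000.0
```

A flat series of such rates alongside rising latency points away from expensive commands, while a sudden drop toward zero suggests that something is stalling the server.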
Cache hit rate
This metric is especially important when using Redis as a memory cache. The query value indicates whether your cache is being used effectively or not.
A low hit rate means that clients are looking for keys that don’t exist. Possible causes include data expiration and insufficient memory allocated to Redis. A low hit rate can increase application latency, because applications have to fetch the data from a slower, alternative resource.
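The hit rate itself is simple arithmetic: hits divided by total lookups. A minimal sketch, assuming the `keyspace_hits` and `keyspace_misses` counters from the Redis `INFO` output have already been parsed into a dictionary (the function name and sample values are hypothetical):

```python
def cache_hit_rate(info):
    """Return hits / (hits + misses), or None when there were no lookups."""
    hits = int(info.get("keyspace_hits", 0))
    misses = int(info.get("keyspace_misses", 0))
    lookups = hits + misses
    if lookups == 0:
        # No lookups yet, so a hit rate is undefined rather than zero.
        return None
    return hits / lookups


# Hypothetical counters: 900 hits and 100 misses.
print(cache_hit_rate({"keyspace_hits": 900, "keyspace_misses": 100}))  # 0.9
```

Both counters are cumulative since the server started, so a dashboard would typically apply the same formula to the differences between two samples rather than to the raw totals.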
Slowlog
The slowlog is a running list of executed commands that exceeded a configured execution time.
Diving into the slowlog provides a low-level view of the commands that are causing increases in latency.
Latency (in ms)
Latency is the time between a client request and the server’s response.
Because Redis is single-threaded, outliers in your latency distribution could cause serious bottlenecks for users.
Redis Memory metrics
Evictions
The evictions metric is especially important for the memory cache use case, where you can configure Redis to automatically purge keys when the maxmemory limit is reached. Swapping may be preferable to evictions when using Redis as a database or queue.
Evicting a large number of keys can lead to lower hit rates and, thus, higher latency.
Fragmentation ratio
This is the ratio of memory used by Redis as seen by the operating system (used_memory_rss) to memory allocated by Redis (used_memory). The operating system’s memory allocator will first attempt to find a contiguous memory segment to store the data for a process. If no contiguous segment is found, the allocator must divide the process’s data across segments, leading to increased memory overhead.
A fragmentation ratio greater than 1 indicates that fragmentation is occurring. A fragmentation ratio below 1 tells you that Redis needs more memory than is available on your system, which leads to swapping. Ideally, the operating system would allocate a contiguous segment in physical memory, with a fragmentation ratio equal to 1 or slightly greater.
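To make the thresholds concrete, here is a small sketch that computes the ratio as `used_memory_rss` divided by `used_memory`, which is how Redis derives the `mem_fragmentation_ratio` field in `INFO`. The function names and the 1.5 cutoff for "high" fragmentation are assumptions chosen for illustration, so tune the cutoff to your own workload.

```python
def fragmentation_ratio(info):
    """Memory seen by the OS (RSS) divided by memory allocated by Redis."""
    used = int(info["used_memory"])
    rss = int(info["used_memory_rss"])
    if used <= 0:
        raise ValueError("used_memory must be positive")
    return rss / used


def describe_fragmentation(ratio, high=1.5):
    """Classify a ratio; the `high` cutoff is an illustrative assumption."""
    if ratio < 1.0:
        return "below 1: swapping likely"
    if ratio > high:
        return "high fragmentation"
    return "healthy"


sample = {"used_memory": 1000, "used_memory_rss": 1200}  # hypothetical values
ratio = fragmentation_ratio(sample)
print(ratio, describe_fragmentation(ratio))
```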
Blocked clients
Redis offers blocking variants of its LPOP, RPOP, and RPOPLPUSH commands: BLPOP, BRPOP, and BRPOPLPUSH. When the source list is non-empty, the commands perform as you expect. However, when the source list is empty, the blocking commands wait until the source is filled or a timeout is reached.
If the number of blocked clients is consistently non-zero, you should investigate potential issues, such as latency, that prevent the source from being filled.
Used memory
If used_memory exceeds the total available system memory, the operating system will begin swapping old/unused sections of memory. Every swapped section is written to disk, severely impacting performance.
Tracking used memory is important because reading from or writing to disk is up to five orders of magnitude slower than reading from or writing to memory.
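One way to watch for this condition is to compare `used_memory` with the host's total memory. The sketch below assumes an `INFO` dictionary that includes `total_system_memory` (a field reported by recent Redis versions). The function names and the 0.8 warning level are hypothetical choices, not recommended values.

```python
def memory_usage_fraction(info):
    """Fraction of total system memory taken up by Redis's used_memory."""
    used = int(info["used_memory"])
    total = int(info["total_system_memory"])
    if total <= 0:
        raise ValueError("total_system_memory must be positive")
    return used / total


def swap_risk(info, warn_at=0.8):
    """True when used_memory is at or above `warn_at` of system memory."""
    return memory_usage_fraction(info) >= warn_at


GIB = 1024 ** 3
sample = {"used_memory": 6 * GIB, "total_system_memory": 8 * GIB}  # hypothetical host
print(memory_usage_fraction(sample), swap_risk(sample))  # 0.75 False
```

Other processes on the host also consume memory, so in practice the warning level should leave room for them.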
Redis basic activity metrics
Connected slaves
If you are using the master-slave database replication features available in Redis, monitoring the number of connected slaves is key. Should the number of connected slaves change unexpectedly, it could indicate a down host or a problem with the slave instance.
Rejected connections
This is the number of connections rejected due to hitting the maxclients limit for the server instance. When this limit is reached, any new connection attempts will be refused until the number of connections is below the maximum.
Connected clients
There will usually be reasonable upper and lower bounds for the number of connected clients because access to Redis is most often mediated by an application.
A low number of connected clients could indicate lost upstream connections. On the other hand, high numbers of connected clients indicate many concurrent client connections that can overwhelm the server’s ability to handle requests.
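The checks described for client connections can be sketched as a simple bounds test. This example assumes the `connected_clients` and `rejected_connections` fields from Redis `INFO`. The function name, the alert wording, and the bounds are hypothetical and would need to reflect what is normal for your application.

```python
def client_alerts(info, prev_rejected, min_clients, max_clients):
    """Return a list of human-readable alerts for client connection metrics."""
    alerts = []
    connected = int(info["connected_clients"])
    rejected = int(info["rejected_connections"])
    if connected < min_clients:
        alerts.append("connected clients below expected range")
    if connected > max_clients:
        alerts.append("connected clients above expected range")
    if rejected > prev_rejected:
        # rejected_connections is cumulative, so only growth since the
        # previous sample indicates new rejections.
        alerts.append("new rejected connections")
    return alerts


sample = {"connected_clients": 3, "rejected_connections": 12}  # hypothetical values
print(client_alerts(sample, prev_rejected=10, min_clients=5, max_clients=500))
```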
Redis keyspace widget and keys graph
The larger the keyspace, the more physical memory Redis requires to ensure optimal performance. Redis continues to add keys until it reaches the maxmemory limit, at which point it begins evicting keys at the same rate new ones come in.
A “flatlining” keyspace metric can be correlated with hit rate in a cache use case to indicate clients requesting old or evicted data. In database or queue use cases, you can add more memory to your box or split datasets across hosts as your keyspace grows.
Redis persistence metrics
Unsaved changes widget
This metric is often used in conjunction with rdb_last_save_time to provide insight into your data volatility. Long time intervals between saves are acceptable if your data set hasn’t changed much. However, when unsaved changes are tracked together with the last save time, you can understand how much data would be lost should a server fail.
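As an illustration of combining the two values, the sketch below reports how many changes are unsaved and how long ago the last save happened. It assumes the `rdb_changes_since_last_save` and `rdb_last_save_time` fields from Redis `INFO`, where the latter is a Unix timestamp. The function name and sample values are hypothetical.

```python
import time


def persistence_exposure(info, now=None):
    """Return (unsaved_changes, seconds_since_last_save)."""
    if now is None:
        now = time.time()
    changes = int(info["rdb_changes_since_last_save"])
    last_save = int(info["rdb_last_save_time"])
    # Clamp at zero in case of clock skew between client and server.
    seconds = max(0, int(now) - last_save)
    return changes, seconds


sample = {"rdb_changes_since_last_save": 4200, "rdb_last_save_time": 1_700_000_000}
print(persistence_exposure(sample, now=1_700_000_900))  # (4200, 900)
```

A large change count paired with a long interval is the combination that signals the most data at risk if the server fails.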
Redis error metrics
Master down widget
This metric is only available when the connection between a master and its slave has been lost. The master and slave should be in constant communication to ensure the slave is not serving up stale data.
In light of this constant communication, this metric should never exceed zero on your dashboard.
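Because the field is absent while the link is healthy, a check has to treat "missing" as zero. A minimal sketch, assuming a parsed `INFO` dictionary from a slave instance and the `master_link_down_since_seconds` field (the function name is hypothetical):

```python
def master_link_down_seconds(info):
    """Seconds the master link has been down, or 0 when the field is absent."""
    return int(info.get("master_link_down_since_seconds", 0))


healthy = {"master_link_status": "up"}  # hypothetical samples
broken = {"master_link_status": "down", "master_link_down_since_seconds": 42}
print(master_link_down_seconds(healthy), master_link_down_seconds(broken))  # 0 42
```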
Monitor your database, message queue, or memory cache with the Redis dashboard
For a deep dive on Redis metrics and how to monitor them, check out our three-part How to Monitor Redis series.