
ElastiCache Dashboard (Redis)

Visualize your own Amazon ElastiCache data in minutes with Datadog's out-of-the-box dashboard.


Amazon ElastiCache dashboard overview

An efficient cache can significantly increase your application’s performance and user navigation speed. That’s why understanding the metrics that contribute to your cache health is essential.

Amazon ElastiCache Dashboard
Datadog's out-of-the-box Amazon ElastiCache dashboard.

Datadog’s Amazon ElastiCache dashboard visualizes host-level, throughput, performance, and memory metrics in a single pane of glass. This page breaks down the metrics featured on that dashboard to provide a starting point for anyone looking to monitor ElastiCache with the Redis caching engine.

What is Amazon ElastiCache?

Amazon ElastiCache is a fully managed in-memory cache service offered through Amazon Web Services (AWS). Using a cache greatly improves throughput and reduces latency of read-intensive workloads.

AWS allows you to choose between Redis and Memcached as ElastiCache’s caching engine. Because Redis is the more widely used of the two, the metrics below are those best suited for an ElastiCache dashboard running Redis.


Amazon ElastiCache dashboard metrics breakdown

Host-level metrics

CPU utilization by node

It is important to track CPU utilization for each node because high CPU usage can indirectly indicate high latency. Since Redis is single-threaded, AWS recommends setting an alert threshold of 90 percent divided by the number of cores on the node.
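For example, on a node type with four vCPUs, that threshold would be 90 / 4 = 22.5 percent. A minimal sketch of creating such a monitor with Datadog’s datadogpy library could look like the following; the metric and tag names (aws.elasticache.cpuutilization, cacheclusterid) are the names this sketch assumes the AWS integration reports, and the four-core node type is just for illustration.

# Sketch: a Datadog metric monitor for ElastiCache (Redis) CPU utilization.
# Assumes datadogpy and the AWS integration's metric/tag names
# (aws.elasticache.cpuutilization, cacheclusterid); the node type is hypothetical.
from datadog import initialize, api

initialize(api_key="<DATADOG_API_KEY>", app_key="<DATADOG_APP_KEY>")

NUM_CORES = 4                    # e.g. a four-vCPU node type
threshold = 90.0 / NUM_CORES     # AWS-recommended per-node threshold: 22.5%

api.Monitor.create(
    type="metric alert",
    query=(
        f"avg(last_5m):avg:aws.elasticache.cpuutilization{{*}} "
        f"by {{cacheclusterid}} > {threshold}"
    ),
    name="ElastiCache (Redis) CPU utilization is high",
    message="CPU utilization on {{cacheclusterid.name}} exceeded the per-core threshold.",
    options={"thresholds": {"critical": threshold}},
)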

Network incoming bytes by node

The number of bytes read from the network by the host.

Network outgoing bytes by node

The number of bytes written to the network by the host.
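If you want to pull these host-level metrics yourself, outside of Datadog, a minimal boto3 sketch against CloudWatch might look like the following; the region, cluster ID, and node ID are placeholders.

# Sketch: fetch per-node network throughput from CloudWatch with boto3.
# The region, cluster ID, and node ID below are placeholders.
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
now = datetime.now(timezone.utc)

for metric in ("NetworkBytesIn", "NetworkBytesOut"):
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/ElastiCache",
        MetricName=metric,
        Dimensions=[
            {"Name": "CacheClusterId", "Value": "my-redis-cluster-001"},
            {"Name": "CacheNodeId", "Value": "0001"},
        ],
        StartTime=now - timedelta(minutes=15),
        EndTime=now,
        Period=60,
        Statistics=["Average"],
    )
    for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
        print(metric, point["Timestamp"], point["Average"])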

Throughput and performance metrics

Connections by cluster

The number of current client connections to the cache. It is important to alert on this metric because, once the cache reaches its connection limit, it will refuse new client connections.

Hit Rate (%) by cluster

The hit rate is a measure of your cache efficiency, calculated from cache hits and misses: hits / (hits + misses). Cache hits are requests that were served from the cache without going to the backend; misses are requests that had to be answered by the backend because the item was not cached.

If your hit rate is too low, the cache’s size might be too small for the working data set. A high hit rate, on the other hand, helps to reduce your application response time, ensure a smooth user experience, and protect your databases.
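As a quick illustration, the same ratio can be computed directly from Redis’s own counters (keyspace_hits and keyspace_misses in INFO stats); the endpoint below is a placeholder for your cluster’s primary endpoint.

# Sketch: compute the cache hit rate from Redis's INFO counters with redis-py.
# The host is a placeholder for an ElastiCache primary endpoint.
import redis

r = redis.Redis(host="<your-primary-endpoint>.cache.amazonaws.com", port=6379)

stats = r.info("stats")
hits = stats["keyspace_hits"]
misses = stats["keyspace_misses"]

hit_rate = hits / (hits + misses) if (hits + misses) else 0.0
print(f"hit rate: {hit_rate:.2%}")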

Get commands by cluster

The number of Get commands received by your ElastiCache cluster.

Set commands by cluster

The number of Set commands received by your ElastiCache cluster.

Get commands by node

The number of Get commands received by your ElastiCache nodes.

Set commands by node

The number of Set commands received by your ElastiCache nodes.

Monitoring Set commands alongside Get commands at the node level lets you check that all your nodes are healthy and that traffic is well balanced among them.
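One way to check that balance is to query both metrics grouped by node, as in the sketch below; it assumes the GetTypeCmds and SetTypeCmds CloudWatch metrics appear in Datadog as aws.elasticache.get_type_cmds and aws.elasticache.set_type_cmds, tagged by cachenodeid.

# Sketch: query Get/Set command throughput per node over the last hour with datadogpy.
# Assumes the GetTypeCmds / SetTypeCmds CloudWatch metrics appear in Datadog as
# aws.elasticache.get_type_cmds / aws.elasticache.set_type_cmds, tagged by cachenodeid.
import time
from datadog import initialize, api

initialize(api_key="<DATADOG_API_KEY>", app_key="<DATADOG_APP_KEY>")

end = int(time.time())
start = end - 3600

for metric in ("aws.elasticache.get_type_cmds", "aws.elasticache.set_type_cmds"):
    result = api.Metric.query(
        start=start, end=end, query=f"avg:{metric}{{*}} by {{cachenodeid}}"
    )
    for series in result.get("series", []):
        print(metric, series["scope"], series["pointlist"][-1])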

Replication lag by node

This metric tracks how long it takes for a cache replica to apply changes made on the primary cluster. Monitoring it helps ensure that you’re not serving stale data.


Memory metrics

Memory usage by node

Memory usage is critical for your cache performance. If it exceeds the total available system memory, the OS will start swapping old or unused sections of memory.

Memory usage by cluster

This metric reports the same memory usage, aggregated at the cluster level.

Available system memory by node

This tracks the memory remaining on each host. If it drops too low, the host may start using swap.

Available system memory by cluster

This metric provides the memory remaining at the cluster level.

Swap usage by cluster

The SwapUsage host-level metric indicates that your system has run out of memory and that the operating system has started using disk to hold data that should be in memory.

Evictions by cluster

Evictions happen when the cache memory usage limit (maxmemory with Redis) is reached and the cache engine has to remove items to make space for new writes.

Evicting a large number of keys can decrease your hit rate, which leads to higher latency. If the number of evictions keeps growing, you should increase your cache size by migrating to a larger node type.
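As a quick check, Redis’s INFO output exposes both the eviction counter and the memory limit, so you can see how close a node is to maxmemory; the endpoint below is again a placeholder.

# Sketch: check eviction activity and memory pressure from Redis INFO with redis-py.
# The host is a placeholder for an ElastiCache primary endpoint.
import redis

r = redis.Redis(host="<your-primary-endpoint>.cache.amazonaws.com", port=6379)

memory = r.info("memory")
stats = r.info("stats")

print(f"evicted keys since restart: {stats['evicted_keys']}")
if memory.get("maxmemory"):
    used_pct = memory["used_memory"] / memory["maxmemory"]
    print(f"memory used: {used_pct:.1%} of maxmemory")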

ElastiCache events

It is also important to track ElastiCache events, such as node addition failure or cluster creation. When correlated with ElastiCache metrics, these events can help investigate cache cluster activity.
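If you also want to pull recent events programmatically, the ElastiCache API exposes them directly; a minimal boto3 sketch (region is a placeholder):

# Sketch: list ElastiCache cache-cluster events from the last 24 hours with boto3.
import boto3

elasticache = boto3.client("elasticache", region_name="us-east-1")

response = elasticache.describe_events(SourceType="cache-cluster", Duration=1440)
for event in response["Events"]:
    print(event["Date"], event["SourceIdentifier"], event["Message"])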

Set up an Amazon ElastiCache dashboard in minutes with Datadog

If you’d like to start visualizing your Amazon ElastiCache data with our out-of-the-box dashboard, get started with Datadog. The ElastiCache dashboard will be populated immediately after you set up the Amazon ElastiCache integration.

For a deep dive on ElastiCache metrics and how to monitor them, check out our three-part How to Monitor ElastiCache series.