Collecting ElastiCache metrics + its Redis/Memcached metrics

Jean-Mathieu Saponaro

This post is part 2 of a 3-part series on monitoring Amazon ElastiCache metrics. Part 1 explores its key performance metrics, and Part 3 describes how Coursera monitors ElastiCache metrics.

Many ElastiCache metrics can be collected from AWS via CloudWatch or directly from the cache engine, whether Redis or Memcached. When a metric is available from both sources, as discussed in Part 1, you should favor the native cache metric, which offers higher resolution and lets you detect and respond to problems sooner. This article therefore covers three different ways to access ElastiCache metrics from AWS CloudWatch, as well as the collection of native metrics from both caching engines:

CloudWatch metrics
Caching engine metrics
- Redis
- Memcached

Using the AWS Management Console

Using the online management console is the simplest way to monitor your cache with CloudWatch. It allows you to set up basic automated alerts and to get a visual picture of recent changes in individual metrics. Of course, you won’t be able to access native metrics from your cache engine, but a CloudWatch equivalent is sometimes available (see Part 1).

Graphs

Once you are signed in to your AWS account, you can open the CloudWatch console and then browse the metrics related to the different AWS services.

By clicking on the ElastiCache Metrics category, you will see the list of available metrics.

You can also view these metrics per cache cluster.

Just select the checkbox next to the metrics you want to visualize, and they will appear in the graph at the bottom of the console.

Alerts

With the CloudWatch Management Console you can also create simple alerts that trigger when a metric crosses a specified threshold.

Click on the “Create Alarm” button at the right of your graph, and you will be able to set up the alert and configure it to notify a list of email addresses.
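To make the idea of a threshold alert concrete, here is a minimal Python sketch of the evaluation logic. It is a simplified model for illustration, not CloudWatch's actual implementation: the function name `alarm_state` and the `evaluation_periods` parameter are assumptions, and real alarms have more options (for example, how missing data is treated).

```python
def alarm_state(values, threshold, evaluation_periods=1):
    """Return a simplified alarm state for a series of metric values.

    values: metric datapoints, oldest first (hypothetical input).
    The state is "ALARM" only when every one of the most recent
    `evaluation_periods` values is above the threshold.
    """
    if evaluation_periods < 1:
        raise ValueError("evaluation_periods must be at least 1")
    if len(values) < evaluation_periods:
        # Not enough datapoints yet to decide either way.
        return "INSUFFICIENT_DATA"
    recent = values[-evaluation_periods:]
    return "ALARM" if all(v > threshold for v in recent) else "OK"
```

Requiring several consecutive breaches, rather than one, is a common way to avoid being notified about a single short spike.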

Using the AWS Command Line Interface

You can also retrieve metrics related to your cache from the command line. First you will need to install the AWS Command Line Interface (CLI) by following these instructions. You will then be able to query for any CloudWatch metric, using different filters.

Command line queries can be useful for spot checks and ad hoc investigations when you can’t, or don’t want to, use a browser.

For example, if you want to know the CPU utilization statistics for a cache cluster, you can use the CloudWatch command get-metric-statistics with the parameters you need:

(on Linux)

aws cloudwatch get-metric-statistics --namespace AWS/ElastiCache --metric-name CPUUtilization --dimensions="Name=CacheClusterId,Value=yourcachecluster" --statistics=Average --start-time 2019-10-02T00:00:00 --end-time 2019-10-02T20:00:00 --period=60
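If you run this kind of query regularly, hard-coding the time window becomes tedious. The following Python sketch builds the same command for a rolling window and summarizes the JSON it returns. The helper names `build_cpu_query` and `summarize` are assumptions made for this example. It also assumes the CLI is asked for JSON output and that timestamps in one response share the same format, so they sort correctly as strings.

```python
import json
from datetime import datetime, timedelta, timezone


def build_cpu_query(cluster_id, hours=1, period=60, now=None):
    """Return the argument list for `aws cloudwatch get-metric-statistics`."""
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # timestamps are expressed in UTC
    return [
        "aws", "cloudwatch", "get-metric-statistics",
        "--namespace", "AWS/ElastiCache",
        "--metric-name", "CPUUtilization",
        "--dimensions", f"Name=CacheClusterId,Value={cluster_id}",
        "--statistics", "Average",
        "--start-time", start.strftime(fmt),
        "--end-time", end.strftime(fmt),
        "--period", str(period),
        "--output", "json",
    ]


def summarize(response_text):
    """Return (latest_average, peak_average), or None if there is no data."""
    points = json.loads(response_text)["Datapoints"]
    if not points:
        return None
    # CloudWatch does not guarantee chronological order, so sort first.
    points.sort(key=lambda p: p["Timestamp"])
    averages = [p["Average"] for p in points]
    return averages[-1], max(averages)
```

With the AWS CLI installed and configured, the argument list can be passed to `subprocess.run(..., capture_output=True, text=True)` and the resulting `stdout` handed to `summarize`.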

Here are all the commands you can run with the AWS CLI.

Monitoring tool integrated with CloudWatch

The third way to collect CloudWatch metrics is via a dedicated monitoring tool that offers extended monitoring functionality, such as:

Correlation of CloudWatch metrics with metrics from the caching engine and from other parts of your infrastructure
Dynamic slicing, aggregation, and filters on metrics
Historical data access
Sophisticated alerting mechanisms

CloudWatch can be integrated with outside monitoring systems via API, and in many cases the integration only needs to be enabled once to deliver metrics from all your AWS services.

Collecting native Redis or Memcached metrics

CloudWatch’s ElastiCache metrics can give you good insight into your cache’s health and performance. However, as explained in Part 1, supplementing CloudWatch metrics with native cache metrics provides a fuller picture with higher-resolution data.

Redis

Redis provides extensive monitoring out of the box. The info command in the Redis command line interface gives you a snapshot of current cache performance. If you want to dig deeper, Redis also provides a number of tools offering a more detailed look at specific metrics. You will find all the information you need in our recent post about collecting Redis metrics.
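As a rough illustration of working with that snapshot, the sketch below parses the text returned by the info command (for example, captured from `redis-cli info`) into a dictionary and derives the cache hit ratio from the `keyspace_hits` and `keyspace_misses` fields. The function names are assumptions chosen for this example.

```python
def parse_info(text):
    """Parse Redis INFO output ("key:value" lines) into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Stats".
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        stats[key] = value
    return stats


def hit_ratio(stats):
    """Return hits / (hits + misses), or None if there were no lookups."""
    hits = int(stats["keyspace_hits"])
    misses = int(stats["keyspace_misses"])
    total = hits + misses
    return hits / total if total else None
```

Both counters are cumulative since the server started, so the ratio describes the whole lifetime of the process rather than the last few minutes.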

For spot-checking the health of your server or looking into causes of significant latency, Redis’s built-in tools offer good insights.

However, with so many metrics exposed, getting the information you want all in one place can be a challenge. Moreover, accessing data history and correlating Redis metrics with metrics from other parts of your infrastructure can be essential. That’s why using a monitoring tool that integrates with Redis, such as Datadog, will help to take the pain out of your monitoring work.

Memcached

Memcached is more limited than Redis when it comes to monitoring. The most useful tool is the stats command, which returns a snapshot of Memcached metrics. Here is an example of its output:

stats

  STAT pid 14868
  STAT uptime 175931
  STAT time 1220540125
  STAT version 1.2.2
  STAT pointer_size 32
  STAT rusage_user 620.299700
  STAT rusage_system 1545.703017
  STAT curr_items 228
  STAT total_items 779
  STAT bytes 15525
  STAT curr_connections 92
  STAT total_connections 1740
  STAT connection_structures 165
  STAT cmd_get 7411
  STAT cmd_set 28445156
  STAT get_hits 5183
  STAT get_misses 2228
  STAT evictions 0
  STAT bytes_read 2112768087
  STAT bytes_written 1000038245
  STAT limit_maxbytes 52428800
  STAT threads 1
  END
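The output above is easy to turn into structured data. Here is a minimal Python sketch that sends the stats command over a plain TCP connection and parses the reply. The function names are assumptions for this example. It also assumes the node is reachable from where the script runs, and that it listens on port 11211, the usual Memcached default.

```python
import socket


def fetch_stats(host, port=11211, timeout=2.0):
    """Send "stats" to a Memcached node and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(b"stats\r\n")
        data = b""
        # The reply is terminated by a line containing only "END".
        while not data.endswith(b"END\r\n"):
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
    return data.decode("ascii")


def parse_stats(text):
    """Turn "STAT name value" lines into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "STAT":
            stats[parts[1]] = " ".join(parts[2:])
    return stats


def hit_rate(stats):
    """Return get_hits / cmd_get, or None if no get was issued."""
    gets = int(stats["cmd_get"])
    return int(stats["get_hits"]) / gets if gets else None
```

Applied to the sample output, `hit_rate` gives 5183 / 7411, or roughly 0.70.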

If you need more details about the commands you can run with Memcached, you can check its documentation on GitHub.

Obviously, you can’t rely only on this snapshot to properly monitor Memcached performance; it tells you nothing about historical values or acceptable bounds, and it is not easy to quickly digest and understand the raw data. From a devops perspective, Memcached is largely a black box, and it becomes even more complex if you run multiple or distributed instances. Other basic tools like memcache-top (for a changing, real-time snapshot) are useful but remain very limited.
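One way to get past the single-snapshot limitation is to poll stats periodically and compare successive readings. The sketch below computes per-second rates from two snapshots, each represented as a dictionary mapping stat names to their string values. The function name and the default list of counters are assumptions for illustration, not a feature of Memcached itself.

```python
def per_second_rates(previous, current,
                     counters=("cmd_get", "cmd_set", "evictions")):
    """Return per-second rates for cumulative counters between two snapshots.

    previous, current: dicts of stat name -> string value, taken from two
    runs of the stats command on the same node.
    """
    elapsed = int(current["uptime"]) - int(previous["uptime"])
    if elapsed <= 0:
        # Same reading twice, or the node restarted and uptime went backwards.
        return {}
    return {
        name: (int(current[name]) - int(previous[name])) / elapsed
        for name in counters
    }
```

Storing these rates over time is the simplest form of the historical view that the raw snapshot lacks.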

Thus if you are using Memcached as your ElastiCache engine, as Coursera does (see Part 3), you should use CloudWatch or a dedicated monitoring tool that integrates with Memcached, such as Datadog.

Conclusion

In this post we have walked through how to use CloudWatch to collect, visualize, and alert on ElastiCache metrics, as well as how to access higher-resolution, native cache metrics from Redis or Memcached.

In the next and final part of this series we take you behind the scenes with Coursera’s engineering team to learn their best practices and tips for using ElastiCache and monitoring its performance with Datadog.
