
Amazon Aurora Dashboard


What is Amazon Aurora?

Amazon Aurora is a MySQL-compatible database offered on the Relational Database Service (RDS). Aurora offers unique features such as auto-scaling storage, extremely low-latency replication, and rapid automated failover to a standby instance.

In comparison to the original MySQL database engine on RDS, Aurora introduces some performance enhancements and offers an expanded suite of monitoring metrics.

Amazon Aurora dashboard overview

Similar to RDS MySQL users, Aurora users can access high-level RDS metrics through Amazon CloudWatch as well as hundreds of metrics directly from the MySQL-compatible database engine. While standard CloudWatch Aurora metrics are available at one-minute intervals, database engine metrics can be collected at an even higher resolution.

Ideally, both RDS and Aurora metrics should be collected for a comprehensive view. However, there are so many metrics available that it can be difficult to decide which ones to focus on.

The image below shows an example of a customizable Amazon Aurora dashboard in Datadog with the critical metrics you should focus on. Whether or not you are a Datadog user, these metrics provide a template for building a comprehensive Aurora dashboard.

The following is a widget-by-widget breakdown of the sample Amazon Aurora dashboard separated into four categories: query volume, disk I/O, connection & replication, and AWS resource metrics.

Amazon Aurora template dashboard

Query volume

Queries per second

The first priority for monitoring the Aurora database engine should be making sure that queries are being executed. This graph has a corresponding counter to help track the rise and fall of query rate.

It is important to alert on this metric because sudden changes in query volume, especially drastic drops in throughput, can indicate a serious problem.
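
As a rough illustration of the alerting idea above, the following sketch derives a per-second query rate from two samples of a cumulative counter and flags a sharp drop against a baseline. The function names, the sample values, and the 50 percent drop threshold are assumptions chosen for illustration; this is not how Datadog or Aurora evaluate alerts.

```python
def per_second_rate(prev_count, curr_count, interval_seconds):
    """Convert two samples of a cumulative counter into a per-second rate."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = curr_count - prev_count
    # A negative delta means the counter was reset (for example, after a
    # restart), so treat the current value as the whole delta.
    if delta < 0:
        delta = curr_count
    return delta / interval_seconds


def is_sudden_drop(baseline_qps, current_qps, max_drop_fraction=0.5):
    """Return True when throughput fell by more than max_drop_fraction."""
    if baseline_qps <= 0:
        return False  # no meaningful baseline to compare against
    return (baseline_qps - current_qps) / baseline_qps > max_drop_fraction


# Hypothetical counter samples taken 60 seconds apart.
baseline = per_second_rate(10_000, 22_000, 60)
current = per_second_rate(22_000, 25_000, 60)
print(is_sudden_drop(baseline, current))
```

Comparing against a relative baseline rather than a fixed number is one possible design choice; it keeps the check meaningful for both busy and quiet databases.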

SELECT queries per second

This CloudWatch metric tracks the volume of SELECT statements, which corresponds to the reads served by the database engine.

DML (INSERT/UPDATE/DELETE) queries per second

The DMLThroughput CloudWatch metric tracks the current rate of DML requests (inserts, updates, and deletes) rolled into a single figure. This metric corresponds to the writes served by the database engine.

Slow queries

Aurora supports this MySQL metric, which increments every time a query’s execution time exceeds the number of seconds specified by the long_query_time parameter (configurable in the AWS console).

SELECT latency

This is the latency per read query, a metric unique to Aurora. Read latency, along with query volume, should be among the top metrics monitored for almost any use case.

DML (INSERT/UPDATE/DELETE) latency

This is the write query latency. It is important to alert on both read and write latency because any slow reads or writes will necessarily add latency to any application that relies on Aurora.

Disk I/O metrics

Total IOPS

CloudWatch exposes RDS metrics for read and write IOPS, which indicate how much your database is interacting with backing storage. This graph tracks the total number of I/O operations per second handled by the disk.
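
One possible way to obtain a combined figure is to sum the ReadIOPS and WriteIOPS metrics with a CloudWatch metric math expression. The sketch below only builds the query structure used by the GetMetricData API; the helper name, the instance identifier, and the choice of the Average statistic over 60-second periods are assumptions for illustration, not necessarily how the dashboard widget is computed.

```python
def total_iops_queries(db_instance_id, period_seconds=60):
    """Build GetMetricData queries that sum ReadIOPS and WriteIOPS."""

    def stat_query(query_id, metric_name):
        return {
            "Id": query_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/RDS",
                    "MetricName": metric_name,
                    "Dimensions": [
                        {"Name": "DBInstanceIdentifier", "Value": db_instance_id}
                    ],
                },
                "Period": period_seconds,
                "Stat": "Average",
            },
            "ReturnData": False,  # only the summed series is returned
        }

    return [
        stat_query("reads", "ReadIOPS"),
        stat_query("writes", "WriteIOPS"),
        # Metric math expression referencing the two query Ids above.
        {"Id": "total", "Expression": "reads + writes", "Label": "TotalIOPS"},
    ]


# "my-aurora-instance" is a hypothetical instance identifier.
queries = total_iops_queries("my-aurora-instance")
print([q["Id"] for q in queries])
```

With boto3 installed and credentials configured, a list like this could be passed as the MetricDataQueries argument of the CloudWatch client's get_metric_data call, together with StartTime and EndTime.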

Read IOPS

The ReadIOPS metric in CloudWatch provides the number of read I/O operations per second.

Write IOPS

The WriteIOPS metric in CloudWatch provides the number of write I/O operations per second.

Disk queue depth

When storage volumes cannot keep pace with the volume of read and write requests, I/O operations begin queuing up. The DiskQueueDepth metric measures the length of this queue at any given moment.

Read latency per I/O

This RDS metric measures how long read I/O operations take at the disk level.

Write latency per I/O

This RDS metric measures how long write I/O operations take at the disk level.

Connection & replication

Threads connected

The Threads_connected MySQL metric monitors the total number of open database connections. It is important to alert on this because if a client attempts to connect to Aurora when all available connections are in use, Aurora will refuse it and return a “Too many connections” error.
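
To make the alerting point concrete, here is a minimal sketch that compares Threads_connected with the MySQL max_connections setting and warns before the limit is reached. The function names, the 80 percent threshold, and the sample values are assumptions for illustration only.

```python
def connection_utilization(threads_connected, max_connections):
    """Return the fraction of available connections currently in use."""
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    return threads_connected / max_connections


def should_alert(threads_connected, max_connections, threshold=0.8):
    """Return True once utilization reaches the (assumed) warning threshold."""
    return connection_utilization(threads_connected, max_connections) >= threshold


# Hypothetical readings: 450 open connections against a limit of 500.
print(should_alert(450, 500))
```

Alerting on a fraction of the limit, rather than on the limit itself, leaves time to react before clients start receiving connection errors.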

Threads running

This MySQL metric provides additional visibility for the Threads_connected metric by isolating the threads that are actively processing queries.

Replication lag

The AuroraReplicaLag metric is an Aurora-specific option for monitoring lag time for any read replica.

This is different from the generic RDS ReplicaLag metric because the AuroraReplicaLag metric tracks the lag in page cache updates from primary to replica rather than the lag in applying all write operations from the primary instance to the replica.

AWS resource metrics

CPU

This graph tracks CPU utilization as a percentage. High CPU utilization is not necessarily a bad sign, but the CPUs of your chosen instance type may be the bottleneck if IOPS and network metrics are in normal ranges and there appears to be sufficient memory.

Network in

The NetworkReceiveThroughput metric tracks network traffic sent from clients. Unlike other RDS database engines, Aurora’s network metric does not include network traffic from the database instances to the storage volumes.

Network out

The NetworkTransmitThroughput metric tracks network traffic being sent to clients. Unlike other RDS database engines, Aurora’s network metric does not include network traffic from the database instances to the storage volumes.

Monitor Amazon Aurora with Datadog

If you’d like to see this dashboard for your Amazon Aurora metrics, you can try Datadog for free for 14 days. This customizable dashboard will be populated immediately after you set up the Aurora integration.

For a deep dive on Amazon Aurora metrics and how to monitor them, check out our three-part How to Monitor Amazon Aurora series.

 
