Amazon Aurora Dashboard
What is Amazon Aurora?
Amazon Aurora is a MySQL-compatible database offered on the Relational Database Service (RDS). Aurora offers unique features such as auto-scaling storage, extremely low-latency replication, and rapid automated failover to a standby instance.
In comparison to the original MySQL database engine on RDS, Aurora introduces some performance enhancements and offers an expanded suite of monitoring metrics.
Amazon Aurora dashboard overview
Similar to RDS MySQL users, Aurora users can access high-level RDS metrics through Amazon CloudWatch as well as hundreds of metrics directly from the MySQL-compatible database engine. While standard CloudWatch Aurora metrics are available at one-minute intervals, database engine metrics can be collected at an even higher resolution.
Ideally, both RDS and Aurora metrics should be collected for a comprehensive view. However, there are so many metrics available that it can be difficult to decide which ones to focus on.
Refer to the image below for an example of a customizable Amazon Aurora dashboard in Datadog with the critical metrics you should focus on. Whether or not you are a Datadog user, these metrics provide a template for building a comprehensive Aurora dashboard.
The following is a widget-by-widget breakdown of the sample Amazon Aurora dashboard separated into four categories: query volume, disk I/O, connection & replication, and AWS resource metrics.
Queries per second
The first priority for monitoring the Aurora database engine should be making sure that queries are being executed. This graph has a corresponding counter to help track the rise and fall of query rate.
It is important to alert on this metric because sudden changes in query volume, especially drastic drops in throughput, can indicate a serious problem.
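As a rough illustration of alerting on a sudden drop, the sketch below turns samples of a cumulative query counter into per-second rates and compares the newest rate against the average of the earlier ones. It assumes you poll a counter such as MySQL's Questions status variable yourself. The function names and the 50 percent drop threshold are hypothetical choices for the example, not part of Aurora, CloudWatch, or Datadog.

```python
from typing import List, Tuple


def per_second_rates(samples: List[Tuple[float, int]]) -> List[float]:
    """Convert (timestamp_seconds, cumulative_count) samples into per-second rates."""
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0 or c1 < c0:
            # Skip out-of-order samples and counter resets (e.g. after a restart).
            continue
        rates.append((c1 - c0) / elapsed)
    return rates


def is_sudden_drop(rates: List[float], drop_fraction: float = 0.5) -> bool:
    """Return True when the newest rate is below drop_fraction of the earlier average.

    drop_fraction is an assumed threshold; tune it to your own traffic pattern.
    """
    if len(rates) < 2:
        return False
    earlier = rates[:-1]
    baseline = sum(earlier) / len(earlier)
    return baseline > 0 and rates[-1] < baseline * drop_fraction


# Counter sampled once a minute; the last interval shows a sharp fall in throughput.
samples = [(0, 1000), (60, 7000), (120, 13000), (180, 13600)]
rates = per_second_rates(samples)
print(rates)                  # [100.0, 100.0, 10.0]
print(is_sudden_drop(rates))  # True
```

Comparing against a trailing average rather than a fixed number keeps the check usable on workloads whose normal query volume changes over the day, although a production alert would usually use a longer baseline window.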
SELECT queries per second
The SelectThroughput metric in CloudWatch monitors the volume of SELECT statements. This metric corresponds to the reads served by the database engine.
DML (INSERT/UPDATE/DELETE) queries per second
The DMLThroughput metric in CloudWatch monitors the combined rate of DML requests (inserts, updates, and deletes). This metric corresponds to the writes served by the database engine.
Aurora supports the MySQL Slow_queries metric, which increments every time a query's execution time exceeds the number of seconds specified by the long_query_time parameter (configurable in the AWS console).
The SelectLatency metric reports the latency per read query and is unique to Aurora. Read latency, along with query volume, should be among the top metrics monitored for almost any use case.
DML (INSERT/UPDATE/DELETE) latency
The DMLLatency metric reports the write query latency. It is important to alert on both read and write latency because slow reads or writes add latency to any application that relies on Aurora.
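CloudWatch exposes Aurora's read and write query latency as the SelectLatency and DMLLatency metrics in the AWS/RDS namespace. The sketch below only builds the keyword arguments for a CloudWatch GetMetricStatistics request at one-minute resolution and does not call AWS. The helper name `latency_request` and the instance identifier `my-aurora-instance` are placeholders invented for the example.

```python
from datetime import datetime, timedelta, timezone


def latency_request(metric_name: str, instance_id: str, minutes: int = 60) -> dict:
    """Build keyword arguments for a CloudWatch GetMetricStatistics call.

    instance_id is a placeholder; substitute your own DB instance identifier.
    """
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/RDS",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,  # one-minute granularity, in seconds
        "Statistics": ["Average", "Maximum"],
    }


requests = [
    latency_request(name, "my-aurora-instance")
    for name in ("SelectLatency", "DMLLatency")
]

# With boto3 installed and credentials configured, each dict could be passed as:
#   boto3.client("cloudwatch").get_metric_statistics(**request)
for request in requests:
    print(request["MetricName"], request["Period"], request["Statistics"])
```

Requesting both the average and the maximum is one possible choice: the average shows the general trend, while the maximum surfaces short spikes that an average over the period can hide.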
Disk I/O metrics
CloudWatch makes available RDS metrics on read and write IOPS, which indicate how much your database is interacting with backing storage. This graph tracks the total number of I/O operations handled by the disk.
The ReadIOPS metric in CloudWatch provides the number of read I/O operations per second.
The WriteIOPS metric in CloudWatch provides the number of write I/O operations per second.
Disk queue depth
When storage volumes cannot keep pace with the volume of read and write requests, I/O operations begin queuing up. The DiskQueueDepth metric measures the length of this queue at any given moment.
Read latency per I/O
This RDS metric measures how long read I/O operations take at the disk level.
Write latency per I/O
This RDS metric measures how long write I/O operations take at the disk level.
Connection & replication
The Threads_connected MySQL metric monitors the total number of open database connections. It is important to alert on this metric because if a client attempts to connect to Aurora when all available connections are in use, Aurora will refuse the connection and return a “Too many connections” error.
The Threads_running MySQL metric provides additional visibility for the Threads_connected metric by isolating the threads that are actively processing queries.
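To make the connection-limit alert concrete, the sketch below compares the number of open connections (Threads_connected) with the configured limit (the MySQL max_connections variable) and uses the count of actively running threads (Threads_running) to show how many connections are idle. The function name and the 80 percent warning ratio are assumptions for illustration. The input values would come from your own polling of the database.

```python
def connection_status(threads_connected: int, threads_running: int,
                      max_connections: int, warn_ratio: float = 0.8) -> dict:
    """Summarize connection usage and flag when usage nears the limit.

    warn_ratio is an assumed threshold, not an Aurora default.
    """
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    utilization = threads_connected / max_connections
    return {
        "utilization": utilization,
        # Connections that are open but not currently executing a query.
        "idle_connections": threads_connected - threads_running,
        "alert": utilization >= warn_ratio,
    }


status = connection_status(threads_connected=850, threads_running=40,
                           max_connections=1000)
print(status["alert"], status["idle_connections"])  # True 810
```

A high utilization with few running threads, as in this example, suggests that clients are holding idle connections, which points toward connection pooling rather than a larger instance.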
The AuroraReplicaLag metric is an Aurora-specific option for monitoring lag time for any read replica. This is different from the generic RDS ReplicaLag metric because the AuroraReplicaLag metric tracks the lag in page cache updates from primary to replica rather than the lag in applying all write operations from the primary instance to the replica.
AWS resource metrics
This graph tracks CPU utilization as a percentage. High CPU utilization is not necessarily a bad sign, but the CPUs of your chosen instance type may be the bottleneck if IOPS and network metrics are in normal ranges and there appears to be sufficient memory.
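The reasoning above can be written as a small check: CPU is the likely bottleneck only when utilization is high and the other resources have already been ruled out. This is a minimal sketch. The function name, the boolean inputs, and the 80 percent threshold are hypothetical, and deciding what counts as a "normal range" for IOPS and network traffic is left to your own baselines.

```python
def cpu_is_likely_bottleneck(cpu_percent: float, iops_normal: bool,
                             network_normal: bool, memory_sufficient: bool,
                             cpu_threshold: float = 80.0) -> bool:
    """Apply the elimination heuristic: high CPU with every other resource healthy.

    cpu_threshold is an assumed cutoff for "high" utilization.
    """
    other_resources_healthy = iops_normal and network_normal and memory_sufficient
    return cpu_percent >= cpu_threshold and other_resources_healthy


# High CPU while I/O, network, and memory look fine: CPU is the prime suspect.
print(cpu_is_likely_bottleneck(92.0, True, True, True))   # True
# High CPU but abnormal IOPS: investigate storage before blaming the CPU.
print(cpu_is_likely_bottleneck(92.0, False, True, True))  # False
```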
The NetworkReceiveThroughput metric tracks network traffic received from clients. Unlike other RDS database engines, Aurora’s network metric does not include network traffic from the database instances to the storage volumes.
The NetworkTransmitThroughput metric tracks network traffic sent to clients. Unlike other RDS database engines, Aurora’s network metric does not include network traffic from the database instances to the storage volumes.
Monitor Amazon Aurora with Datadog
If you’d like to see this dashboard for your Amazon Aurora metrics, you can try Datadog for free for 14 days. This customizable dashboard will be populated immediately after you set up the Aurora integration.
For a deep dive on Amazon Aurora metrics and how to monitor them, check out our three-part How to Monitor Amazon Aurora series.