What is DynamoDB?
Amazon DynamoDB is a fully managed NoSQL database cloud service, part of the AWS portfolio. Fast and easily scalable, it is meant to serve applications which require very low latency, even when dealing with large amounts of data. It supports both document and key-value store models, and has properties of both a database and a distributed hash table.
DynamoDB overview dashboard
When building a dashboard to monitor DynamoDB, you should consider including key metrics pertaining to latency, errors, consumed read and write capacity, and throttled requests, which occur when you exceed your provisioned capacity. Below is a snapshot of the customizable DynamoDB dashboard in Datadog, which automatically populates with metrics from Amazon CloudWatch. Even if you are not a Datadog user, the dashboard’s contents should provide a template for assembling a comprehensive view of DynamoDB activity and performance.
Here’s a widget-by-widget breakdown of the graphs and query values in this dashboard.
This query-value widget displays the average response time (in milliseconds) of successful requests over the past hour. Abnormally high latency is cause for concern, but note that this metric covers only the execution time of successful requests. As a result, a request that succeeds only after a retry will appear to fall within the normal latency range, even though it took longer to complete from the client’s point of view.
When a client request triggers at least one read or write event that exceeds your provisioned throughput, the request is throttled by DynamoDB. (See this blog post for more details on throttling of events and requests.) Because throttled requests can execute slowly (after a retry) or fail altogether, throttling is usually a worrisome occurrence. If you set an alert that fires before your capacity is exhausted, you may be able to provision more read or write capacity and avoid throttled requests. This query-value widget displays the maximum number of throttled requests reported by DynamoDB over the past hour.
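One way to alert before capacity is exhausted is to watch the percentage of provisioned capacity consumed and fire when it stays high for several datapoints in a row. The sketch below is illustrative only: the function name `should_alert`, the 80 percent threshold, and the three-datapoint window are assumptions chosen for the example, not values recommended by AWS or Datadog.

```python
from typing import Sequence


def should_alert(
    percent_consumed: Sequence[float],
    threshold: float = 80.0,   # assumed threshold, tune for your workload
    min_points: int = 3,       # assumed number of consecutive datapoints
) -> bool:
    """Return True if the most recent `min_points` datapoints are all
    at or above `threshold` percent of provisioned capacity."""
    if min_points <= 0:
        raise ValueError("min_points must be positive")
    if len(percent_consumed) < min_points:
        # Not enough data yet to decide.
        return False
    recent = percent_consumed[-min_points:]
    return all(point >= threshold for point in recent)


if __name__ == "__main__":
    print(should_alert([50.0, 85.0, 90.0, 95.0]))
    print(should_alert([85.0, 90.0, 70.0]))
```

Requiring several consecutive datapoints keeps a single short spike from paging anyone, at the cost of reacting a little later.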
Throttled reads/writes per table (Past hour)
These toplists (one for reads and one for writes) display the tables that have seen the most throttled events recently, ranked by the maximum value over the past hour. Throttling occurs when the requested number of read or write events for a table exceeds the provisioned capacity of that table.
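The ranking behind a toplist like this can be sketched in a few lines: take the peak value of each table's series and sort in descending order. The function name `toplist_by_max` and the sample table names are hypothetical; the input is assumed to be a mapping from table name to the datapoints collected over the window.

```python
from typing import Dict, List, Sequence, Tuple


def toplist_by_max(
    series_by_table: Dict[str, Sequence[float]],
    limit: int = 10,
) -> List[Tuple[str, float]]:
    """Rank tables by the maximum datapoint in their series, highest first."""
    # Tables with no datapoints in the window are skipped.
    peaks = [
        (table, max(values))
        for table, values in series_by_table.items()
        if values
    ]
    # Sort by peak descending; break ties by table name for stable output.
    peaks.sort(key=lambda item: (-item[1], item[0]))
    return peaks[:limit]


if __name__ == "__main__":
    throttled_reads = {
        "orders": [0, 12, 3],
        "users": [5, 5],
        "sessions": [],
        "carts": [40, 1],
    }
    print(toplist_by_max(throttled_reads, limit=2))
```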
Percent of provisioned read/write consumed (Past hour)
These timeseries graphs (one for reads and one for writes) show how much of the provisioned capacity for each table has been consumed over the past hour. When a table’s throughput is not fully used, DynamoDB reserves a portion of the unused capacity (up to five minutes’ worth) for later “bursts” of read or write throughput. As a result, consumed read or write capacity can briefly spike above 100 percent.
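As a rough illustration of how such a percentage can be derived: CloudWatch reports consumed capacity as a sum over each period, so dividing by the period length gives average units per second, which can then be compared with the provisioned per-second rate. The function name `percent_consumed` is hypothetical, and the sketch assumes you already have the period sum and the provisioned value in hand.

```python
def percent_consumed(
    consumed_sum: float,
    period_seconds: int,
    provisioned_per_second: float,
) -> float:
    """Percentage of provisioned throughput consumed during one period.

    consumed_sum: capacity units consumed over the period (a Sum statistic).
    period_seconds: length of the period in seconds.
    provisioned_per_second: provisioned capacity units per second.
    """
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    if provisioned_per_second <= 0:
        raise ValueError("provisioned_per_second must be positive")
    average_per_second = consumed_sum / period_seconds
    return 100.0 * average_per_second / provisioned_per_second


if __name__ == "__main__":
    # 2,400 units consumed in 60 s against 50 provisioned units per second.
    print(percent_consumed(2400, 60, 50))
    # A burst: 3,600 units in 60 s against the same provisioned rate.
    print(percent_consumed(3600, 60, 50))
```

A result above 100 does not necessarily indicate throttling; it can mean the table drew on its reserved burst capacity.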
Most throttled tables (Past week)
This toplist displays the tables with the highest peak rates of throttled requests over the past week. This longer-term view is useful for identifying capacity bottlenecks on individual tables.
Throttling (Past week)
This bar graph displays the total number of throttled requests over the past week across all tables. Each individual bar is broken down into color-coded slices, with each slice representing an individual table. This graph is useful for identifying when throttling events have occurred, and whether the throttling was largely confined to one particular table or spread across multiple tables.
Throttling (Past day)
This bar graph displays the total number of throttled requests over the past day across all tables. Each individual bar is broken down into color-coded slices, with each slice representing an individual table. Like the weeklong view, this graph is useful for identifying when throttling events have occurred, and whether the throttling was largely confined to one particular table or spread across multiple tables.
System Errors (Past day)
This bar graph breaks down the number of system errors over the past day, summed by table. System errors represent requests that resulted in an HTTP 500 (server error) code. Under normal circumstances this metric should be equal to zero.
User Errors (Past day)
This bar graph breaks down the number of user errors over the past day, summed by table. User errors represent requests that resulted in an HTTP 400 (client error) code, such as a request with an authentication failure. AWS provides a list of DynamoDB client errors; the count of UserErrors is incremented for all of them except ProvisionedThroughputExceededException, ThrottlingException, and ConditionalCheckFailedException. If your client application is interacting correctly with DynamoDB, this metric should always be equal to zero.
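The counting rule described above can be expressed as a small predicate. This is only a sketch of the rule as stated here, not DynamoDB's internal implementation; the function name `counts_as_user_error` is hypothetical.

```python
# Client errors that, per the rule above, do NOT increment UserErrors.
EXCLUDED_FROM_USER_ERRORS = frozenset({
    "ProvisionedThroughputExceededException",
    "ThrottlingException",
    "ConditionalCheckFailedException",
})


def counts_as_user_error(status_code: int, error_code: str) -> bool:
    """Return True if a response would be counted in UserErrors
    under the rule described in the text."""
    return status_code == 400 and error_code not in EXCLUDED_FROM_USER_ERRORS


if __name__ == "__main__":
    print(counts_as_user_error(400, "ValidationException"))
    print(counts_as_user_error(400, "ThrottlingException"))
    print(counts_as_user_error(500, "InternalServerError"))
```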
Failed conditional write attempts (Past day)
This bar graph breaks down the number of failed conditional writes over the past day, summed by table. With DynamoDB, you can attach a logical condition to a write request that determines whether the item can be modified: for example, the item can be updated only if it’s not marked as “protected”. If the condition evaluates to false, the write is rejected, this metric is incremented, and a 400 error (Bad Request) is returned. This error type is counted separately from UserErrors (above).
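To make the “protected” example concrete, here is a minimal sketch that builds the parameters for a conditional UpdateItem request in the low-level (typed attribute value) format. The table layout is an assumption for illustration: a string key named `id`, a `status` attribute being updated, and a boolean `protected` attribute. The helper name `build_conditional_update` is hypothetical.

```python
from typing import Any, Dict


def build_conditional_update(
    table_name: str, item_id: str, new_status: str
) -> Dict[str, Any]:
    """Build UpdateItem parameters that succeed only if the item
    is not marked as protected (assumed schema, see lead-in)."""
    return {
        "TableName": table_name,
        "Key": {"id": {"S": item_id}},
        "UpdateExpression": "SET #s = :new_status",
        # Allow the write if `protected` is absent or explicitly false.
        "ConditionExpression": "attribute_not_exists(#p) OR #p = :not_protected",
        # Placeholders avoid clashes with DynamoDB reserved words.
        "ExpressionAttributeNames": {"#s": "status", "#p": "protected"},
        "ExpressionAttributeValues": {
            ":new_status": {"S": new_status},
            ":not_protected": {"BOOL": False},
        },
    }


if __name__ == "__main__":
    params = build_conditional_update("orders", "42", "shipped")
    print(params["ConditionExpression"])
```

With boto3, these parameters could be passed as `client.update_item(**params)`; if the condition is not met, the call raises `ConditionalCheckFailedException`, which is the failure this widget counts.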
Request latency per table - Top 10 (Past 4h)
This timeseries charts the average response time (in milliseconds) of successful requests over the past four hours. Each line in the graph represents a different table; the graph displays metrics only for the 10 tables with the highest latency in that time period. This graph can help you identify at a glance whether requests for any particular table are executing slowly.
Recent DynamoDB Events
In addition to the key metrics graphed on this dashboard, it is also important to keep an eye on the discrete occurrences, or events, that relate to your DynamoDB database. This section of the dashboard lists the most recent events that match the search term “dynamodb,” which include Amazon notifications, such as changes to a table’s provisioned throughput, as well as Datadog alert notifications that mention DynamoDB.
See your metrics in the DynamoDB dashboard
If you’d like to see your DynamoDB metrics and events on this dashboard, you can try Datadog for free for 14 days. The dashboard will populate with metrics automatically after you set up the integration with Amazon CloudWatch.
For a deep dive on DynamoDB metrics and how to monitor them, check out our three-part How to Monitor DynamoDB series.