What is MongoDB?
MongoDB is a document-oriented database which stores data as documents, thus offering more flexible structure than other databases. MongoDB is cross-platform and represents its documents in a binary-encoded JSON format called BSON (Binary JSON). This lightweight binary format adds speed to the flexibility of the JSON format, along with more data types.
MongoDB ensures high availability thanks to its replication mechanisms, horizontal scalability allowed by sharding, and is currently the most widely adopted document store. It is used by companies such as Facebook, eBay, Foursquare, Squarespace, Expedia, and Electronic Arts.
MongoDB dashboard overview
MongoDB can be monitored with five key metric categories: throughput, resource saturation, resource utilization, errors (asserts), and database performance. However, the available metrics and their relative importance depend on the storage engine you are using–WiredTiger, which is now the default, or MMAPv1, which was the default in earlier versions of MongoDB.
Below is an example of the customizable out-of-the-box dashboard available in Datadog, which helps you visualize key metrics around your MongoDB cluster’s usage and performance. Even if you’re not a Datadog user, this example can act as a template when implementing your own MongoDB monitoring.
Read on for a graph-by-graph breakdown of the metrics in this MongoDB dashboard, parsed out by metric categories—throughput metrics, resource saturation metrics, resource utilization metrics, and performance/error metrics. Note that the metrics depicted in this sample dashboard are common to both storage engines, with the exception of Average duration of background flushes per replica set (seconds) and Max memory usage, which are unique to MMAPv1
Read requests per second
This graph shows the number of read requests received by your MongoDB cluster per second.
Write requests per second
This graph shows the number of write requests received by your MongoDB cluster per second. Setting alerts on the number of read and write requests can help you prevent resource saturation, spot bottlenecks, and know when to scale up or out.
Number of clients with reads in progress or queued
activeClients.readers metric shows how many active clients have read operations in progress or queued.
Number of clients with writes in progress or queued
activeClients.writers metric shows how many active clients have write operations in progress or queued.
activeClient metrics shows whether or not the number of active clients explains the requests load.
Most used replica sets for reads
A replica set represents multiple servers running MongoDB, each one containing the exact same data. They provide fault tolerance and high data availability. This list shows which replica sets are receiving the most read operations.
Most used replica sets for writes
This list shows which replica sets are receiving the most write operations.
Resource saturation metrics
Queued read and write requests per second
This graph shows the number of read and write requests that are currently queued. If you see this queue continuously growing during heavy read/write traffic, MongoDB is not addressing requests as fast as they are arriving. This indicates that you should increase your deployment capacity.
Replica sets with largest read queues
This list shows which replica sets have the largest queues of read requests.
Replica sets with largest write queues
This list shows which replica sets have the largest queues of write requests.
Resource utilization metrics
Percentage of client connections used
The percentage of connections used continuously monitors resource utilization under low, normal, and peak traffic so you can appropriately set your alerts. If MongoDB runs on low connections, it may not be able to handle incoming requests in a timely manner.
Number of page faults per host
Page faults indicate operations which required the MongoDB to fetch data from the disk because it wasn’t available in active memory (“hard” page fault), or when the operation required in-memory page relocation (“soft” page fault).
It is important to alert on this metric because frequent page faults may indicate that your data set is too large for the allocated memory. Page faults can also be a sign of inefficient schema design, redundant or unnecessary indexes, or anything using available RAM unnecessarily.
Max memory usage (MMAPv1 only)
This widget tracks the
mem.mapped metric, which is only available in the MMAPv1 storage engine. This metric gives you a good approximation of the total size of your database(s).
Memory-mapped files are crucial for performance with the MMAPv1 storage engine since it allows MongoDB to manipulate data files as if they were in memory.
Data size per replica set
This graph monitors the
dataSize metric per replica set, which measures the space taken by all the documents (plus padding with MMAPv1) in the database.
Performance & error metrics
Oplog window per replica set
The oplog (operations log) window represents the interval of time between the oldest and latest entries in the oplog, which usually corresponds to the approximate amount of time available in the primary’s replication oplog. MongoDB advises that users send a warning notification if the oplog window is 25 percent below its usual value during traffic peaks, and a high-priority alert under 50 percent.
Max replication lag per replica set (seconds)
Replication lag represents how far a secondary is behind the primary for a replica set. Ideally, replication lag is equal to zero, which should be the case if you don’t have load issues.
If replication lag gets too high for all secondary nodes, the integrity of your data set might be compromised in case of failover (secondary member taking over as the new primary because the current primary is unavailable).
Number of open cursors per replica set
This graph monitors the number of cursors currently opened by MongoDB, which represent pointers to read query answers.
A gradual increase in the number of open cursors without a corresponding growth of traffic is often symptomatic of poorly indexed queries. It can also be the result of long running queries due to large result sets. You should take a look to see how you could optimize them.
Number of timed out cursors per replica set
This graph monitors the
cursor.timedOut metric, which is incremented when a client connection has died without having gracefully closed the cursor.
By default, MongoDB traps these cursors after 10 minutes of inactivity. You should check if you have a large amount of memory being consumed from non-active cursors. A high number of timed out cursors can be related to application issues.
Average duration of background flushes per replica set (seconds) (MMAPv1 only)
It is important to monitor this metric for the MMAPv1 storage engine. MongoDB “flushes” writes to data files every 60 seconds by default. This parameter can be tuned depending on the workload. If your background flushes take more than this frequency value to execute, it means that MongoDB is continuously flushing to disk, which can have a serious impact on your database performance. Your background flushes should usually take less than 1 second.
Number of asserts per second
Asserts typically represent errors. MongoDB generates a document reporting on the number of each type of assertions that have been raised: message, warning, regular, and user. This graph monitors the total number of asserts per second.
Monitoring MongoDB with the Datadog dashboard
For a deep dive on MongoDB metrics and how to monitor them, check out our three-part How to Monitor MongoDB Performance Metrics series.