Datadog takes CouchDB monitoring beyond ad hoc command-line checks and manual inspection of the runtime statistics report. You can see graphs of real-time and historical metrics on request traffic, latency, and read/write workload from all your CouchDB servers in one place.
You can break down CouchDB traffic by HTTP status code and request method. Although these metrics are not specific to database operations, they can provide insight into database activity and host health. For example, you can set an alert to fire when there are too many “forbidden” (403) responses, or when the number of internal server errors (500) suddenly increases.
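In practice Datadog evaluates alert conditions for you, so no code is required. As a minimal illustration, the logic behind the two example alerts can be sketched in Python. The function names, the five-interval baseline window, the 3x spike factor, and both thresholds are hypothetical choices for illustration, not Datadog monitor settings.

```python
from statistics import mean


def is_sudden_increase(counts, window=5, factor=3.0, min_count=10):
    """Return True if the newest count far exceeds the average of the ones before it.

    counts: per-interval totals for one status code (e.g. 500s), oldest first.
    window, factor, min_count: hypothetical tuning values, not Datadog settings.
    """
    if len(counts) < window + 1:
        return False  # not enough history to form a baseline
    latest = counts[-1]
    baseline = mean(counts[-window - 1:-1])  # the `window` intervals before the newest
    # The absolute floor keeps a jump from 0 to 2 errors from triggering an alert.
    return latest >= min_count and latest > factor * baseline


def too_many_forbidden(count_403, limit=50):
    """Static threshold on 403 responses per interval; `limit` is hypothetical."""
    return count_403 > limit


print(is_sudden_increase([2, 3, 1, 2, 2, 40]))  # True
print(is_sudden_increase([2, 3, 1, 2, 2, 4]))   # False
```

The spike check compares against a rolling baseline rather than a fixed number because a "sudden increase" is relative to normal traffic, while the 403 check uses a simple static limit.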
If you are seeing a high number of delete requests, consider optimizing the application logic: deleting many documents is usually handled more efficiently with a single bulk update than with one request per document.
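CouchDB deletes a document by writing a new revision marked with "_deleted": true, and its _bulk_docs endpoint accepts many such revisions in a single POST. The minimal sketch below assumes you already know each document's _id and current _rev; the server URL, database name, and helper names are hypothetical, and authentication and URL-encoding of the database name are omitted.

```python
import json
import urllib.request


def build_bulk_delete_payload(docs):
    """Turn (doc_id, rev) pairs into a _bulk_docs body that deletes each document."""
    return {
        "docs": [{"_id": doc_id, "_rev": rev, "_deleted": True} for doc_id, rev in docs]
    }


def bulk_delete(base_url, db, docs):
    """Send one POST for the whole batch instead of one DELETE per document.

    base_url and db are hypothetical (e.g. "http://localhost:5984", "orders");
    authentication and URL-encoding of the database name are omitted.
    """
    body = json.dumps(build_bulk_delete_payload(docs)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/{db}/_bulk_docs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # CouchDB answers with one result object per document in the batch.
        return json.load(response)
```

For example, bulk_delete("http://localhost:5984", "orders", [("order-1", "3-abc"), ("order-2", "1-def")]) would issue one request for both documents; the IDs and revisions here are placeholders.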
CouchDB supports distributed scaling across clusters of multiple hosts. To minimize communication between hosts, incremental replication is commonly used to propagate document changes. During replication, CouchDB automatically detects conflicts but leaves their resolution to the application. Consequently, as in a version control system, every change, even a deletion, is recorded. As a result, the database can swell in size with old document revisions.
One way to get around the problem is to compact the database from time to time. But since this is a heavy disk I/O operation, when is it reasonable to run a compaction?
Monitoring CouchDB with Datadog can help you answer that question by tracking database size in real time. Multiplying the current document count by the estimated average size of a document in that database gives you a sense of how large the database should be. If the on-disk size is much greater than that expected size, the disk is very likely filled with old document revisions, and it may be a good time to compact the database.
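The size heuristic above reduces to a ratio of on-disk size to expected size. The sketch below assumes the numbers come from the database information document returned by GET /{db}, where CouchDB 2.x and later report the file size as sizes.file and 1.x reported it as disk_size. The 2.0 ratio threshold, the average document size, and the function names are hypothetical and should be tuned to your own workload.

```python
def file_size_from_info(info):
    """Read the on-disk size from a parsed GET /{db} response.

    Assumes CouchDB 2.x+ (sizes.file) with a fallback to the 1.x disk_size field.
    """
    sizes = info.get("sizes") or {}
    return sizes.get("file", info.get("disk_size"))


def compaction_suggested(doc_count, avg_doc_bytes, file_bytes, ratio_threshold=2.0):
    """Return True when the file is much larger than the documents should need.

    avg_doc_bytes is your own estimate; ratio_threshold is a hypothetical cutoff.
    """
    expected_bytes = doc_count * avg_doc_bytes
    if expected_bytes <= 0:
        return False  # empty database or no estimate: nothing to compare against
    return file_bytes / expected_bytes > ratio_threshold


info = {"doc_count": 1000, "sizes": {"file": 10_000_000}}  # illustrative values
print(compaction_suggested(info["doc_count"], 2048, file_size_from_info(info)))  # True
```

With 1,000 documents of roughly 2 KiB each, about 2 MB of data is expected, so a 10 MB file is nearly five times larger than needed and crosses the illustrative threshold.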
Another important aspect of CouchDB that Datadog monitors is the authentication cache. CouchDB caches user credentials in memory to speed up authentication. Monitoring authentication cache usage can alert you to possible attempts to gain unauthorized access.
If CouchDB reports a high number of auth_cache_misses, then either the cache is too small to serve the volume of legitimate user requests, or a brute-force username/password attack is under way. Correlating the misses with the httpd_status_codes.401 (unauthorized request) metric helps show which scenario is more likely.
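The correlation described above can be expressed as a small decision function. This is a rough sketch, not a Datadog feature: the inputs are assumed to be counts over the same time window (cache hits, cache misses, 401 responses, and total requests), and both ratio thresholds and the returned labels are hypothetical.

```python
def diagnose_auth_cache(hits, misses, unauthorized, requests,
                        miss_ratio_limit=0.2, unauthorized_ratio_limit=0.05):
    """Give a rough label to one time window of authentication metrics.

    All counts cover the same window; both ratio limits are hypothetical.
    """
    lookups = hits + misses
    if lookups == 0 or misses / lookups <= miss_ratio_limit:
        return "normal"
    # Many misses plus many 401s suggests credentials being guessed.
    if requests > 0 and unauthorized / requests > unauthorized_ratio_limit:
        return "possible brute-force attack"
    # Many misses but few 401s suggests legitimate users outgrowing the cache.
    return "cache may be undersized"


print(diagnose_auth_cache(500, 500, 300, 1000))  # possible brute-force attack
print(diagnose_auth_cache(500, 500, 5, 1000))    # cache may be undersized
```

Using ratios rather than raw counts keeps the labels meaningful as overall traffic grows or shrinks.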
If you’re already a Datadog customer, you can start monitoring CouchDB in minutes. CouchDB is one of more than 500 technologies that Datadog integrates with, so you can monitor your entire infrastructure in one place, from servers and containers to load balancers, caches, and custom applications. If you’re new to Datadog, you can sign up for a full-featured free trial.