Threshold-based alerts are extremely effective at detecting issues in your infrastructure and applications. But many user-driven metrics display gradual baseline shifts or recurring patterns of fluctuation (e.g., higher on weekdays than on weekends). For metrics like these, it is difficult to set static thresholds or rate-of-change alerts that catch unexpected behavior without also triggering frequent false alarms.
That’s where algorithmic monitoring comes in. Datadog’s outlier detection and anomaly detection use machine learning to automatically identify abnormal values: outlier detection compares each member of a group against the behavior of the group, while anomaly detection compares a metric against its own past performance. Let’s explore a few use cases that illustrate the benefits of algorithmic monitoring.
One of the most useful applications for anomaly detection is uncovering abnormalities in your user traffic, based on historical patterns. In practice, this means that an anomaly alert can detect an unusual dip during peak business hours (e.g., Thursday afternoon), even if that value would be normal on a weekend.
Anomaly detection is also designed to help you identify abnormalities in critical business metrics (logins/signups, traffic, checkouts) that exhibit recurring, user-driven fluctuations. Even if a metric is trending in a specific direction, anomaly detection automatically adjusts its predicted range of values to follow the metric’s shifting baseline, and it still flags values that fall outside that range.
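To make the weekday-versus-weekend idea concrete, here is a minimal Python sketch. It is not Datadog’s algorithm, just an illustration of a seasonal baseline: it builds an expected range for each hour of the week from past values seen in that same hour, then checks new values against it. The function names, the `width` parameter, and the traffic numbers are all hypothetical, and the sketch does not attempt to follow a trending baseline.

```python
from collections import defaultdict
from statistics import mean, stdev


def seasonal_bounds(history, width=3.0):
    """Build an expected range per hour-of-week slot.

    history: iterable of (slot, value) pairs, where slot is the hour of the
    week (0..167, with Monday 00:00 as slot 0). This encoding is an assumption
    made for the sketch.
    Returns {slot: (lower, upper)} as mean +/- width * sample standard deviation.
    """
    by_slot = defaultdict(list)
    for slot, value in history:
        by_slot[slot].append(value)

    bounds = {}
    for slot, values in by_slot.items():
        if len(values) < 2:
            continue  # not enough history to estimate a spread for this slot
        mu, sigma = mean(values), stdev(values)
        bounds[slot] = (mu - width * sigma, mu + width * sigma)
    return bounds


def is_anomalous(bounds, slot, value):
    """Return True if value falls outside the expected range for its slot."""
    if slot not in bounds:
        return False  # no baseline for this slot, so make no judgment
    lower, upper = bounds[slot]
    return value < lower or value > upper


# Hypothetical request counts from four past weeks.
THURSDAY_2PM = 3 * 24 + 14   # slot 86
SATURDAY_2PM = 5 * 24 + 14   # slot 134
history = (
    [(THURSDAY_2PM, v) for v in (1000, 1040, 980, 1010)]
    + [(SATURDAY_2PM, v) for v in (300, 320, 290, 310)]
)
bounds = seasonal_bounds(history)

# The same value is judged differently depending on when it occurs.
thursday_dip = is_anomalous(bounds, THURSDAY_2PM, 310)
saturday_normal = is_anomalous(bounds, SATURDAY_2PM, 310)
```

With this made-up history, 310 requests is flagged on Thursday afternoon but treated as normal on Saturday afternoon, which is exactly the distinction a single static threshold cannot express.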
Outlier detection helps you identify deviations from normal group behavior. This is particularly useful for any cluster of nodes that shares work, such as web servers, load-balanced microservices, or nodes in a distributed database such as Cassandra. Applying outlier detection to a pool of Cassandra nodes can help you automatically ensure that the database is properly distributing work across the cluster.
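As a rough illustration of comparing each node against its group, the sketch below uses the median absolute deviation (MAD), a common robust statistic. It is a simplified stand-in rather than Datadog’s implementation; the function name, the `tolerance` parameter, the host names, and the request rates are hypothetical.

```python
from statistics import median


def mad_outliers(values_by_host, tolerance=3.0):
    """Return hosts whose value is far from the group median.

    A host is flagged when its distance from the median exceeds
    tolerance * MAD, where MAD is the median of all such distances.
    """
    values = list(values_by_host.values())
    group_median = median(values)
    mad = median([abs(v - group_median) for v in values])
    if mad == 0:
        return []  # no spread in the group to compare against
    return [
        host
        for host, value in values_by_host.items()
        if abs(value - group_median) > tolerance * mad
    ]


# Hypothetical requests per second handled by each node in a Cassandra cluster.
requests_per_second = {
    "cass-1": 510,
    "cass-2": 495,
    "cass-3": 505,
    "cass-4": 500,
    "cass-5": 120,
}
outliers = mad_outliers(requests_per_second)
```

Here four nodes handle roughly the same load, so only `cass-5` is returned. Because the median ignores extreme values, one misbehaving node does not distort the baseline that it is being compared against.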
Although anomaly detection and outlier detection provide different views into your infrastructure and applications, they can complement each other to deliver more fine-grained insights. For example, you can apply anomaly detection to the aggregated count of requests processed by a pool of web servers, and outlier detection across individual web servers, to make sure that the load is balanced properly.
In this post, we’ve covered just a few of the many ways that algorithmic monitoring can automatically identify anomalies and outliers in your metrics. If you’d like to start building smarter alerts for your infrastructure and applications, here’s a 14-day, full-featured trial.