Machine Learning | Datadog

Monitor Ray applications and clusters with Datadog

Learn how to monitor your AI workloads and their resource consumption as you scale them with Ray.

Monitor Amazon Bedrock with Datadog

Learn how to monitor your foundation models' usage, API performance, error rate, and more with Datadog's ...

Monitoring Amazon SageMaker with Datadog

Learn how Datadog's integration with Amazon SageMaker can help you monitor resource utilization and identify ...

Monitor your NVIDIA GPUs with Datadog

Learn how our NVIDIA DCGM integration provides visibility into all of your NVIDIA GPUs in a single platform.

Monitor machine learning models with Fiddler's offering in the Datadog Marketplace

Learn how to centralize monitoring of your machine learning–based applications, proactively maintain model ...

Understand the scope of user impact with Watchdog Impact Analysis

See how many users are affected by service performance issues so that you can troubleshoot more effectively.

Augmented troubleshooting with Watchdog Insights

Watchdog Insights surfaces clues and helps reduce MTTR, and now supports Log Management.

Automated root cause analysis with Watchdog RCA

Learn how Watchdog can automatically identify the root cause of performance issues across your stack.

Datadog APM gains 3 superpowers: App Analytics, Service Map & Watchdog

With three major new features and support for numerous languages and frameworks, Datadog APM is more powerful ...

Auto-smooth noisy metrics to reveal trends

Datadog's new Auto Smoother function makes it simple to smooth out noisy metrics without losing sight of the ...

Watchdog: Auto-detect performance anomalies without setting alerts

Watchdog uses machine learning to sniff out potential performance problems without any setup or configuration.

Robust Statistical Distances for Machine Learning

Designing powerful outlier and anomaly detection algorithms requires using the right tools. Discover how ...