Machine Learning | Datadog

Monitor Ray applications and clusters with Datadog

Learn how to monitor your AI workloads and their resource consumption as you scale them with Ray.

Monitor Amazon Bedrock with Datadog

Learn how to monitor your foundation models' usage, API performance, error rate, and more with Datadog's ...

10 insights on real-world container use

Our latest report examines more than 2.4 billion containers run by tens of thousands of Datadog customers to ...

Monitoring Amazon SageMaker with Datadog

Learn how Datadog's integration with Amazon SageMaker can help you monitor resource utilization and identify ...

Integration roundup: Monitoring your AI stack

Learn how you can monitor health and performance across every layer of your AI stack with integrations from ...

Monitor your NVIDIA GPUs with Datadog

Learn how our NVIDIA DCGM integration provides visibility into all of your NVIDIA GPUs in a single platform.

Monitor machine learning models with Fiddler's offering in the Datadog Marketplace

Learn how to centralize monitoring of your machine learning–based applications, proactively maintain model ...

Understand the scope of user impact with Watchdog Impact Analysis

See how many users are affected by service performance issues so that you can troubleshoot more effectively.

Augmented troubleshooting with Watchdog Insights

Watchdog Insights surfaces clues, helps reduce MTTR, and now supports Log Management.

Automated root cause analysis with Watchdog RCA

Learn how Watchdog can automatically identify the root cause of performance issues across your stack.

Watchdog detects Kubernetes anomalies and surfaces root causes

Watchdog automatically detects Kubernetes anomalies and helps you identify their root causes.

Watchdog for Infra automatically detects infrastructure anomalies

Watchdog automatically detects anomalies in your infrastructure without any configuration.