Machine Learning | Datadog

Best practices for monitoring managed ML platforms

Learn about what to monitor through each step of an ML workflow.

Stay up to date on the latest incidents with Bits AI

Learn how Bits AI can enhance your incident responses with quick summaries and natural language queries.

Monitor Ray applications and clusters with Datadog

Learn how to monitor your AI workloads and their resource consumption as you scale them with Ray.

Monitor Amazon Bedrock with Datadog

Learn how to monitor your foundation models' usage, API performance, error rate, and more with Datadog's ...

10 insights on real-world container use

Our latest report examines more than 2.4 billion containers run by tens of thousands of Datadog customers to ...

Monitoring Amazon SageMaker with Datadog

Learn how Datadog's integration with Amazon SageMaker can help you monitor resource utilization and identify ...

Integration roundup: Monitoring your AI stack

Learn how you can monitor health and performance across every layer of your AI stack with integrations from ...

Monitor your NVIDIA GPUs with Datadog

Learn how our NVIDIA DCGM integration provides visibility into all of your NVIDIA GPUs in a single platform.

Monitor machine learning models with Fiddler's offering in the Datadog Marketplace

Learn how to centralize monitoring of your machine learning–based applications, proactively maintain model ...

Understand the scope of user impact with Watchdog Impact Analysis

See how many users are affected by service performance issues so that you can troubleshoot more effectively.

Augmented troubleshooting with Watchdog Insights

Watchdog Insights surfaces clues and helps reduce MTTR—and now supports Log Management.

Automated root cause analysis with Watchdog RCA

Learn how Watchdog can automatically identify the root cause of performance issues across your stack.

Watchdog detects Kubernetes anomalies and surfaces root causes

Watchdog automatically helps with root cause analysis and detects Kubernetes anomalies.

Watchdog for Infra automatically detects infrastructure anomalies

Watchdog automatically detects anomalies in your infrastructure without any configuration.

Speed up your root cause analysis with Metric Correlations

If you identify a possible issue, Metric Correlations suggests which parts of your system are likely to be ...

Datadog APM gains 3 superpowers: App Analytics, Service Map & Watchdog

With three major new features and support for numerous languages and frameworks, Datadog APM is more powerful ...

Auto-smooth noisy metrics to reveal trends

Datadog's new Auto Smoother function makes it simple to smooth out noisy metrics without losing sight of the ...

Watchdog: Auto-detect performance anomalies without setting alerts

Watchdog uses machine learning to sniff out potential performance problems without any setup or configuration.

Introducing metric forecasts for predictive monitoring in Datadog

Forecasts predict your metrics' future behavior, so you can specify how far in advance you want to get ...

3 scenarios where machine learning makes for smarter alerts

Watch these videos to learn how your apps and infrastructure can benefit from automated, algorithmic ...

Robust Statistical Distances for Machine Learning

Designing powerful outlier and anomaly detection algorithms requires using the right tools. Discover how ...

Introducing new scaled algorithms for improved outlier detection

Our new outlier detection algorithms take magnitude and dispersion into account for better alerting.

Introducing anomaly detection in Datadog

Anomaly detection analyzes recent metric patterns to identify abnormalities.

Introducing outlier detection in Datadog

Datadog's new outlier detection feature allows you to automatically identify any host (or group of hosts) that ...
