vLLM Observability & Monitoring | Datadog

vLLM Observability & Monitoring

Gain comprehensive visibility into the performance and resource usage of your LLM workloads.


A unified monitoring platform provides at-a-glance visibility into the health and performance of each layer of your environment. Datadog lets you tailor this insight to your stack by collecting and correlating data from more than 1,000 vendor-backed technologies in a single pane of glass. Monitor your underlying infrastructure, supporting services, and applications alongside security data in one centralized platform.

 


Next-generation ML Monitoring

Monitor your entire machine learning stack with Datadog.


AWS Trainium & Inferentia

Monitor and optimize deep learning workloads running on AWS AI chips.


OpenAI

Monitor token consumption, API performance, and more.


NVIDIA DCGM Exporter

Gather metrics from NVIDIA’s discrete GPUs, essential to parallel computing.

Datadog: trusted and chosen by companies worldwide

ML Monitoring Resources

Learn how Datadog can help you monitor your entire AI stack.

Datadog AI Monitoring Starter Kit