
Barry Eom

Nicole Cybul
As organizations rapidly scale their use of large language models (LLMs), many teams are adopting LiteLLM to simplify access to a diverse set of LLM providers and models. LiteLLM provides a unified interface through both an SDK and proxy to speed up development, centralize control, and optimize LLM-powered workflows. But introducing a proxy layer adds abstraction, making it harder to understand how requests are processed. This challenge is particularly prominent when it comes to understanding model selection, performance, and cost attribution.
We are pleased to announce Datadog LLM Observability’s native integration with LiteLLM, which enables you to easily monitor, troubleshoot, and optimize your LiteLLM-powered applications. With this integration, Datadog customers can:
- Instantly and automatically capture every LLM request, whether it's made via the LiteLLM SDK or the proxy
- Identify bottlenecks and issues by tracing requests end to end across your LiteLLM-powered LLM stack
- Monitor the performance of your agents or applications and LiteLLM itself
In this post, we’ll show you how Datadog LLM Observability’s LiteLLM integration helps engineering and AI teams gain actionable insights and accelerate troubleshooting across their LLM-powered applications.
Track LiteLLM usage patterns
Getting started with Datadog's LiteLLM integration is simple and requires just a few steps: enable LLM Observability in your Datadog account, install or upgrade to dd-trace-py version 3.9.0 or later, and activate auto-instrumentation in your Python environment. Full setup instructions are available in the Datadog documentation.
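As a minimal sketch of what this setup can look like in code, the snippet below enables LLM Observability programmatically and makes a LiteLLM SDK call that auto-instrumentation would capture. The `ml_app` name, model, and prompt are placeholders, and the in-code `LLMObs.enable()` call assumes an agentless setup with `DD_API_KEY` already set; environment-variable-based setup via `ddtrace-run` works as well.

```python
# Minimal sketch: enable LLM Observability in code, then call the LiteLLM SDK.
# The ml_app name, model, and prompt are placeholders for illustration.
import litellm
from ddtrace.llmobs import LLMObs

# Enable LLM Observability (assumes DD_API_KEY is set in the environment).
LLMObs.enable(ml_app="litellm-demo", agentless_enabled=True)

# With auto-instrumentation active, this call is captured as an LLM span,
# including the model name, token counts, and latency.
response = litellm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize LiteLLM in one sentence."}],
)
print(response.choices[0].message.content)
```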
Once enabled, Datadog’s native LiteLLM integration allows you to easily track how your team or organization uses LLMs across different models, providers, and teams. Datadog automatically traces all LLM requests and enriches them with key metadata such as token counts, estimated cost, and the base URLs of the proxies that handled those requests. By doing so, Datadog allows organizations to analyze and understand which LLMs and providers are used most frequently by specific teams or applications. For instance, you can use contextual fields like user and team aliases to break down usage patterns and ensure fair allocation of resources.
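As one way to supply that context from the LiteLLM SDK, the hedged sketch below passes an end-user alias via the OpenAI-compatible `user` parameter and custom tags via LiteLLM's `metadata` argument. The specific keys (`team`, `feature`) are illustrative assumptions, not fields the integration requires.

```python
# Sketch: attach user and team context to a LiteLLM SDK call so usage can be
# broken down by team or application later. The metadata keys are illustrative.
import litellm

response = litellm.completion(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Draft a release note."}],
    user="alice@example.com",                                  # end-user alias
    metadata={"team": "growth", "feature": "release-notes"},   # custom tags
)
```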

You can also track token usage and estimate costs for each request, supporting internal chargebacks and budgeting. For example, if your organization uses multiple providers like OpenAI and Anthropic, you can compare usage across teams or applications to ensure consistent cost management. You may discover that a specific service is disproportionately relying on high-cost models for low-priority tasks, an insight you can use to shift those requests to more cost-effective alternatives or enforce token limits.
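For a sense of the per-request numbers involved, the sketch below reads token counts from a LiteLLM response and estimates spend with LiteLLM's `completion_cost()` helper; this mirrors the kind of data the integration records, though the model and prompt here are placeholders.

```python
# Sketch: inspect token usage and estimated cost for a single request.
# litellm.completion_cost() estimates spend from LiteLLM's model pricing table.
import litellm

response = litellm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Classify this ticket: 'login broken'"}],
)

usage = response.usage
cost = litellm.completion_cost(completion_response=response)
print(f"prompt={usage.prompt_tokens} completion={usage.completion_tokens} "
      f"estimated_cost=${cost:.6f}")
```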
Additionally, this integration makes it possible to gain insights into load balancing behavior. Organizations can analyze how LiteLLM distributes requests across multiple deployments and determine whether it's employing a latency-based, least-busy, or usage-aware routing strategy.
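As a hedged sketch of the routing behavior in question, the example below configures a LiteLLM `Router` that balances one model group across two deployments using a latency-based strategy. The deployment list, keys, and strategy are illustrative assumptions; the traces Datadog collects reflect whichever strategy your router actually uses.

```python
# Sketch: a LiteLLM Router balancing one model group across two deployments.
from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "gpt-4o-mini",  # model group name callers use
            "litellm_params": {"model": "openai/gpt-4o-mini", "api_key": "sk-..."},
        },
        {
            "model_name": "gpt-4o-mini",
            "litellm_params": {
                "model": "azure/gpt-4o-mini-deployment",
                "api_base": "https://example.openai.azure.com",
                "api_key": "azure-key-...",
            },
        },
    ],
    routing_strategy="latency-based-routing",
)

response = router.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Which deployment served this?"}],
)
```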
This granular visibility empowers you to make data-driven decisions about model selection, provider usage, and resource allocation across your LLM-powered workflows.
Troubleshoot LiteLLM agents and applications faster with end-to-end tracing
Datadog’s integration with LiteLLM enables you to quickly pinpoint and resolve issues across your LLM stack using rich, end-to-end tracing.

From initiation to response, each LiteLLM request is traced as a span that captures the full lifecycle of the interaction. These traces include metadata that enables performance monitoring so teams can identify latency spikes and bottlenecks, whether they stem from LiteLLM’s routing logic, delays from the underlying model provider, or issues with specific endpoints.
Datadog also surfaces errors, retries, and fallback events, making it easier to detect recurring failure patterns and investigate retry logic and timeouts. You can inspect the content and structure of each request and response—including the prompt, user roles, and any parameters—which helps debug issues like unexpected outputs, prompt formatting mistakes, or unusual model behavior. This telemetry data can be correlated across your stack using shared trace and call IDs, along with contextual fields like user or team aliases and cache status.
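To show how several LiteLLM calls can roll up into one end-to-end trace, the sketch below wraps application logic in an LLM Observability workflow span using the dd-trace-py decorators; the workflow name, prompts, and annotation fields are placeholders for illustration.

```python
# Sketch: group several LiteLLM calls under one workflow span so the whole
# request path appears as a single end-to-end trace.
import litellm
from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.decorators import workflow

@workflow(name="support-triage")
def triage(ticket_text: str) -> str:
    # Each completion below is auto-instrumented and nests under the workflow span.
    category = litellm.completion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Categorize: {ticket_text}"}],
    ).choices[0].message.content

    reply = litellm.completion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Draft a reply for a {category} ticket."}],
    ).choices[0].message.content

    # Annotate the workflow span with its input and output for easier debugging.
    LLMObs.annotate(input_data=ticket_text, output_data=reply)
    return reply
```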
The traces collected by Datadog LLM Observability empower teams to proactively monitor, troubleshoot, and optimize their LLM-powered applications with granularity and speed.
Start observing LiteLLM-powered AI agents and applications today
As LLM-powered applications become more central to business operations, having unified, actionable observability is critical.
With Datadog’s native LiteLLM integration, you can confidently monitor, troubleshoot, and optimize every LLM request no matter how complex your stack.
Instrument LiteLLM with Datadog today and unlock end-to-end visibility for your AI-driven applications.
For more information, check out the dd-trace-py v3.9.0 release notes and our LLM Observability documentation.
If you are an existing Datadog customer, you can start monitoring your LiteLLM-powered applications today. Otherwise, sign up for a 14-day free trial.