Datadog LLM Observability | Datadog

LLM Observability

Ship AI agents faster,
with confidence

Monitor, evaluate, and improve your agents in one place


LLM Observability supports leading models, providers, and agent frameworks.

OpenAI
Anthropic
Gemini
Vercel
Google Vertex
LangChain
CrewAI
Pydantic
AWS Bedrock
LiteLLM
Strands Agents

Improve agent quality with every release

Understand how your agents behave in production

Trace every request across prompts, model responses, retrieval steps, and tool calls to understand how your AI system executes. Investigate failures, latency spikes, and unexpected costs by examining each step of an agent workflow.

  • Trace prompts, retrieval steps, tool calls, and agent decisions
  • Track latency, token usage, retries, and errors across each step
  • Identify bottlenecks and failures within complex agent workflows
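The trace structure described above can be pictured with a small, hypothetical span tree (illustrative Python only, not the Datadog SDK): each step of an agent run becomes a span carrying its own latency, token count, and error status, which turns bottleneck hunting into a simple tree walk.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical, simplified model of the span tree a traced agent run
# produces -- NOT the Datadog SDK. Each step (LLM call, retrieval,
# tool call) is a span with its own latency, tokens, and status.
@dataclass
class Span:
    name: str
    kind: str                      # "llm", "retrieval", "tool", "agent"
    latency_ms: float
    tokens: int = 0
    error: Optional[str] = None
    children: list["Span"] = field(default_factory=list)

def slowest_step(span: Span) -> Span:
    """Walk the tree to find the single slowest step -- the kind of
    bottleneck question a trace view answers at a glance."""
    worst = span
    for child in span.children:
        candidate = slowest_step(child)
        if candidate.latency_ms > worst.latency_ms:
            worst = candidate
    return worst

run = Span("answer_question", "agent", latency_ms=40, children=[
    Span("retrieve_docs", "retrieval", latency_ms=120),
    Span("search_web", "tool", latency_ms=2300, error="timeout"),
    Span("generate_answer", "llm", latency_ms=850, tokens=1420),
])

print(slowest_step(run).name)  # the timed-out tool call dominates latency
```

In a real trace the same question is answered visually: the flame-graph view surfaces the 2.3-second timed-out tool call without any code.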

Debug production issues with full execution context

Monitor Behavior

Continuously evaluate AI system quality

Measure how your AI system performs using automated evaluations and human feedback. Detect hallucinations, unsafe outputs, prompt injection attempts, and sensitive data exposure before they impact users.

  • Run out-of-the-box and custom evaluators aligned with your KPIs
  • Use annotations and human review to label and evaluate outputs
  • Detect hallucinations, prompt injection, and PII exposure

Monitor quality trends and identify drift across releases

Evaluate Quality

Ship changes with evidence

Test changes to prompts, models, tools, and agent logic using real production data. Build datasets from traces, run experiments, and incorporate human feedback to continuously improve system behavior before deploying updates.

  • Build versioned datasets from production traces
  • Run experiments to compare prompts, models, and configurations

Use real interactions to iterate and improve system behavior

Iterate Fast

Connect agents to the rest of your stack

Bring AI workloads into Datadog so you can correlate agent behavior with backend services, infrastructure, and real user sessions. Debug faster on one platform.

  • Tie LLM workload performance to service and infrastructure signals
  • Link response time and quality to real user sessions
  • Eliminate context switching by keeping tracing, experiments, and evaluations in one place

Unify Context

Get started in minutes

Instrumentation

Claude Code (MCP onboarding):

claude mcp add --transport http datadog-onboarding-us1 "https://mcp.datadoghq.com/api/unstable/mcp-server/mcp?toolsets=onboarding" && claude /mcp

Or prompt your AI coding assistant:

Add Datadog LLM Observability to my project

Node.js:

npm install dd-trace

DD_SITE=<SITE> \
DD_LLMOBS_ENABLED=1 \
DD_LLMOBS_ML_APP=<APPLICATION_NAME> \
DD_API_KEY=<API_KEY> \
NODE_OPTIONS="--import dd-trace/initialize.mjs" <your application command>

Python:

pip install ddtrace

DD_SITE=<SITE> \
DD_LLMOBS_ENABLED=1 \
DD_LLMOBS_ML_APP=<APPLICATION_NAME> \
DD_API_KEY=<API_KEY> \
ddtrace-run <your application command>

Java:

wget -O dd-java-agent.jar 'https://dtdg.co/latest-java-tracer'

java -javaagent:/path/to/dd-java-agent.jar \
-Ddd.site=<SITE> \
-Ddd.llmobs.enabled=true \
-Ddd.llmobs.ml.app=<APPLICATION_NAME> \
-Ddd.api.key=<API_KEY> \
-jar path/to/your/app.jar

Instrument your application via OpenTelemetry or the HTTP API

Customer Stories

Fast-growing teams ship production-ready AI with Datadog

Twine Security
Cybersecurity

Datadog LLM Observability gives us complete visibility into our agents' reasoning so we can reduce cost, improve reliability, and ship with confidence.

Read the customer story
Fintool
Financial Services

By using Datadog LLM Observability, we’ve improved response accuracy and reduced latency, ensuring faster, more reliable insights for our customers.

Read the customer story
Appfolio
Real Estate

Datadog LLM Observability helped us ensure high model performance and quality, and allowed us to expand functionality quickly and safely.

Read the customer story
Pricing

Priced for startups, built for enterprise

Scale production-grade AI agents with enterprise-grade controls — for free.

Only pay based on the volume of LLM spans, even as context grows, reasoning gets heavier, and agents use more tools.

Free For individuals and small teams

$0

per month

Key Features

  • Up to 40K LLM spans
  • 15-day retention
  • Unlimited context and evals
  • Full feature access
Pro For teams running in production

$240

per month

Key Features

  • Starting at 100K LLM spans
  • 15-day retention
  • Unlimited context and evals
  • Full feature access
Note: Pricing effective starting on May 1, 2026. Pricing varies by region.
Free trials started prior to May 1 can convert to free tier on May 1.
FAQ

Frequently Asked Questions

Which plan is right for me?

Free: Best for individuals and smaller teams getting started with tracing, experiments, evals, and prompt iteration.

Pro: Best for teams running in production; includes additional LLM span volume, predictable on-demand pricing, and retention add-ons.

What is an LLM span?

Each call to an LLM provider like OpenAI or Anthropic is captured as its own LLM span. One agent run can create multiple LLM spans inside the same workflow.

LLM Observability meters and bills only on the count of LLM spans. Tool spans, embedding and retrieval spans, and agent spans are not billed.

That means pricing scales better as your context window grows and your agent becomes more complex and autonomous.
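As a sketch of the metering rule above (illustrative Python with made-up span records): counting only spans of kind "llm" gives the billable number, no matter how many tool, retrieval, or agent steps the same workflow runs.

```python
# Illustrative only: which spans in an agent run would be metered.
# Per the pricing model described above, only LLM spans are billed;
# tool, retrieval, embedding, and agent spans are not.
spans = [
    {"name": "plan", "kind": "agent"},
    {"name": "fetch_context", "kind": "retrieval"},
    {"name": "draft_answer", "kind": "llm"},
    {"name": "run_calculator", "kind": "tool"},
    {"name": "refine_answer", "kind": "llm"},
]

billable = sum(1 for s in spans if s["kind"] == "llm")
print(f"{billable} billable LLM spans out of {len(spans)} total spans")  # 2 of 5
```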

How can I get started?

Instrument your LLM application with an SDK for Python or Node, or submit spans through an API if your application is written in another language.

If you use orchestration frameworks like LangChain or popular LLM providers, auto-instrumentation can capture traces for you.

View instrumentation docs

What do evals cost?

There is no separate product fee for offline or online evaluations. Every plan includes the full evaluation workflow.

If an eval run makes LLM calls, those calls count as LLM spans. You are not charged a separate eval fee on top of that.

How long is LLM Observability data retained?

On-demand Free and Pro plans retain trace, span, and experiment data for 15 days. On M2M or annual contracts, trace and span data are retained for 15 days and experiment results for 90 days.

Retention add-ons extend traces and spans to 30, 60, or 90 days and extend experiments to 6, 9, or 12 months.

Datasets have a separate 3-year retention and are versioned so you can rerun experiments against the same baseline and compare results over time.

Is SDS part of LLM Observability?

Yes. LLM Observability scans, identifies, and redacts sensitive information in your LLM application by using Sensitive Data Scanner.

This helps prevent data leakage across personal information, financial data, health records, and other sensitive content that needs protection.

You do not need to buy SDS separately for LLM Observability. For every 10K LLM requests, you get an allocation of 1 GB of Sensitive Data Scanner capacity.
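That allocation is simple proportional arithmetic (illustrative only, using just the ratio stated above):

```python
# Per the allocation above: every 10K LLM requests include 1 GB of
# Sensitive Data Scanner capacity. Illustrative arithmetic only.
def included_sds_gb(llm_requests: int) -> float:
    return llm_requests / 10_000  # 1 GB per 10K requests

print(included_sds_gb(100_000))  # 10.0 GB included at 100K requests/month
```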

How does billing work?

Free includes up to 40K LLM spans per month. Pro starts at $240 per month and includes 100K LLM spans.

Additional on-demand usage is billed after the first 100K LLM spans. Retention add-ons are billed per 10K LLM spans. M2M and annual commitments are discounted.
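A rough plan check using only the numbers published above (illustrative sketch; the on-demand overage rate is not stated here, so it is deliberately left out rather than guessed):

```python
# Back-of-the-envelope plan check from the published numbers:
# Free covers up to 40K LLM spans/month; Pro is $240/month for 100K.
FREE_INCLUDED = 40_000
PRO_INCLUDED = 100_000
PRO_BASE_USD = 240

def suggested_plan(monthly_llm_spans: int) -> str:
    if monthly_llm_spans <= FREE_INCLUDED:
        return "Free ($0)"
    if monthly_llm_spans <= PRO_INCLUDED:
        return f"Pro (${PRO_BASE_USD})"
    # Overage rate varies by region and contract, so only flag it.
    return f"Pro (${PRO_BASE_USD} + on-demand overage)"

print(suggested_plan(25_000))   # Free ($0)
print(suggested_plan(90_000))   # Pro ($240)
print(suggested_plan(250_000))  # Pro ($240 + on-demand overage)
```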

Monitor, evaluate, and improve your agents in one place