
Barry Eom

Mohammad Jama

Yun Kim
As large language models (LLMs) grow more powerful, organizations are deploying agentic AI applications to tackle complex, multi-step tasks. With Amazon Bedrock Agents, developers can orchestrate these agents to manage tasks such as triggering serverless functions, calling APIs, accessing knowledge bases, and maintaining contextual conversations—all while breaking down complex user requests into manageable steps.
We’re excited to announce a new integration between Datadog LLM Observability and Amazon Bedrock Agents that helps you monitor agentic LLM applications built on Amazon Bedrock. LLM Observability monitors these applications for performance, quality, and security issues such as latency spikes, hallucinations, incorrect tool selection, and prompt injection attempts. Beyond tracking the overall health of agentic applications, developers can track step-by-step operations of an agent across complex workflows and monitor foundation model calls, tool invocations, and knowledge base interactions. With full visibility into model behavior and application context, developers can identify, troubleshoot, and resolve issues faster.
In this post, we'll explore how the integration between LLM Observability and Amazon Bedrock Agents helps you:
- Monitor complex agent workflows
- Optimize performance and control costs
- Evaluate output, tool selection, and overall quality
Monitor complex agent workflows
The integration of LLM Observability with Amazon Bedrock Agents offers comprehensive observability for agentic LLM applications that programmatically invoke agents via the InvokeAgent API. LLM Observability provides visibility into agent workflows by capturing detailed telemetry from Amazon Bedrock Agents, enabling your teams to monitor, troubleshoot, and optimize their LLM applications more effectively.
With end-to-end tracing, you can visualize each operation of an agent's workflow from pre-processing through post-processing, including orchestration and guardrail evaluations. During debugging, you can use detailed execution insights to quickly pinpoint failure points and understand error contexts.
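As an illustration, the following is a minimal sketch of what an instrumented invocation might look like, assuming the Datadog LLM Observability SDK for Python (ddtrace) with its default integrations enabled and boto3's bedrock-agent-runtime client; the application name, agent ID, alias ID, session ID, and prompt are placeholders.

```python
import boto3
from ddtrace.llmobs import LLMObs

# Enable LLM Observability; "bedrock-agent-demo" is a placeholder app name.
# Your API key and site can also be supplied via DD_API_KEY / DD_SITE environment variables.
LLMObs.enable(ml_app="bedrock-agent-demo")

# Invoke a Bedrock agent; the IDs below are placeholders for your own agent.
client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")
response = client.invoke_agent(
    agentId="AGENT_ID",
    agentAliasId="AGENT_ALIAS_ID",
    sessionId="session-123",
    inputText="Summarize open support tickets for account 42.",
)

# The agent's answer is returned as an event stream of chunks.
completion = ""
for event in response["completion"]:
    chunk = event.get("chunk")
    if chunk:
        completion += chunk["bytes"].decode("utf-8")

print(completion)
```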

Optimize performance and control costs
As your teams scale their agentic applications, each agent interaction—whether it's retrieving knowledge, invoking tools, or calling models—can impact latency and cost. Without visibility into how these resources are used, it's difficult to pinpoint inefficiencies or control spend as workflows grow more complex. For applications built on Amazon Bedrock Agents, LLM Observability automatically provides:
- Latency monitoring: Track the time taken for each step and the overall execution to identify bottlenecks.
- Error rate tracking: Observe the frequency and types of errors encountered to improve reliability and debug issues.
- Token usage analysis: Monitor the number of tokens consumed during processing to manage costs.
- Tool invocation details: Gain insights into external API calls made by agents, such as calls to run AWS Lambda functions or knowledge base queries.
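If you also want your own application steps to appear in the same trace as the automatically captured Bedrock Agents spans, the LLM Observability SDK provides span decorators. Below is a minimal sketch, again assuming the ddtrace Python SDK; the function names are illustrative, and call_bedrock_agent stands in for a helper that wraps the invoke_agent call shown earlier.

```python
from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.decorators import task, workflow

LLMObs.enable(ml_app="bedrock-agent-demo")  # placeholder app name


@task
def format_ticket_report(raw_completion: str) -> str:
    # Post-process the agent's answer before returning it to the user.
    return raw_completion.strip()


@workflow
def answer_ticket_question(question: str) -> str:
    # call_bedrock_agent is a hypothetical helper wrapping the invoke_agent call above.
    raw = call_bedrock_agent(question)
    report = format_ticket_report(raw)
    # Attach the workflow's input and output so they appear on the trace.
    LLMObs.annotate(input_data=question, output_data=report)
    return report
```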

Evaluate output, tool selection, and overall quality
In agentic applications, it's not enough to know that a task was completed. You also need to know how well it was completed. For example, are generated summaries accurate and on topic? Are user-facing answers clear, helpful, and free of harmful content? Did an agent select the right tool? Without answers to these questions, silent failures can slip through and undercut intended outcomes—like reducing handoffs to human agents or automating repetitive decisions.
LLM Observability helps your teams assess the quality and safety of their LLM applications by evaluating the inputs and outputs of model calls at the root level and within nested steps of a workflow. Built-in evaluations detect quality, safety, and security issues, including prompt injections, off-topic completions, and toxic content. You can also submit custom evaluations to visualize domain-specific quality metrics, such as whether an output matched expected formats or adhered to policy guidelines. In addition, you can monitor guardrails to inspect when and why content filters were triggered during execution.
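For example, a custom evaluation might check whether a generated summary stays within a required length. The sketch below assumes the ddtrace SDK's custom evaluation API (LLMObs.export_span and LLMObs.submit_evaluation); the label, scoring rule, and call_bedrock_agent helper are illustrative.

```python
from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.decorators import workflow


@workflow
def summarize_ticket(ticket_text: str) -> str:
    # call_bedrock_agent is a hypothetical helper wrapping the invoke_agent call above.
    summary = call_bedrock_agent(ticket_text)

    # Export the current span so the evaluation can be attached to it.
    span_context = LLMObs.export_span()

    # Submit a domain-specific quality metric: did the summary stay under 100 words?
    LLMObs.submit_evaluation(
        span_context,
        label="summary_within_length_limit",
        metric_type="categorical",
        value="pass" if len(summary.split()) <= 100 else "fail",
    )
    return summary
```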
These insights appear directly alongside latency, cost, and trace data to help your teams identify how an agent behaved and whether it produced the right result.
Build and monitor your agentic AI applications with confidence
The integration between Datadog LLM Observability and Amazon Bedrock Agents helps you ensure the reliability, performance, cost-effectiveness, and responsible AI use of your agentic LLM applications. To get started, check out the LLM Observability documentation.
Datadog is an AWS Specialization Partner and AWS Marketplace Seller that has been building integrations with AWS services for over a decade, amassing a growing catalog of more than 100 integrations. To learn more about how Datadog integrates with Amazon AI and machine learning services, see Monitor Amazon Bedrock with Datadog and Monitoring Amazon SageMaker with Datadog.
If you don’t already have a Datadog account, you can sign up for a free 14-day trial today.