The Monitor

Protect agentic AI applications with Datadog AI Guard

Océane Bordeau, Santiago Mola, and Vijay George

Organizations are increasingly using agentic AI applications powered by large language models (LLMs) to automate analysis, decision-making, and operational workflows. As these AI agents take on more responsibility, they gain access to internal tools and services and can interact with them in unintended ways. The ability of agents to read files, invoke API calls through the Model Context Protocol (MCP), and modify infrastructure introduces new opportunities for misuse, data exposure, and prompt-driven manipulation that traditional security tools weren’t designed to detect.

Datadog AI Guard, now available in Preview, adds a real-time security layer for agentic AI applications. AI Guard evaluates prompts, responses, and tool calls with contextual awareness and an LLM-as-a-judge model to determine whether an action aligns with organizational intent and policy. When a request appears harmful or misaligned, AI Guard can block it before it reaches critical systems or sensitive data.

In this post, we’ll explore how AI Guard can help you:

- Protect AI agents from real-time threats with intelligent guardrails
- Monitor tool calls and agentic workflows with context-aware detection
- Automate threat response and understand your AI security posture

We’ll also explain how you can enable AI Guard without adding new infrastructure.

Protect AI agents from real-time threats with intelligent guardrails

AI agents need consistent and intelligent guardrails because threats such as prompt injection and sensitive data leakage often emerge from subtle manipulations of inputs or outputs. AI Guard helps teams address these and other risks in the OWASP Top 10 for LLM applications by evaluating interactions at runtime and blocking actions that appear unsafe or out of scope.

AI Guard currently includes two core protection capabilities, Prompt Protection and Tool Protection, and we’re developing additional runtime protection capabilities that will expand AI Guard’s coverage in future releases. Prompt Protection uses an evaluator model to judge whether a prompt or response should proceed based on context, policy, and the agent’s intended goal. It applies multilayered guardrails, including prompt injection filtering, input and output checks, and multilingual analysis. These evaluations use full context from historical messages, system prompts, and tool calls to help teams detect direct and indirect attacks.
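
To make the approach concrete, the following is a minimal sketch of the LLM-as-a-judge pattern in Python. It is our own illustration of the general technique, not AI Guard’s implementation; call_llm stands in for whatever model client you use, and the JSON response format is an assumption of this sketch:

    import json

    JUDGE_INSTRUCTIONS = (
        "You are a security evaluator. Given the agent's system prompt, the "
        "conversation so far, and a new message, decide whether the message "
        "is safe to process. Respond with JSON: "
        '{"verdict": "ALLOW" or "BLOCK", "tags": [...], "reason": "..."}'
    )

    def judge_message(call_llm, system_prompt, history, new_message):
        """Ask a judge model whether a message should proceed, using the full
        conversation context (the key idea behind contextual evaluation)."""
        context = {
            "system_prompt": system_prompt,
            "history": history,  # prior messages and tool calls
            "new_message": new_message,
        }
        raw = call_llm(JUDGE_INSTRUCTIONS + "\n\n" + json.dumps(context))
        # e.g., {"verdict": "BLOCK", "tags": ["jailbreak"], "reason": "..."}
        return json.loads(raw)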

Because the evaluator model reviews every interaction, AI Guard helps teams reduce accidental leakage of internal information, block harmful requests, and maintain consistent boundaries without placing a heavy burden on developers. Each decision is accompanied by reason tags, such as data-exfiltration, jailbreak, and indirect-prompt-injection, that explain why AI Guard allowed or blocked an action.

The following screenshot shows an example of a user prompt that is flagged as unsafe because AI Guard detected data exfiltration.

A user prompt that is flagged as unsafe for attempting to obtain payment information about another user.

Monitor tool calls and agentic workflows with context-aware detection

For tool-using AI agents, security risks extend beyond the model’s reasoning process and into the actions that the agents take. An attacker can manipulate these actions across several steps, moving from routine requests into probing behavior that eventually leads to escalation and exfiltration.

With Tool Protection, AI Guard analyzes every tool call and evaluates its intent, arguments, relationship to earlier steps in the conversation, and output. The evaluation process considers the full chain of activity—from system prompt to user messages to previous actions—and examines whether a tool invocation aligns with the agent’s assigned goal. In the following screenshot, which shows a continuation of our previous example of data exfiltration, the tool call is deemed unsafe.

AI Guard tool call evaluation showing flagged unsafe behavior.

As another example, if an AI agent attempts to repurpose a benign file helper tool into a command that deletes core directories, AI Guard flags and blocks the request. The system can also detect data exfiltration attempts that use parameters in curl or other network-capable tools. This context-first approach helps teams catch multistep attacks, such as those described in the lethal trifecta, where subtle prompt manipulation cascades into harmful actions.
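
Here is a minimal sketch of that context-first pattern in Python, again as our own illustration rather than AI Guard’s implementation. A real evaluator would hand the full context to a judge model; the string-matching heuristics below are only placeholders:

    from dataclasses import dataclass, field

    @dataclass
    class ToolCall:
        tool: str
        arguments: dict

    @dataclass
    class AgentContext:
        system_prompt: str
        goal: str
        prior_calls: list = field(default_factory=list)  # earlier ToolCalls

    # Toy heuristics standing in for a judge-model evaluation
    DESTRUCTIVE_MARKERS = ("rm -rf", "DROP TABLE", "curl http")

    def evaluate_tool_call(call: ToolCall, ctx: AgentContext) -> str:
        """Return 'ALLOW' or 'BLOCK' for a tool call, judged against the
        agent's goal and earlier activity rather than in isolation."""
        rendered = f"{call.tool} {call.arguments}"
        if any(marker in rendered for marker in DESTRUCTIVE_MARKERS):
            return "BLOCK"  # e.g., a file helper repurposed to delete directories
        # A real evaluator would pass ctx.system_prompt, ctx.goal, and
        # ctx.prior_calls to a judge model here.
        return "ALLOW"

    ctx = AgentContext(system_prompt="You are a file assistant.",
                       goal="Summarize files in the project directory")
    print(evaluate_tool_call(ToolCall("shell", {"cmd": "rm -rf /core"}), ctx))  # BLOCK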

AI Guard allows benign intermediate steps to proceed as long as they align with the agent’s goal and policy. AI Guard stops the chain of behavior as soon as intent diverges, helping agents stay productive without exposing the environment to unnecessary risk.

Automate threat response and understand your AI security posture

AI Guard helps you manage the rollout of AI security controls with flexible enforcement modes. Organizations can begin in monitor-only mode to observe agent behavior, tune policies, and review false positives. When teams are confident in their policies, they can enable blocking of steps that are flagged as unsafe. Teams can also add specific tool configurations by service and environment to block individual tools even if the tool call is deemed safe.
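
As an illustration of what that rollout can look like, the hypothetical policy below starts in monitor-only mode and pins one tool to a hard block for a single service and environment. The field names are our own, not AI Guard’s actual settings schema:

    ai_guard_policy = {
        "mode": "monitor",  # observe and tune first; switch to "block" when ready
        "tool_overrides": [
            {
                # Block a specific tool for one service and environment,
                # even if an individual call would be judged safe
                "service": "payments-api",
                "env": "production",
                "tool": "shell_exec",
                "action": "block",
            },
        ],
    }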

Screenshot of configuration settings for AI Guard, including a toggle option to enable blocking by service and environment.

AI Guard also includes a dedicated interface for analyzing AI activity across services. The Explorer view classifies each interaction as safe, unsafe, or blocked and lets teams filter by service, environment, threat type, and assessment verdict. This functionality helps you identify attack patterns, review blocked events, and validate policy effectiveness. Trend widgets provide visibility into assessment outcomes and evaluation latency over time, and the trace side panel includes detailed explanations for each decision.
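
For example, a search along the lines of service:checkout-agent env:production @verdict:blocked would narrow the Explorer to blocked interactions for a single production service. (The facet names here are illustrative; the Explorer’s facet list shows the exact fields available.)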

Screenshot of the Explorer view in AI Guard. The view includes charts that show assessment verdicts, blocked events, and span duration, in addition to information about each input.

Additionally, the AI Guard Playground lets you test how the guardrails respond to different user inputs. The output shows AI Guard’s decision and lists the detected violation categories, such as jailbreak attempts, instruction override, and data exfiltration, with corresponding confidence probabilities.

Screenshot of the AI Guard Playground and its output of tags and tag probabilities in response to user inputs.

AI Guard also lets you create monitors that analyze AI Guard spans so that you can receive notifications about blocked attacks and unsafe prompts.
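
As a sketch of what creating such a monitor could look like with Datadog’s Python API client, using a placeholder span facet (the real AI Guard span attributes are listed in the documentation):

    from datadog_api_client import ApiClient, Configuration
    from datadog_api_client.v1.api.monitors_api import MonitorsApi
    from datadog_api_client.v1.model.monitor import Monitor
    from datadog_api_client.v1.model.monitor_type import MonitorType

    # @ai_guard.verdict is a placeholder facet for illustration; consult the
    # AI Guard documentation for the actual span attributes.
    body = Monitor(
        name="AI Guard: blocked attacks detected",
        type=MonitorType("trace-analytics alert"),
        query='trace-analytics("@ai_guard.verdict:blocked")'
              '.rollup("count").last("5m") > 0',
        message="AI Guard blocked one or more unsafe actions. @security-team",
    )

    with ApiClient(Configuration()) as api_client:
        created = MonitorsApi(api_client).create_monitor(body=body)
        print(created.id)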

Enable AI Guard without adding new infrastructure

AI Guard uses Datadog’s existing instrumentation foundation, so you can activate protections without deploying new gateways or introducing new architectural components. In-app instrumentation is available through the Datadog Agent and the Datadog Tracer (dd-trace).

If your services already use the Agent and dd-trace for Datadog Application Performance Monitoring (APM), you can enable AI Guard by updating your tracer configuration. Evaluator results generate AI Guard traces integrated with APM traces, giving you visibility into prompts, responses, and tool call decisions alongside the telemetry data that you already use for troubleshooting. When LLM Observability is enabled, these AI Guard traces are automatically linked to the corresponding LLM Observability trace, letting you jump directly to it for deeper investigation. AI Guard supports Python, Ruby, JavaScript, and Java in addition to offering a flexible integration for custom setups.
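
The exact tracer options are covered in the AI Guard documentation. As a minimal sketch of the pattern in Python, with a hypothetical environment variable standing in for the real configuration flag:

    import os

    # DD_AI_GUARD_ENABLED is a placeholder for illustration; check the AI Guard
    # documentation for the actual tracer configuration option.
    os.environ.setdefault("DD_AI_GUARD_ENABLED", "true")

    import ddtrace.auto  # noqa: F401  (enables dd-trace auto-instrumentation)

    # Your agentic application runs as usual; evaluator results appear as
    # AI Guard traces alongside the APM telemetry you already collect.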

Start improving your visibility and control of agentic AI

AI Guard provides a real-time security layer for AI-driven applications and agents. By evaluating prompts, responses, and tool calls in context, it helps teams reduce the risk of prompt injection, data leakage, and harmful tool usage while maintaining flexibility for developers to innovate with AI capabilities.

To learn more about AI Guard and the additional protection capabilities that we’re building for it, join the Preview. For an overview and setup instructions, check out the AI Guard documentation.

If you’re new to Datadog, you can sign up for a free trial to get started.
