Protect agentic AI applications with Datadog AI Guard

Océane Bordeau, Santiago Mola, Vijay George (Product Manager), and Alexa Levine (Senior Product Marketing Manager)

Organizations are increasingly using agentic AI applications powered by LLMs to automate analysis, decision-making, and operational workflows. As these AI agents take on more responsibility, they gain access to internal tools and services and can interact with them in unintended ways. The ability of agents to read files, invoke API calls through the Model Context Protocol (MCP), and modify infrastructure introduces new opportunities for misuse, data exposure, and prompt-driven manipulation that traditional security tools weren’t designed to detect.

Datadog AI Guard, now in Limited Availability, adds a real-time security layer for agentic AI applications. AI Guard evaluates prompts, responses, and tool calls with contextual awareness and an LLM-as-a-judge model to determine whether an action aligns with organizational intent and policy. When a request appears harmful or misaligned, AI Guard can block it before it can reach critical systems or sensitive data.

In this post, we’ll explore how AI Guard can help you:

Detect unprotected agents to identify risk exposure
Protect AI agents from runtime threats
Monitor tool calls and agentic workflows with context-aware detection
Automate threat response and understand your AI security posture

We’ll also explain how you can enable AI Guard without adding new infrastructure.

Detect unprotected agents to identify risk exposure

To improve security across their environment, teams need visibility into every exposed agent and the underlying services it depends on, including any unsanctioned tools and endpoints they may not know are in use. AI Guard maps unprotected agents with full lineage, so teams can identify the model endpoints, data sources, and infrastructure dependencies each agent accesses. This context also helps teams understand ownership and assess their environment's attack surface.
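To make lineage-based discovery concrete, the following minimal Python sketch assumes a hypothetical in-memory inventory in which each agent records whether it is protected, which model endpoints it calls, and which data sources it reads. The `Agent` class, the `SANCTIONED_ENDPOINTS` allowlist, the endpoint identifiers, and `exposure_report` are all illustrative and are not part of any Datadog API.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical allowlist of model endpoints the organization has approved.
SANCTIONED_ENDPOINTS = {"openai:gpt-4o", "bedrock:claude"}


@dataclass
class Agent:
    """One node in a hypothetical agent inventory, with its dependencies."""
    name: str
    protected: bool
    model_endpoints: set[str] = field(default_factory=set)
    data_sources: set[str] = field(default_factory=set)


def exposure_report(agents: list[Agent]) -> dict[str, dict[str, list[str]]]:
    """Summarize each unprotected agent's unsanctioned endpoints and data sources."""
    report: dict[str, dict[str, list[str]]] = {}
    for agent in agents:
        if agent.protected:
            continue  # already behind runtime protection
        report[agent.name] = {
            "unsanctioned_endpoints": sorted(agent.model_endpoints - SANCTIONED_ENDPOINTS),
            "data_sources": sorted(agent.data_sources),
        }
    return report


agents = [
    Agent("support-bot", protected=True, model_endpoints={"openai:gpt-4o"}),
    Agent(
        "report-agent",
        protected=False,
        model_endpoints={"deepseek:chat", "openai:gpt-4o"},
        data_sources={"s3://finance-exports"},
    ),
]
print(exposure_report(agents))
# {'report-agent': {'unsanctioned_endpoints': ['deepseek:chat'], 'data_sources': ['s3://finance-exports']}}
```

In practice, such an inventory would be derived from instrumentation and cloud metadata rather than a hand-written list; the sketch only shows how joining agents to their dependencies reveals where risk concentrates.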

The following screenshot shows several unprotected agents, including one that connects to an unsanctioned DeepSeek endpoint. This agent accesses sensitive data stored in Amazon S3 buckets and interacts with other resources that contain identity misconfigurations and API vulnerabilities. By surfacing these relationships, AI Guard helps teams investigate how exposed agents interact with underlying infrastructure and where risk is concentrated.

AI Guard discovery showing unprotected agents, including one connected to an unsanctioned DeepSeek endpoint

After identifying unprotected agents, you can deploy AI Guard inline to block attacks in real time. This approach helps you apply controls for your agents both at runtime and at every layer in the underlying infrastructure.

Protect AI agents from runtime threats

AI agents need consistent policies because threats such as prompt injection and sensitive data leakage often emerge from subtle manipulations of inputs or outputs. AI Guard helps teams address these risks, along with others in the OWASP Top 10 for LLM Applications, by evaluating interactions at runtime and blocking actions that appear unsafe or out of scope.

AI Guard currently includes two core protection capabilities, Prompt Protection and Tool Protection, and we are developing additional runtime protection capabilities that will expand AI Guard’s coverage in future releases. Prompt Protection uses an evaluator model to judge whether a prompt or response should proceed based on context, policy, and the agent’s intended goal. It applies multilayered runtime protection, including prompt injection filtering, input and output checks, and multilingual analysis. These evaluations use full context from historical messages, system prompts, and tool calls to help teams detect direct and indirect attacks.
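The following Python sketch illustrates the general LLM-as-a-judge pattern described above: bundle the agent’s goal, its system prompts, prior turns (including tool output), and the candidate message into one request for an evaluator model, then parse the verdict. The prompt wording, JSON schema, and function names are assumptions made for illustration; they are not AI Guard’s actual evaluator prompt or API, and the call to the evaluator model itself is omitted.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class Message:
    role: str  # "system", "user", "assistant", or "tool"
    content: str


# Hypothetical instructions for the evaluator model.
JUDGE_INSTRUCTIONS = (
    "You are a security evaluator. Given the agent's goal, its system prompts, and "
    "the prior conversation, decide whether the candidate message should proceed. "
    'Reply with JSON: {"action": "ALLOW" | "DENY", "tags": [...], "reason": "..."}'
)


def build_judge_prompt(history: list[Message], candidate: Message, agent_goal: str) -> str:
    """Bundle the full context (system prompts, prior turns, tool output) for the judge."""
    payload = {
        "agent_goal": agent_goal,
        "system_prompts": [m.content for m in history if m.role == "system"],
        "history": [{"role": m.role, "content": m.content} for m in history if m.role != "system"],
        "candidate": {"role": candidate.role, "content": candidate.content},
    }
    return JUDGE_INSTRUCTIONS + "\n\n" + json.dumps(payload, indent=2)


def parse_verdict(raw: str) -> tuple[str, list[str]]:
    """Parse the judge's reply, failing closed (DENY) on anything malformed."""
    try:
        data = json.loads(raw)
        action, tags = data["action"], list(data.get("tags", []))
    except (json.JSONDecodeError, KeyError, TypeError, AttributeError):
        return "DENY", ["unparseable-verdict"]
    if action not in ("ALLOW", "DENY"):
        return "DENY", ["unparseable-verdict"]
    return action, tags
```

Failing closed on a malformed verdict is a design choice of this sketch; a team still in a monitor-only rollout might prefer to log the failure and allow the message instead.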

Because the evaluator model reviews every interaction, AI Guard helps teams reduce accidental leakage of internal information, block harmful requests, and maintain consistent boundaries without placing a heavy burden on developers. Each decision includes reasons, expressed as tags such as data-exfiltration, jailbreak, and indirect-prompt-injection, that explain why AI Guard allowed or blocked an action.

The following screenshot shows a user prompt that AI Guard flagged as unsafe because it detected a data exfiltration attempt.

A user prompt that is flagged as unsafe for attempting to obtain payment information about another user.

Monitor tool calls and agentic workflows with context-aware detection

For tool-using AI agents, security risks extend beyond the model’s reasoning process and into the actions that the agents take. An attacker can manipulate these actions across several steps, moving from routine requests into probing behavior that eventually leads to escalation and exfiltration.

With Tool Protection, AI Guard analyzes every tool call and evaluates its intent, its arguments, its output, and its relationship to earlier steps in the conversation. The evaluation considers the full chain of activity, from the system prompt to user messages to previous actions, and examines whether a tool invocation aligns with the agent’s assigned goal. The following screenshot continues the earlier data exfiltration example; here, AI Guard deems the resulting tool call unsafe.

AI Guard tool call evaluation showing flagged unsafe behavior.

As another example, if an AI agent attempts to repurpose a benign file helper tool into a command that deletes core directories, AI Guard flags and blocks the request. The system can also detect data exfiltration attempts that pass data through the parameters of curl or other network-capable tools. This context-first approach helps teams catch multistep attacks, such as those described in the lethal trifecta, in which subtle prompt manipulation cascades into harmful actions.
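AI Guard makes these judgments with its evaluator model and the surrounding context. The Python sketch below swaps in a far cruder technique, static pattern rules over a single command, purely to show the kinds of argument-level signals that make a tool call suspicious. The `file_helper` tool name, the protected paths, the flag lists, and the tag names are hypothetical.

```python
import shlex

# Hypothetical directories that no agent tool should ever delete.
PROTECTED_DIRS = {"/", "/etc", "/usr", "/var", "/home"}
# curl/wget flags that send local data to a remote host.
UPLOAD_FLAGS = {"-d", "--data", "--data-binary", "--data-raw", "-F", "--form",
                "-T", "--upload-file", "--post-data", "--post-file"}
# Commands the hypothetical read-only file_helper tool is meant to run.
FILE_HELPER_ALLOWED = {"cat", "ls", "head"}


def _normalize(path: str) -> str:
    """Strip trailing slashes so "/etc/" and "/etc" compare equal."""
    return path.rstrip("/") or "/"


def flag_tool_call(tool: str, command: str) -> list[str]:
    """Return illustrative risk tags for a shell-style tool call."""
    try:
        argv = shlex.split(command)
    except ValueError:
        return ["unparseable-arguments"]  # e.g. an unterminated quote
    if not argv:
        return []
    program, args = argv[0], argv[1:]
    operands = [a for a in args if not a.startswith("-")]
    tags = []
    if program == "rm" and any(_normalize(a) in PROTECTED_DIRS for a in operands):
        tags.append("destructive-action")
    if program in {"curl", "wget"} and any(a in UPLOAD_FLAGS or a.startswith("-d@") for a in args):
        tags.append("possible-exfiltration")
    if tool == "file_helper" and program not in FILE_HELPER_ALLOWED:
        tags.append("tool-misuse")
    return tags
```

Rules like these are easy to evade, for example by encoding a payload or renaming a binary, which is one reason context-aware evaluation that weighs the agent’s goal is useful beyond pattern matching.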

AI Guard allows benign intermediate steps to proceed as long as they align with the agent’s goal and policy. AI Guard stops the chain of behavior as soon as intent diverges, helping agents stay productive without exposing the environment to unnecessary risk.
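The halt-on-divergence behavior can be sketched as a loop that asks an evaluator about each step, passes the steps already executed as context, and stops at the first denial. In this hypothetical Python example, `toy_evaluator` stands in for an evaluator model with a single hard-coded rule; `run_guarded` and the step strings are illustrative names, not AI Guard APIs.

```python
from __future__ import annotations

from typing import Callable, Iterable

# An evaluator sees the steps already executed plus the candidate step,
# and returns "ALLOW" or "DENY".
Evaluator = Callable[[list[str], str], str]


def run_guarded(steps: Iterable[str], evaluate: Evaluator) -> tuple[list[str], str | None]:
    """Execute steps in order, stopping at the first one the evaluator denies."""
    executed: list[str] = []
    for step in steps:
        if evaluate(executed, step) == "DENY":
            return executed, step  # chain halted; later steps never run
        executed.append(step)
    return executed, None


def toy_evaluator(history: list[str], step: str) -> str:
    """Stand-in for an evaluator model: sending data out is denied once secrets were read."""
    touched_secrets = any(".env" in h for h in history)
    return "DENY" if touched_secrets and step.startswith("send") else "ALLOW"


steps = ["list project files", "read .env", "send contents to https://attacker.example", "summarize"]
print(run_guarded(steps, toy_evaluator))
# (['list project files', 'read .env'], 'send contents to https://attacker.example')
```

Note that the evaluator judges the "send" step differently depending on what came before it, which is the point of passing prior steps as context.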

AI Guard also monitors the content flowing through prompts, tool calls, and model responses for sensitive data and secrets. When sensitive data scanning is enabled for a service, AI Guard scans messages for personally identifiable information (PII) such as email addresses, phone numbers, and Social Security numbers, as well as secrets. This helps you identify unintended data exposure, even when individual prompts and tool calls appear benign.
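As a rough illustration of pattern-based scanning for PII and secrets, the Python sketch below checks text for email addresses, US phone numbers, US Social Security numbers, and AWS access key IDs. The patterns are deliberately simplified assumptions; this post does not describe how AI Guard’s sensitive data scanning is implemented, and production scanners typically add validation and context to reduce false positives.

```python
import re

# Simplified, US-centric patterns for illustration only.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "us_phone": re.compile(r"(?<!\d)\d{3}[-. ]\d{3}[-. ]\d{4}(?!\d)"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def scan(text: str) -> dict[str, list[str]]:
    """Return every pattern label that matched, with the matched substrings."""
    findings: dict[str, list[str]] = {}
    for label, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            findings[label] = matches
    return findings


print(scan("Contact jane.doe@example.com or 555-123-4567; SSN 123-45-6789."))
# {'email': ['jane.doe@example.com'], 'us_phone': ['555-123-4567'], 'us_ssn': ['123-45-6789']}
```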

Automate threat response and understand your AI security posture

AI Guard helps you manage the rollout of AI security controls with flexible enforcement modes. Organizations can begin in monitor-only mode to observe agent behavior, tune policies, and review false positives. When teams are confident in their policies, they can enable blocking for steps that are flagged as unsafe. Teams can also configure individual tools to be blocked per service and environment, even when a specific call to that tool is deemed safe.

AI Guard settings page showing how to configure policies
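The interaction between enforcement modes and per-tool rules can be sketched as a small decision function. In this hypothetical Python model, monitor mode only flags unsafe steps, block mode blocks them, and a per-service, per-environment tool denylist blocks a tool regardless of the verdict. Whether such tool rules also apply in monitor-only mode is an assumption of this sketch, as are the `Policy` and `Mode` names.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    MONITOR = "monitor"  # record verdicts but never block
    BLOCK = "block"      # block steps the evaluator flags as unsafe


@dataclass
class Policy:
    mode: Mode = Mode.MONITOR
    # Hypothetical denylist keyed by (service, env). ASSUMPTION: it blocks in both modes.
    blocked_tools: dict[tuple[str, str], set[str]] = field(default_factory=dict)

    def decide(self, service: str, env: str, tool: str | None, verdict: str) -> str:
        """Return "allowed", "flagged", or "blocked" for one evaluated step."""
        if tool is not None and tool in self.blocked_tools.get((service, env), set()):
            return "blocked"  # explicit tool rule wins even when the call looks safe
        if verdict == "unsafe":
            return "blocked" if self.mode is Mode.BLOCK else "flagged"
        return "allowed"


policy = Policy(blocked_tools={("expense-processor", "prod"): {"shell"}})
print(policy.decide("expense-processor", "prod", "shell", "safe"))     # blocked
print(policy.decide("expense-processor", "prod", "search", "unsafe"))  # flagged
policy.mode = Mode.BLOCK
print(policy.decide("expense-processor", "prod", "search", "unsafe"))  # blocked
```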

AI Guard generates security signals that surface detected threats directly in the AI Guard Signals Explorer. Each signal includes the attack context, details about blocked or allowed actions, detected attack categories, and links to related AI Guard spans for deeper investigation. You can also configure notifications to be sent when signals trigger.

Datadog provides out-of-the-box detection rules for common AI threat patterns, so you can start receiving signals as soon as AI Guard is enabled. For example, the Data exfiltration successful rule generates a high- or critical-severity signal when an attacker successfully manipulates an agent to leak sensitive data, such as PII or credentials. You can also create custom detection rules by combining AI Guard attributes to target the threat patterns most relevant to your environment, then define thresholds, severity levels, and response actions for each one.
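Conceptually, a threshold-based custom rule filters evaluations on attributes, counts the matches per grouping key, and maps counts to severity levels. The Python sketch below shows that logic under stated assumptions: the `Evaluation` shape, the attribute names, and the thresholds are hypothetical, and this is not how Datadog detection rules are actually defined or evaluated.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Evaluation:
    """One evaluation event (hypothetical shape, not the AI Guard span schema)."""
    service: str
    client_ip: str
    verdict: str  # "safe", "unsafe", or "blocked"
    tags: tuple[str, ...] = ()


def run_rule(events: list[Evaluation], tag: str = "indirect-prompt-injection",
             medium_at: int = 3, high_at: int = 10) -> list[dict]:
    """Count flagged evaluations per (service, IP) and map counts to a severity."""
    counts = Counter(
        (e.service, e.client_ip)
        for e in events
        if e.verdict in ("unsafe", "blocked") and tag in e.tags
    )
    signals = []
    for (service, ip), n in sorted(counts.items()):
        if n < medium_at:
            continue  # below threshold: no signal
        severity = "high" if n >= high_at else "medium"
        signals.append({"service": service, "client_ip": ip, "count": n, "severity": severity})
    return signals
```

The sketch assumes events were already restricted to a single time window; a real rule would also need to define that window.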

The following screenshot shows a high-severity signal generated after AI Guard detected indirect prompt injection attempts targeting the expense-processor service in production. The signal surfaces the attacker's IP address, the targeted user, and a timeline of safe, unsafe, and blocked evaluations surrounding the event. The related spans show the full sequence of LLM and tool calls that AI Guard analyzed, including detected attack categories and sensitive data identified during the interaction.

AI Guard signal showing an indirect prompt injection attempt on a service handling PII

AI Guard also includes a dedicated interface for analyzing AI activity across services. The Explorer view classifies each interaction as safe, unsafe, or blocked and lets teams filter by service, environment, threat type, and assessment verdict. This functionality helps you identify attack patterns, review blocked events, and validate policy effectiveness. Trend widgets provide visibility into assessment outcomes and evaluation latency over time, and the trace side panel includes detailed explanations for each decision.

AI Guard Signals Explorer showing all recent user inputs
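To show the kind of aggregation behind verdict and latency trend views, here is a hypothetical Python sketch that filters evaluation records by arbitrary attributes (such as service or environment), counts verdicts per service, and computes a nearest-rank p95 evaluation latency. The record fields are assumptions, not the schema of AI Guard spans.

```python
import math
from collections import Counter, defaultdict


def summarize(evaluations: list, **filters: object) -> dict:
    """Count verdicts and compute nearest-rank p95 latency per service."""
    verdicts = defaultdict(Counter)
    latencies = defaultdict(list)
    for e in evaluations:
        # Keep only records whose attributes match every filter (e.g. env="prod").
        if any(e.get(key) != value for key, value in filters.items()):
            continue
        verdicts[e["service"]][e["verdict"]] += 1
        latencies[e["service"]].append(e["latency_ms"])
    summary = {}
    for service, counts in verdicts.items():
        ordered = sorted(latencies[service])
        rank = math.ceil(0.95 * len(ordered))  # nearest-rank percentile
        summary[service] = {"verdicts": dict(counts), "p95_latency_ms": ordered[rank - 1]}
    return summary
```

For example, `summarize(records, env="prod", verdict="blocked")` would narrow the summary to blocked evaluations in production, mirroring the filters the Explorer view offers.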

Additionally, the AI Guard Playground lets you test how your policies respond to different user inputs. The output shows AI Guard’s decision and lists the detected violation categories, such as jailbreak attempts, instruction override, and data exfiltration, along with their confidence probabilities.

AI Guard Playground showing the evaluation of a test prompt
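Category probabilities like those in the Playground output can be turned into a decision in many ways. One simple possibility, sketched below in Python, is a single fixed threshold; the threshold value, the category names, and the `decide` function are assumptions for illustration and do not describe how AI Guard actually reaches its decision.

```python
def decide(category_probs: dict[str, float], threshold: float = 0.5) -> tuple[str, list[tuple[str, float]]]:
    """Deny if any category meets the threshold; list violations, most likely first."""
    violations = sorted(
        ((category, p) for category, p in category_probs.items() if p >= threshold),
        key=lambda item: item[1],
        reverse=True,
    )
    return ("DENY" if violations else "ALLOW"), violations


print(decide({"jailbreak": 0.91, "instruction-override": 0.62, "data-exfiltration": 0.08}))
# ('DENY', [('jailbreak', 0.91), ('instruction-override', 0.62)])
```

Testing inputs against a threshold like this is one way to see how lowering or raising it trades missed attacks against false positives.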

AI Guard also lets you create monitors that analyze AI Guard spans so that you can receive notifications about blocked attacks and unsafe prompts.

Enable AI Guard without adding new infrastructure

AI Guard uses Datadog’s existing instrumentation foundation, so you can activate protections without deploying new gateways or introducing new architectural components. In-app instrumentation is available through the Datadog Agent and the Datadog Tracer (dd-trace).

If your services already use the Agent and dd-trace for Datadog Application Performance Monitoring (APM), you can enable AI Guard by updating your tracer configuration. Evaluator results generate AI Guard traces integrated with APM traces, giving you visibility into prompts, responses, and tool call decisions alongside the telemetry data that you already use for troubleshooting. When LLM Observability is enabled, these AI Guard traces are automatically linked to the corresponding LLM Observability trace, letting you jump directly to it for deeper investigation. AI Guard supports Python, Ruby, JavaScript, and Java in addition to offering a flexible integration for custom setups.

Start improving your visibility and control of agentic AI

AI Guard provides a real-time security layer for AI-driven applications and agents. By evaluating prompts, responses, and tool calls in context, it helps teams reduce the risk of prompt injection, data leakage, and harmful tool usage while maintaining flexibility for developers to innovate with AI capabilities.

Sign up to learn more about AI Guard and the additional protection capabilities that we’re building for it. For setup instructions, read the AI Guard documentation.

If you’re new to Datadog, you can sign up for a 14-day free trial to get started.
