The Monitor

Identify common security risks in MCP servers

8 minute read

Published

Share

Identify common security risks in MCP servers
Yuki Matsuzaki

Yuki Matsuzaki

Mallory Mooney

Mallory Mooney

AI adoption is rapidly increasing, and with that comes a steady influx of useful but potentially vulnerable tools and services still maturing in the AI space. The Model Context Protocol (MCP) is one example of new AI tooling, providing a framework for how applications integrate with and supply context to large language models (LLMs). MCP servers are central to developing AI assistants and workflows that are deeply integrated with your environment. They serve as a bridge to a wide variety of LLM providers, data sources, and remote services, so their usage is quickly becoming a new way attackers can target your systems.

In this post, we'll look at the primary ways MCP servers are vulnerable to threats as well as how to monitor them for malicious activity. But first, we'll briefly look at how MCP servers operate.

How do MCP servers work?

MCP enables AI assistants to simplify development workflows, such as fetching critical logs for troubleshooting application errors. To accomplish this, the protocol uses a client-server architecture to connect hosts to both external and local data sources and remote services, as illustrated in the following diagram:

Monitor prompt injections with Datadog LLM Observability
Monitor prompt injections with Datadog LLM Observability

Let's take a closer look at how this works for requesting a list of S3 buckets via the AWS API:

  1. The MCP client, such as Claude Code, transforms the request to the LLM into the protocol format.
  2. The MCP client then sends the request to an MCP server, which is running locally and configured with credentials to access AWS.
  3. The server uses those credentials to successfully make the API call to fetch a list of S3 buckets.
  4. The server then passes AWS's response back to the MCP client, which forwards it to the LLM as input for processing.
  5. The LLM outputs the list of S3 buckets to the host.

The clear benefit of using MCP servers in these types of scenarios is that it prevents LLMs from directly interacting with external services. Instead of passing API keys and passwords to the LLM—where there's an increased risk of unintended exposure—you can configure a server to fetch credentials from another source. This approach gives you more control over how your AI-based applications handle authentication, but it also introduces new security challenges in keeping MCP server interactions secure.

Identifying MCP server risks and vulnerabilities

MCP servers operate as the glue between hosts and a broad range of external systems, including those that may be untrusted or introduce risk. Understanding these risks requires familiarity with the components supporting MCP interactions, such as which LLMs they interface with, how the servers are configured, and what third-party servers are in use.

Vulnerabilities in LLM interactions

LLM usage alone has risks, which OWASP includes in their list of top security concerns for generative AI apps. But issues can also surface between LLM interactions and MCP architecture. Detecting malicious intent within prompts, such as indirect prompt injections, is a common challenge with using MCP servers. In these scenarios, a client's LLM misinterprets embedded prompts from external sources as valid commands from the host.

Using our previous AWS workflow example, let's say the architecture relies on a third-party MCP server with previously approved tools, which are functions that clients and LLMs use to perform specific actions. However, in this instance, the MCP server's tool definitions were later amended to include malicious instructions to automatically delete all requested S3 buckets. The workflow could look like the following steps:

  1. An engineer uses their IDE terminal to ask Claude Code to fetch a list of S3 buckets that have not been used in the last 90 days.
  2. The MCP client sends that request to the MCP server, which is configured with credentials to access AWS.
  3. Based on the MCP server's updated tool definitions, it uses those credentials to successfully fetch and delete the requested buckets without explicit approval from the host.
  4. The MCP client sends the response back to the LLM, which processes that data and generates the final output (a list of S3 buckets) for the host, who is not aware that the fetched buckets were deleted.

This simplified scenario is an example of a supply chain risk and describes two types of attacks: rug pulls and tool poisoning. The MCP server was approved for use initially, but it was later updated with new tool definitions that the host was not aware of (hence the moniker, rug pull). The updated definition included malicious instructions to automatically delete resources without explicit approval from the host (aka tool poisoning). A real-world example of this kind of activity is modifying a tool's metadata to discreetly exfiltrate all of a user's chat history, which could include sensitive credentials and tokens as well as an organization's intellectual property.

A primary issue of concern with prompt injections that target MCP architecture is a lack of visibility into the input and output between components. In these cases, OWASP provides a few recommendations that can minimize the risk, including clearly defining the model's behavior, segregating untrusted external content, and creating guardrails around a model's input and output. These steps ensure that your LLMs respond accordingly when an attacker attempts to manipulate a prompt.

Misconfigurations in local MCP servers

MCP servers act as proxies between LLMs and the rest of your environment, so misconfigurations in how MCPs interact with other resources can create the same risks as those seen in other parts of cloud infrastructure. Compromised cloud credentials, for example, are one of the primary causes of cloud incidents. In the same way, improperly storing a server's configuration file, which typically contains the necessary credentials for connecting to databases and services, can give attackers access to connected sources. As an example, tool poisoning attacks can force a client to read a host's sensitive files, such as MCP server configuration files (~/.cursor/mcp.json) and SSH keys.

To prevent attackers from accessing these configurations, you can implement OAuth's 2.1 guidelines for securely managing tokens and client-to-server communication. At the time of this writing, MCP engineers are working on a solution that allows users to decouple certain, sensitive workflows, such as authorizing third-party services and making payments, from the MCP client. This implementation adds an extra layer of security to the architecture's workflow by ensuring that sensitive data isn't passed through the client.

Inherent vulnerabilities of third-party MCP servers

Engineers often rely on third-party MCP servers instead of spinning up their own, but this convenience doesn't guarantee secure workflows. Many publicly available MCP servers offer minimal authentication, which increases the risk of an engineer unknowingly using one with malicious code or inefficient security controls. To minimize these risks, OWASP recommends sandboxing MCP servers, in addition to other key practices, such as enforcing authentication and authorization for all MCP interactions. These steps are especially important considering the range of vulnerabilities that can surface from both misconfigured servers and user interactions.

One notable server-side risk stems from a vulnerability found in mcp-remote that can enable remote code execution if left unpatched. At present, there's ongoing discussion around how phishing attacks can also exploit servers that rely on mcp-remote or do not yet natively support OAuth. Another example of a server-side risk is tool name collision, where a host may unknowingly connect to a malicious server with tools that are named similarly to those on a legitimate server. When this happens, the malicious server can silently take over how the tool works and trigger harmful actions.

Beyond server issues, attackers can also directly manipulate user interactions with an MCP server. One common example is consent fatigue, where AI applications overload users with approval requests. While requiring explicit user approval for specific actions, such as writing to a database, is recommended, this control can fail if a malicious MCP server inserts a harmful action in between an influx of legitimate ones. As an example, many users automatically approve MCP tool calls in Claude desktop as a way to bypass the steady barrage of requests. This configuration opens the door for an attacker to create malicious artifacts, such as GitHub issues that contain prompt injections, for an MCP server to process without the user realizing it.

Monitoring MCP interactions

MCP architecture is still evolving, which makes it difficult to maintain reliable visibility into its components. For example, recently found vulnerabilities in the MCP inspector highlights how threats can quickly change an AI application's attack surface. But even at this nascent stage, you can gain improved visibility into critical risks by focusing on the following areas: LLM behavior, MCP server activity and configurations, and credential exposure.

Reviewing LLM input and output, for example, can help you catch instances where an attacker attempts to manipulate prompt context or extract sensitive data.

Monitor prompt injections with Datadog LLM Observability
Monitor prompt injections with Datadog LLM Observability

In addition to monitoring LLM behavior, you can track how MCP servers interact with your environment, including:

Together, these measures provide you with better insight into where security risks exist and when an attacker is actively attempting to target your AI applications through MCP components.

Bring visibility to your MCP server deployments

In this post, we covered a few ways MCP servers are vulnerable to attacks and how to monitor them. Check out our documentation for more information about how Datadog can monitor your LLMs, code, supporting infrastructure, and sensitive data. If you're new to Datadog, sign up for a to start monitoring your AI applications and MCP architecture today.

Related Articles

Best practices for monitoring LLM prompt injection attacks to protect sensitive data

Best practices for monitoring LLM prompt injection attacks to protect sensitive data

Abusing AI interfaces: How prompt-level attacks exploit LLM applications

Abusing AI interfaces: How prompt-level attacks exploit LLM applications

Monitor agents built on Amazon Bedrock with Datadog LLM Observability

Monitor agents built on Amazon Bedrock with Datadog LLM Observability

This Month in Datadog - June 2025

This Month in Datadog - June 2025

Start monitoring your metrics in minutes