Case study

Patronus AI scales cost-effectively with unified observability through the Datadog for Startups program

Software Development

30 Employees

San Francisco

About Patronus AI

Patronus AI built a SaaS platform that automates the evaluation, monitoring, and optimization of large language model-based applications in production.

“Datadog is incredibly simple to integrate into your application thanks to its auto-instrumentation features, and it gives you best-in-class observability and monitoring. In a small team, time is the most precious resource, and Datadog helps you save it.”

Varun Joshi
Head of Engineering
Patronus AI

Why Datadog?
  • Provided credits and a defined trial window to test all core products without financial risk
  • Enabled observability without the maintenance overhead and complexity of open source tools
  • Provided single solution for metrics, traces, and logs
Challenge

Patronus AI struggled with fragmented observability tools, manual instrumentation overhead, and excessive error noise. As a startup, the company needed a cost-effective way to manage these operational challenges.

Key Results
Unified observability

Replaced fragmented tools with a single platform for metrics, traces, and logs

Faster debugging

Engineers can pivot directly from high-level traces to exact log lines that caused issues

Cost-effective validation

Startup program provided risk-free opportunity to test platform

Foundation for scale

Established robust observability infrastructure to support future growth

Fragmented tools and manual overhead create operational bottlenecks

Patronus AI provides an automated platform for evaluating, monitoring, and optimizing large language model systems and AI agents to ensure they are safe, reliable, and perform effectively in real-world enterprise applications.

As the company scaled its AI platform, it faced several operational challenges that slowed down development and made it difficult to maintain system reliability. The biggest issue was fragmented observability across multiple open source tools. Patronus used one tool for error tracking, another for metrics, and had logs scattered across different systems. This forced engineers to constantly switch between platforms and made it difficult to spot correlations between different system events.

The company also struggled with manual instrumentation overhead. Its tracing tool required engineers to manually instrument the codebase, which slowed feature delivery and left gaps in visibility and incomplete traces.

Another challenge was low adoption of the company's internal metrics system. According to Varun Joshi, Head of Engineering, the team rarely used the metrics environment, meaning the company wasn't getting value from its observability investments.

Perhaps most frustrating was the error noise and lack of trace-to-log correlation. The company's previous tool generated too many raw error events without linking them to distributed traces or logs, making it difficult to troubleshoot issues and pinpoint their source. "It was hard to filter for the information we wanted, and while the errors were useful, there were just too many of them," explains Joshi. "Our principal engineer and I were hitting the limit of what we could do with that tool and the kind of information we could get."

Patronus AI needed a single, unified observability platform to replace its patchwork of tools and ad-hoc logging. However, as a startup with budget constraints, it needed to validate the return on investment before committing to a full observability suite.


Unified platform and dedicated support provide advantages over open source tools

To solve these challenges, Patronus AI enrolled in the Datadog for Startups program. The decision was made easier because several of its senior engineers had positive experiences with Datadog at previous companies. "Others had heard that Datadog was easy to set up and genuinely useful, which made getting buy-in much easier compared to other observability tools that tend to get more mixed reviews or are harder to maintain," says Joshi.

The Datadog for Startups program provided credits and a defined trial window that allowed Patronus to stress-test all core products without financial risk, then scale usage confidently. "As a startup, you don't know how your product's going to change, and what you need from an observability tool," explains Joshi. "The startup program is a big part of why we said, 'let's try out Datadog and see if the value it provides us matches what we would pay for it,' because that's a really hard thing to understand."

“As a startup, you don't know how your product's going to change, and what you need from an observability tool... Datadog for Startups gave us the breathing room to evaluate Datadog products without getting bogged down by cost or maintenance concerns.”

Patronus AI replaced fragmented observability tools with Datadog's unified Infrastructure Monitoring, APM, and Log Management, which brought metrics, traces, and logs into a single view, eliminating the need for engineers to jump between different tools and providing complete system visibility. This, along with dedicated support, provided Patronus AI with significant advantages over open source solutions.

To address its manual instrumentation challenges, Patronus AI adopted Datadog APM, whose auto-instrumentation automatically captures API calls, database queries, and service interactions with zero code changes. This freed up the engineering team to focus on building product features.
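As an illustration of what zero-code instrumentation can look like in practice (a generic sketch, not Patronus AI's actual setup), Datadog's Python tracer can wrap an existing service at launch time; the service name, environment, and script below are hypothetical:

```shell
# Install the Datadog APM tracer for a Python service
pip install ddtrace

# ddtrace-run launches the app with auto-instrumentation enabled:
# supported libraries (web frameworks, database clients, HTTP
# clients) emit traces without any application code changes
DD_SERVICE=my-service DD_ENV=prod ddtrace-run python app.py
```

Equivalent launchers exist for other runtimes (for example, the Node.js `dd-trace` library or the Java agent JAR), so teams can pick whichever matches their stack.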

Datadog also helped Patronus AI engineers address their challenges around error noise and lack of trace-to-log correlation. Datadog's Log Management combined with Trace Search and Analytics allows engineers to pivot from a high-level trace directly into the exact log lines that triggered an issue, making debugging faster and more efficient.
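Trace-to-log correlation of this kind relies on stamping log records with trace identifiers. As a hedged sketch (service name and script are illustrative, not from the case study), Datadog's Python tracer supports this via an environment flag:

```shell
# DD_LOGS_INJECTION=true makes the tracer add the active trace_id
# and span_id to each log record, so Log Management can link a log
# line back to the distributed trace that produced it
DD_LOGS_INJECTION=true DD_SERVICE=my-service ddtrace-run python app.py
```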

Startup program enables unified observability without budget constraints

The Datadog for Startups program helped Patronus AI's 12-person engineering team quickly prove the platform's value and dramatically improved their observability capabilities. One of the biggest wins was improved adoption across the team. Datadog's intuitive interface and Slack-integrated alerts mean every new engineer can start using the platform from day one. "Even first-week employees know how to make their way around and get value," says Joshi.

Today, monitoring and Slack-based alerts keep their AI-evaluation platform running smoothly. The unified platform has accelerated their time-to-value, reduced operational overhead, and created a foundation for future growth. "Datadog is also easy to maintain as part of your observability stack, so you can spend less time managing tooling and more time building features for your customers," adds Joshi.

“Datadog is also easy to maintain as part of your observability stack, so you can spend less time managing tooling and more time building features for your customers.”

Most importantly, the startup program gave Patronus AI the freedom to explore and validate Datadog's features without worrying about costs. "Datadog for Startups gave us the breathing room to evaluate Datadog products without getting bogged down by cost or maintenance concerns," explains Joshi. "As a startup, we have to be deliberate about where we spend time and energy, so having the flexibility to explore Datadog's features and focus purely on what was useful for our business made a big difference. Without the program, we likely would've been more cautious about trying new features, or just stuck with our existing stack trying to make it work for our needs."

The company is now planning to expand its use of Datadog with Error Tracking, On Call, and potentially other solutions. "Datadog is incredibly simple to integrate into your application thanks to its auto-instrumentation features, and it gives you best-in-class observability and monitoring," says Joshi. "In a small team, time is the most precious resource, and Datadog helps you save it."

Resources

ebook

5 Proven Ways to Reduce Container Costs: Kubernetes and ECS

guide

Bits AI Dev Agent Product Brief

guide

GPU Monitoring Product Brief