Building AI Digital Employees for enterprise IAM
Twine Security is building AI Digital Employees designed to operate as fully autonomous members of cybersecurity teams. Instead of offering recommendations alone, Twine’s agents execute real cybersecurity tasks from start to finish inside enterprise environments. “Twine builds AI Digital Employees who help cybersecurity teams close the execution gap,” says Ben Ofer, Head of Marketing at Twine Security. “Enterprises have invested heavily in security tools, yet they still struggle to operationalize them effectively. Our vision is a world of AI Digital Employees, each an expert in a specific cybersecurity domain, working together to solve complex security challenges.”
Twine’s first digital employee, Alex, focuses on IAM. Alex automates tickets, manages provisioning and deprovisioning, enforces MFA policies, conducts user access reviews, and generates compliance documentation. The system integrates with HR platforms, ticketing systems, identity providers, and cloud infrastructure, and already manages millions of identities across dozens of enterprise customers.
For Yuval Carmel, Head of Engineering at Twine Security, delivering that level of autonomy requires deep transparency into how agents reason and act. “We are not building a chatbot,” Carmel explains. “We are building an autonomous IAM team member. If that agent is making decisions about identity and access, we must understand how it reasons, how it uses tools, and how much it costs per task.”
The visibility gap in multi-agent systems
As Twine scaled its multi-agent architecture, abstraction layers began hiding critical information. The team uses structured output frameworks such as PydanticAI to enforce schemas and validation rules. While powerful, these frameworks modify prompts and automatically retry failed outputs behind the scenes. “Our biggest struggle was the black box of logic,” Carmel says. “We could see the user request and the final output, but not the exact prompt sent to the model or the retries happening internally.”
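To illustrate the behavior Carmel describes, here is a minimal sketch in plain Python of how a structured-output framework can hide work from the application layer. This is not PydanticAI's actual internals; the schema, prompts, and function names are hypothetical, and the point is only that validation failures trigger silent re-prompting the caller never sees.

```python
import json

def validate_access_decision(raw: str) -> dict:
    """Reject model output that does not match the expected schema."""
    data = json.loads(raw)
    if set(data) != {"user", "action", "approved"}:
        raise ValueError(f"schema mismatch: {sorted(data)}")
    return data

def structured_call(llm, prompt: str, max_retries: int = 3) -> dict:
    """Mimic a structured-output framework: on validation failure,
    silently rewrite the prompt and retry. Without tracing, callers
    only ever see `prompt` in and the final dict out -- not the
    retries or the modified prompts sent on each attempt."""
    attempt_prompt = prompt
    for _ in range(max_retries):
        raw = llm(attempt_prompt)
        try:
            return validate_access_decision(raw)
        except ValueError as err:  # json.JSONDecodeError subclasses ValueError
            # Hidden retry: the prompt is rewritten, tokens are spent,
            # and latency grows -- all invisible to the application.
            attempt_prompt = (
                f"{prompt}\nPrevious output was invalid ({err}). "
                "Return only valid JSON."
            )
    raise RuntimeError("structured output failed after retries")
```

A caller that invokes `structured_call` once may in fact trigger several model calls, which is exactly the token and latency multiplication Twine could not attribute.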
“Our biggest struggle was the black box of logic. We could see the user request and the final output, but not the exact prompt sent to the model or the retries happening internally.”
Yuval Carmel
Head of Engineering, Twine Security
This lack of visibility created operational risk. In some cases, the agent returned a valid structured response that passed validation but contained deficient reasoning. In others, hidden retry loops increased latency and multiplied token usage without producing obvious errors. Twine could see its model spend increasing but could not clearly attribute costs to specific agents or workflows.
Because Alex operates inside sensitive IAM environments, understanding why an action was taken is critical. Twine needed full traceability into agent decisions to reinforce its guardrails and ensure every autonomous action could be reviewed and audited. As Alex handled more identities and integrations, that visibility became essential to scaling autonomy with confidence.
Transforming agent reasoning into operational control
Datadog LLM Observability transformed how Twine operates its agents. By instrumenting its system with LLM tracing, Twine gained complete visibility into every request. Engineers can now inspect the fully rendered prompt, each retry attempt, token usage, latency, and every tool call inside a single correlated trace. “Datadog LLM Observability gives us complete visibility into our agents’ reasoning,” Carmel says. “We stopped guessing. We can see the prompt, the retries, the tool calls, and the cost of every step.”
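Conceptually, a correlated trace groups every step of a request under one trace ID. The toy class below sketches that idea in plain Python; it is an illustration of what such a trace captures, not the Datadog SDK, and all names in it are hypothetical.

```python
import time
import uuid
from contextlib import contextmanager

class Trace:
    """Toy correlated trace: every span (LLM call, retry, tool call)
    is recorded under a single trace ID, with the rendered prompt,
    token count, and duration attached to each span."""

    def __init__(self, workflow: str):
        self.trace_id = uuid.uuid4().hex
        self.workflow = workflow
        self.spans = []

    @contextmanager
    def span(self, kind: str, **meta):
        record = {"kind": kind, "meta": meta, "start": time.monotonic()}
        try:
            yield record
        finally:
            record["duration_s"] = time.monotonic() - record["start"]
            self.spans.append(record)

    def total_tokens(self) -> int:
        # Cost attribution per workflow falls out of per-span metadata.
        return sum(s["meta"].get("tokens", 0) for s in self.spans)
```

With spans like these, a spend spike can be traced back to the specific agent, prompt, or tool call that produced it rather than showing up only as an aggregate bill.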
“Datadog LLM Observability gives us complete visibility into our agents' reasoning. We stopped guessing. We can see the prompt, the retries, the tool calls, and the cost of every step.”
Yuval Carmel
Head of Engineering, Twine Security
That visibility quickly surfaced inefficiencies. When Twine noticed a spike in LLM spend, Datadog traces revealed that the Data Retrieval Agent was repeatedly calling the same tools to fetch overlapping information. The agent was executing its intended function, but it was looping inefficiently and inflating token usage. “Once we saw the redundant tool loop in the trace, the fix was straightforward,” Carmel explains.
The team tightened orchestration logic and restricted redundant calls. Over a three-month period, token usage per task decreased by 40% as the team optimized tool calls, trimmed long, token-heavy prompts, and eliminated redundant tool loops altogether. The result was not only a measurable reduction in LLM spend but also faster response times and more predictable agent behavior.
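One common way to restrict redundant calls of the kind described above is to cache tool results within a single task, so an agent that asks for the same data twice reuses the first answer. This is a generic sketch of that technique, not Twine's implementation; the class and tool names are hypothetical.

```python
import json

class DedupedToolRunner:
    """Within one task, cache tool results keyed by (tool name,
    canonicalized arguments) so repeated identical calls do not
    re-invoke the tool or spend more tokens."""

    def __init__(self, tools):
        self.tools = tools   # name -> callable
        self.cache = {}      # (name, canonical args) -> result
        self.calls = 0       # actual tool invocations

    def run(self, name: str, **kwargs):
        key = (name, json.dumps(kwargs, sort_keys=True))
        if key not in self.cache:
            self.calls += 1
            self.cache[key] = self.tools[name](**kwargs)
        return self.cache[key]
```

The gap between requests served and `calls` made is itself a useful signal: a large gap means the agent was looping over data it already had.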
Datadog also revealed prompt bloat caused by injected schema definitions that were invisible in application code. By analyzing fully rendered prompts directly in traces, Twine optimized prompt structure and reduced token consumption without sacrificing quality. The team now tracks cost per resolution instead of only total API spend, enabling more precise financial control as adoption grows.
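The shift from total API spend to cost per resolution is simple arithmetic once spend can be attributed per task. A minimal sketch, assuming each task records its token usage and whether it was resolved (the pricing figure below is illustrative, not a real rate):

```python
def cost_per_resolution(tasks, price_per_1k_tokens: float) -> float:
    """tasks: list of (tokens_used, resolved) pairs.
    Divides total spend by successful resolutions, yielding a
    unit-economics metric instead of one aggregate API bill."""
    total_cost = sum(tokens for tokens, _ in tasks) / 1000 * price_per_1k_tokens
    resolved = sum(1 for _, ok in tasks if ok)
    if resolved == 0:
        raise ValueError("no resolved tasks to attribute cost to")
    return total_cost / resolved
```

Note that failed tasks still contribute to the numerator, so the metric correctly penalizes retries and loops that burn tokens without resolving anything.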
Reducing MTTR by 80% and scaling autonomy with confidence
The operational impact has been significant. Twine reduced mean time to resolution (MTTR) by approximately 80%. Debugging reasoning issues that previously required hours of log analysis and manual reproduction now takes minutes through trace inspection.
This improvement was especially clear during a production incident known internally as the Timeout Spike. Latency began increasing on a core IAM workflow. Instead of attributing the issue to model instability, engineers opened high latency traces in Datadog and used Prompt Tracking to link the affected LLM calls to the exact prompt version, revealing that a recent prompt modification had introduced a reasoning loop. Because traces correlate directly with deployments, Twine identified the responsible pull request and shipped a fix in under 30 minutes, preventing broader customer impact.
LLM Observability has reshaped Twine’s engineering workflows from reactive debugging to trace-driven development. Every pull request that modifies agent logic must include Datadog trace links demonstrating expected reasoning behavior. Reviewers audit how the agent reasons, which tools it selects, how many steps it takes, and how many tokens it consumes, not just whether the code compiles.
Twine also builds evaluations directly from real production traces using Experiments. When an agent encounters an issue or edge case, that trace becomes a regression test and part of a growing evaluation dataset. Using Datadog traces together with Claude Code and the Datadog MCP, Twine implemented hundreds of evaluations and regression tests derived from real production behavior. With LLM Observability, these production insights feed pre-production testing and validation, helping Twine improve agent behavior and performance at every stage of the AI agent development lifecycle. This allows the team to systematically prevent repeat failures and validate improvements before release.
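The idea of turning a production trace into a regression test can be sketched in a few lines: the recorded input becomes the test case and the recorded behavior becomes the baseline. The trace shape below is hypothetical and much simpler than a real trace, and this is a generic illustration of the technique rather than Twine's evaluation harness.

```python
def trace_to_regression_check(trace: dict):
    """Freeze a captured trace into a reusable evaluation. The check
    replays the trace's input against an agent and compares the new
    run to the baseline output, step count, and token budget."""
    def check(agent) -> dict:
        result = agent(trace["input"])
        return {
            "output_matches": result["output"] == trace["output"],
            "within_step_budget": result["steps"] <= trace["steps"],
            # Allow 10% token headroom over the baseline run.
            "within_token_budget": result["tokens"] <= trace["tokens"] * 1.1,
        }
    return check
```

Each failure an agent hits in production thus adds one more check to the suite, which is how an evaluation dataset grows from real behavior rather than invented test cases.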
This trace-driven validation has also accelerated delivery. Because engineers can validate trace behavior earlier in the development lifecycle, Datadog eliminates a full deployment cycle for roughly 30% of changes, resulting in an overall deployment velocity increase of approximately 15%. Instead of waiting for post-deployment feedback to catch reasoning flaws, Twine verifies agent behavior at the pull request stage and deploys with greater confidence.
As Twine expands beyond IAM into additional cybersecurity domains, observability is becoming a strategic control layer for agent autonomy. The team is increasingly focused on metrics such as steps to resolution, tool success rates, and cost per successful task to ensure agents operate efficiently and predictably at scale. “Our agents are becoming more autonomous every quarter, and as they take on more responsibility, observability becomes non-negotiable,” says Nadav Erez, Co-Founder and CTO, Twine Security. “Datadog gives us the foundation to scale intelligent cybersecurity employees with accountability. We are not just shipping AI faster. We are shipping AI we can trust.”