Dust Powers Reliable AI Agent Creation with Datadog's Advanced Observability | Datadog

Case Study


About Dust

Founded in 2023, Dust is a platform that enables companies to create custom AI agents by combining leading AI models (GPT-4, Claude, Mistral, etc.) with their enterprise data (Slack, Notion, Google Drive, GitHub, etc.). Dust serves more than 1,000 enterprise customers, including Clay, Doctolib, and Photoroom.

Artificial Intelligence
72 employees
Paris, France
“It's crucial for us to monitor error levels specific to our server interactions. Datadog meets this need perfectly and allows us to understand them dynamically despite a very high traffic volume.”
Stanislas Polu Co-founder & CTO Dust

Why Datadog?

High-performance advanced log ingestion, management and analysis capabilities

Challenge

Achieving full observability from day one to ensure a stable infrastructure for thousands of teams around the world

Key Results

Unified Cloud Monitoring

Ensures a stable infrastructure for thousands of users worldwide

Fast Processing of Large Log Volumes

Advanced ingestion, management, and analytics features essential for development workflows

Accelerated Problem Resolution

Better team coordination, contextual awareness, and dynamic understanding of error levels

Pursuing Full Observability to Keep a Stable Platform

Hosted on Google Cloud Platform (GCP) regions in Europe and the United States, Dust's platform complies with current regulatory requirements. It uses standard Google Cloud technologies such as Kubernetes clusters, managed SQL, managed Redis, and Google Cloud Storage. To supply the right data to its AI agents, Dust relies on semantic search, a technique that captures the context and intent behind user queries.

To ensure a stable infrastructure for the thousands of teams using the platform—about 10,000 monthly active users—Dust invested in full observability from the very beginning.

Dust Team

“At our previous company, Stripe, we experienced the benefits of moving from Splunk to Datadog as part of a massive observability effort to build the most stable platform possible. When we created Dust, Datadog was the obvious choice. We were especially impressed by the performance of its advanced log access, management, and analytics capabilities.”

The Advanced Observability Needed for Generative AI Models

Datadog’s features stand apart from the less intuitive tools offered by cloud providers. Beyond monitoring and optimization, Datadog enables fast ingestion and querying of massive log volumes—capabilities Dust relies on heavily in its development process.

Dust uses Datadog Infrastructure Monitoring, which provides metrics, visualizations, and alerts that help the R&D team maintain, optimize, and secure their cloud environment. A user-friendly interface and detailed security insights support effective team communication and faster problem-solving.

“When issues arise, On-Call instantly aligns the team with the right context for faster resolution, better incident control, and better collaboration. Critical information and data are easy to access within a single platform, eliminating the need to switch environments.”

Large language models generate their output token by token (a token is a unit of text that generative models use to encode information for efficient processing), so server calls and responses are often streamed over long-lived open connections. These long-running interactions increase the need for advanced observability: they consume significant resources, and Datadog lets Dust monitor its instances continuously. Additionally, the nature of language model interactions means error rates are typically higher than in traditional SaaS applications.
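To make the streaming point concrete, the following is a minimal sketch, assuming a response arrives as an iterable of text chunks, of how a service might record per-stream timing before reporting it to a monitoring backend. The names `StreamStats` and `instrument_stream` are hypothetical and are not part of any Datadog library; counting one chunk per token is also an assumption made for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional


@dataclass
class StreamStats:
    """Hypothetical per-stream measurements for one long-lived response."""
    chunks: int = 0                              # chunks seen (assumed ~1 token each)
    time_to_first_token: Optional[float] = None  # seconds until the first chunk
    duration: float = 0.0                        # total seconds the stream stayed open


def instrument_stream(
    chunks: Iterable[str],
    stats: StreamStats,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[str]:
    """Pass chunks through unchanged while recording timing into `stats`."""
    start = clock()
    try:
        for chunk in chunks:
            if stats.time_to_first_token is None:
                stats.time_to_first_token = clock() - start
            stats.chunks += 1
            yield chunk
    finally:
        # Runs when the stream is exhausted or closed early by the caller.
        stats.duration = clock() - start
```

Wrapping the stream rather than buffering it keeps the connection streaming to the user while still capturing how long it stayed open; the `clock` parameter exists only so the timing can be tested deterministically.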

From Anomaly Detection to Infrastructure Control

Another key characteristic of Dust is the heavy work involved in retrieving enterprise-specific context and indexing data from platforms like Slack, Notion, or GitHub, which requires near-real-time processing of large volumes of customer data. Datadog monitoring is essential here as well: this ingestion pipeline is complex and error-prone, especially when credentials are revoked or a third-party service API misbehaves.

“Zero error isn't possible for us, so precise monitoring is essential to understand whether an error rate is nominal or an indication of a real issue.”
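One common way to decide whether an error rate is nominal is to compare it against a known baseline rate and flag it only when it sits several standard deviations above that baseline. The sketch below does this with a normal approximation to the binomial distribution; it is an illustrative assumption, not Dust's or Datadog's method, and the function names and the default threshold of 3.0 are hypothetical.

```python
import math


def error_rate_z_score(errors: int, requests: int, baseline_rate: float) -> float:
    """Standard deviations by which the observed error rate exceeds the baseline.

    Uses the normal approximation to the binomial distribution.
    """
    if requests <= 0:
        raise ValueError("requests must be positive")
    if not 0.0 < baseline_rate < 1.0:
        raise ValueError("baseline_rate must be strictly between 0 and 1")
    observed = errors / requests
    # Standard error of a proportion under the nominal (baseline) rate.
    std_err = math.sqrt(baseline_rate * (1.0 - baseline_rate) / requests)
    return (observed - baseline_rate) / std_err


def is_error_rate_anomalous(
    errors: int, requests: int, baseline_rate: float, z_threshold: float = 3.0
) -> bool:
    """Flag a window only when its error rate is well above the nominal baseline."""
    return error_rate_z_score(errors, requests, baseline_rate) > z_threshold
```

For example, with a 2% baseline over 10,000 requests, 230 errors is within normal variation, whereas 400 errors is flagged.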

Most of Dust's services are instrumented with Datadog metrics, with alerts that flag when certain instances need to scale up, ensuring proper infrastructure control. Although Dust doesn't host the AI models it uses, it monitors their consumption through Datadog and performs anomaly detection on token counts, since tokens are the unit that determines AI cost.
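Anomaly detection on token counts can be illustrated with a simple rolling z-score: keep a window of recent per-interval token counts and flag a new count that deviates sharply from their mean. This is a swapped-in illustrative technique, not Datadog's own anomaly-detection algorithm, and the class name and default parameters below are hypothetical.

```python
import statistics
from collections import deque


class TokenCountAnomalyDetector:
    """Hypothetical rolling z-score detector for per-interval token counts."""

    def __init__(self, window: int = 60, z_threshold: float = 3.0, min_history: int = 10):
        self.history = deque(maxlen=window)  # most recent token counts
        self.z_threshold = z_threshold
        self.min_history = min_history       # counts needed before judging

    def observe(self, tokens: int) -> bool:
        """Record one interval's token count; return True if it looks anomalous."""
        anomalous = False
        if len(self.history) >= self.min_history:
            mean = statistics.fmean(self.history)
            stdev = statistics.pstdev(self.history)
            if stdev == 0:
                # Perfectly flat history: any change at all is unusual.
                anomalous = tokens != mean
            else:
                anomalous = abs(tokens - mean) / stdev > self.z_threshold
        self.history.append(tokens)
        return anomalous
```

Every count, including anomalous ones, is appended to the window, so a sustained change in usage eventually becomes the new baseline; excluding flagged values instead would keep the baseline clean but never adapt to a genuine shift.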

When investigating an issue with a user request, Dust uses Datadog APM, which provides full execution tracing and correlation with infrastructure events or logs.

In addition to seamlessly integrating with GCP, Datadog also supports Dust’s multi-region cloud strategy by making it simple to assign dashboards and monitors by region. This enables highly effective, fully transparent global monitoring. Datadog’s ecosystem of libraries and tools for deployment and integration is extremely mature—an important asset for Dust as they rapidly build and enrich their platform. The next Datadog products under evaluation at Dust will focus on security.

“Datadog is the single best partner to simplify visibility and control over a global infrastructure without having to switch between tools.”
