
Case Study

AssemblyAI scales production Voice AI with Datadog's unified observability

About AssemblyAI

AssemblyAI provides industry-leading speech-to-text models and Voice AI infrastructure that developers use to build real-time voice agents and conversation intelligence applications at scale.

Voice AI
~75 Employees
NYC
“Building production AI at scale requires deep visibility, and Datadog gives us the insight to operate with confidence.”
Ben Gotthold, Staff Software Engineer, AssemblyAI

Why Datadog?

  • Unifies metrics, logs, and traces across AI inference pipelines and multi-cloud GPU infrastructure
  • Provides deep visibility into model performance, customer-perceived latency, and system reliability
  • Enables real-time dashboards, anomaly detection, and alerting
  • Helps teams ship new models faster and detect issues earlier
  • Supports confident scaling of AI workloads

Challenge

As AssemblyAI rapidly scaled AI inference across thousands of GPUs, the team needed deep visibility into latency, reliability, and cost efficiency without slowing model innovation or increasing operational complexity.

Key Results

  • 2X throughput with 50% lower infrastructure costs
  • 40% reduction in MTTR, enabling faster incident resolution
  • 50% reduction in investigation and postmortem time
  • ~$750K/year in avoided engineering costs

Innovating AI Without Compromising Latency or Reliability

AssemblyAI is the leading platform for building Voice AI applications, delivering state-of-the-art speech-to-text and speech understanding APIs. Thousands of developers and enterprises rely on AssemblyAI to power meeting notetakers, contact center analytics, and voice agents that automate business workflows while delivering exceptional customer experiences.

As an applied research company, AssemblyAI builds its own in-house models and ships improvements continuously. Teams deploy dozens of updates every week while processing more than 30 million hours of audio per month, and up to 2 million hours in a single day, across a multi-cloud environment powered by thousands of GPUs. Reliability, latency, and cost efficiency are core product requirements.

AssemblyAI made an early decision to invest in observability as part of its platform foundation. The team began with the Datadog for Startups program, which made it easy to get up and running while the business scaled quickly. The program enabled AssemblyAI to engage with Datadog during its early growth stages and ultimately evolve that relationship into a long-term partnership.

“When you are building AI at this scale, visibility is a prerequisite,” says Ben Gotthold, Staff Software Engineer at AssemblyAI. “We made observability part of how we build from the very beginning.”

As AssemblyAI’s customer base grew, so did the complexity of its AI infrastructure. The platform runs large-scale inference pipelines across multiple cloud providers using GPUs and TPUs. Model performance, infrastructure efficiency, and customer experience are tightly linked.

Latency is a core product metric for AssemblyAI. Even small regressions in inference time or system behavior can directly affect customer outcomes and operating costs. “We care deeply about milliseconds because that is what our customers feel,” Gotthold explains. “If latency drifts, the product experience degrades.”

To continue shipping new models at a high pace, AssemblyAI needed a clear, consistent way to understand performance across every stage of its inference pipeline and respond quickly when something changed.


How AssemblyAI builds AI at scale

AssemblyAI designed its AI platform with deep instrumentation from day one. Engineers use Datadog Custom Metrics to track every stage of the inference pipeline, including customer-perceived latency, internal processing time, and GPU utilization. These metrics give the team a clear understanding of where performance and cost tradeoffs exist as workloads scale. “At our scale, observability is what lets us operate thousands of GPUs efficiently,” says Gotthold.
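Custom metrics like these typically reach Datadog through DogStatsD, the Agent's local StatsD endpoint. A minimal sketch of that mechanism follows; the metric name, tags, and stage names are hypothetical illustrations, not AssemblyAI's actual instrumentation, and in practice a client library such as `datadog`'s `statsd` would format and send these datagrams for you.

```python
import socket
import time

# DogStatsD datagrams follow the format "metric.name:value|type|#tag1:v1,tag2:v2"
# and are sent over UDP to the local Datadog Agent (default port 8125).

def dogstatsd_packet(name, value, metric_type, tags):
    """Format a single DogStatsD datagram."""
    tag_str = ",".join(f"{k}:{v}" for k, v in tags.items())
    return f"{name}:{value}|{metric_type}|#{tag_str}"

def report_stage_latency(sock, stage, seconds, model):
    """Emit one histogram sample ("h") for a pipeline stage's latency in ms."""
    packet = dogstatsd_packet(
        "inference.stage.latency_ms",   # hypothetical metric name
        round(seconds * 1000, 2),
        "h",
        {"stage": stage, "model": model},
    )
    sock.sendto(packet.encode("utf-8"), ("127.0.0.1", 8125))
    return packet

# Example: time one stage of a (simulated) inference pipeline.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
start = time.monotonic()
time.sleep(0.01)  # stand-in for real work, e.g. audio preprocessing
packet = report_stage_latency(sock, "preprocess", time.monotonic() - start, "asr-v2")
print(packet)
```

Tagging each sample with its stage and model is what makes per-stage latency and cost tradeoffs visible in dashboards as workloads scale.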

To support this at scale, AssemblyAI relies on Datadog Infrastructure Monitoring to maintain real-time visibility across its multi-cloud GPU fleet. This helps ensure deployments remain reliable and cost-effective as traffic grows month over month.

When issues arise, engineers turn to Datadog Log Management to investigate and resolve problems quickly. Logs are correlated with metrics so teams can move from detection to root cause without manually stitching together data across systems. Customer support teams use the same logs to debug issues and help customers faster.
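Correlation of this kind usually rests on logs carrying the same tags as the metrics (Datadog's unified service tagging uses `service`, `env`, and `version`), so the platform can pivot between signals without manual joins. A minimal sketch, with hypothetical service names and field choices:

```python
import json
import logging
import sys

class JsonFormatter(logging.Formatter):
    """Emit JSON logs that share the service/env/version tags used on metrics."""

    def __init__(self, service, env, version):
        super().__init__()
        self.base = {"service": service, "env": env, "version": version}

    def format(self, record):
        entry = dict(self.base)
        entry["level"] = record.levelname
        entry["message"] = record.getMessage()
        # Extra attributes (e.g. a request ID) ride along for correlation.
        if hasattr(record, "request_id"):
            entry["request_id"] = record.request_id
        return json.dumps(entry)

logger = logging.getLogger("inference")
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter("transcription-api", "prod", "2.4.1"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("transcription completed", extra={"request_id": "req-123"})
```

Because every log line is structured and tagged, an engineer (or a support teammate) can jump from a latency spike on a dashboard straight to the matching log entries for the same service and version.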

Gotthold shares, “Datadog gives us the visibility we need to understand how our models behave in production and keep improving them.”

Shipping fast with confidence

AssemblyAI ships continuously and uses staged deployments to manage risk. Datadog Synthetic Monitoring is used to validate critical API paths and customer workflows, helping teams catch regressions early and confirm releases behave as expected in production-like environments. “We move fast by design, and clear signals let us take that speed into production safely,” Gotthold explains.
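The gate behind a staged rollout can be sketched as a simple comparison of canary latency against a baseline; the 10% tolerance and p95 choice below are hypothetical illustrations, and in practice checks like Datadog Synthetic Monitoring produce the signals such logic would consume.

```python
import statistics

def p95(samples):
    """95th-percentile latency of a list of samples."""
    return statistics.quantiles(samples, n=20)[-1]

def canary_ok(baseline_ms, canary_ms, tolerance=0.10):
    """Promote the canary only if its p95 latency is within tolerance of baseline."""
    return p95(canary_ms) <= p95(baseline_ms) * (1 + tolerance)

# Simulated latency samples in milliseconds.
baseline = [102, 99, 101, 100, 98, 103, 97, 100, 101, 99] * 5
good_canary = [s + 1 for s in baseline]       # slight, acceptable drift
bad_canary = [s * 1.5 for s in baseline]      # 50% regression: roll back

print(canary_ok(baseline, good_canary))  # True
print(canary_ok(baseline, bad_canary))   # False
```

Gating on a tail percentile rather than the mean matters here, since a regression customers feel in milliseconds often shows up first in the tail.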

As the organization scaled, AssemblyAI also adopted Datadog Workflow Automation to standardize incident response. Automated workflows help route alerts and trigger consistent actions, reducing manual coordination and keeping response times low.


Performance that powers growth

As AssemblyAI scaled its AI platform, Datadog’s tooling helped the team operate more efficiently and respond faster.

“Instead of spending time maintaining monitoring systems, we spend that time improving our models,” Gotthold explains. “That focus has been critical to our growth.”

AssemblyAI continues to expand its AI platform and push the boundaries of Voice AI. The observability foundation the team built early allows them to scale traffic, deploy new models, and adopt new infrastructure with confidence.

“Datadog is a core part of how we operate at scale,” says Gotthold. “It gives us the visibility we need to keep building and scaling AI without slowing down.”
