Scaling a complex freight platform without blind spots
Uber Freight operates a technology-powered marketplace that connects more than 1,000 shippers with carriers and manages $18B in freight. Reliability is critical to its mission of moving goods efficiently. “With more than 1,000 shippers and $18B in freight under management, we have to provide 99.99% uptime for our customers,” says Thiyagarajan Anandan, Sr. Engineering Manager, Platform Engineering. “If our shipper platform is down for any amount of time, it could delay critical processes like getting goods to a grocery store.”
The company runs a complex hybrid environment that includes WebLogic, Tomcat, and Oracle VM on-premises alongside Azure, GCP, AWS, OCI, and OpenShift in the cloud. Millions of data points flow through the system from more than 100 different sources.
Before Datadog, observability was fragmented. Uber Freight used one tool for infrastructure and APM monitoring, another tool for frontend issues, and a third tool for logs. Engineers had to switch between tools to investigate incidents. “Our existing tool often did not detect incidents before end users reported problems,” says Anandan. “We also could not fully expand monitoring across our entire system, which created coverage gaps.”
User-based licensing restricted visibility to specific individuals, slowing investigations and concentrating responsibility. As the platform scaled, Uber Freight needed unified observability to eliminate blind spots and move from reactive troubleshooting to proactive reliability.
Unifying observability to lower MTTR and improve uptime
Uber Freight selected Datadog to consolidate infrastructure monitoring, APM, logs, RUM, synthetics, database monitoring, and incident management into one platform. “Datadog gives us end-to-end observability,” says Anandan. “We can capture any degradation in the quality of services before a customer even realizes it.”
With comprehensive metrics and immediate alerts, Uber Freight now learns of incidents from its own systems rather than from customer complaints. Engineers can correlate infrastructure, application performance, and frontend signals in one place. “It helps us stop the tool hop,” says Suchitra Vijayakumar, Systems Engineer III. “If a server on-premises has an issue, we can instantly see how that affects the app a customer is using in the cloud.”
By democratizing visibility and correlating signals across the stack, Uber Freight lowered MTTR and MTTD while sustaining 99.99% uptime across its freight marketplace. Datadog adoption moved quickly. Uber Freight transitioned from pilot to production in 40 days. Previously, that process would have taken three months. “That was only possible because of the stellar support and partnership from the Datadog team,” says Anandan.
At 40 days instead of roughly 90, onboarding was about 55% faster compared to competitors, which let Uber Freight standardize observability rapidly across its hybrid infrastructure.
Using Datadog Bits AI to save up to 20 minutes per incident
To further reduce incident response time, Uber Freight recently enabled Datadog Bits AI on its most critical P1 and P2 production monitors. “Before this, troubleshooting was a 100% manual effort,” says Anandan. “When an engineer was paged, they had to start from scratch by digging through logs and metrics just to find a starting point.”
Datadog Bits AI now generates an automated investigation report every time a monitor is triggered. These AI-powered reports provide initial pointers and surface relevant metrics immediately, reducing discovery lag. “Datadog Bits AI provides pre-done investigation reports that have the power to slash our incident understanding time,” says Anandan. “There is huge potential because Bits can save 15 to 20 minutes of manual digging per incident.”
The team estimates that Datadog Bits AI can save 15 to 20 minutes per incident by accelerating investigations and reducing manual analysis. As adoption grows, Uber Freight is focused on integrating these AI-generated insights more deeply into daily workflows.
Strengthening proactive reliability for the future
Even with measurable improvements in onboarding speed, incident response, and uptime, Uber Freight sees further opportunity to optimize. The team’s focus is on identifying any remaining blind spots, reducing operational noise, and improving proactive detection across its hybrid environment.
By combining unified observability, expanded visibility across teams, and AI-assisted investigations, Uber Freight continues to strengthen reliability and operational transparency while leading the pace of logistics worldwide. “We know we can use Datadog even more effectively,” says Suchitra Vijayakumar. “It’s a very powerful tool, and we feel like we are not using all the metrics we are currently pumping into it.”