Building Highly Reliable Data Pipelines at Datadog
A look at how Datadog builds and operates data pipelines reliably at scale.
This post sketches out our incident response process, where it succeeded and where it stumbled on March 8, and what we learned along the way.
Learn how we tackled a case of high network latency in our usage estimation platform that required a multi-layered solution.
A deep dive into what happened at the platform level during the outage of March 8, 2023.
Learn how we developed a new scheduling algorithm for data fetching and rendering and how we built it for use across our suite of Datadog products.
A closer look at storage routing in Husky, Datadog's third-generation event storage system.
We’ve recently improved the raw performance of the Datadog Agent, leading to 20% less CPU use on Agents flooded with custom metrics.
Learn about the repeatable design elements we've documented in DRUIDS, Datadog's design style guide.
Husky is an unbundled, distributed, schemaless, vectorized column store. Here's how we built it, and why.
Employees at modern software companies rely on many third-party software services to do their jobs. Learn how Datadog's IT team expanded Clarity to automatically monitor the accounts on these services for inactivity and to optimize how much we spend on them.