The foundation of modern media
Arc XP is the content platform and operating system built to power growth for ambitious media companies. Developed by The Washington Post, Arc XP is trusted by leading media organizations worldwide, including The Irish Times, Libération, L’Express, Madsack, Graham Media Group, and Sky News. More than 2,500 websites across 30 countries run on Arc XP, serving over 1.5 billion unique visitors every month. For the publishers, journalists, and editorial teams who depend on it, the platform has to perform consistently and reliably at all times.
“We power some of the most recognized media brands in the world,” says Joe Croney, CTO, Arc XP. “Our platform is the backbone of how our customers publish, monetize, and connect with their audiences. The reliability of that backbone directly impacts their ability to operate and grow their businesses.”
Built on AWS, the platform had scaled to support a global, always-on environment spanning cloud infrastructure, applications, and identity systems. As the platform grew, so did the operational complexity of keeping it running efficiently. To maintain its pace of innovation while meeting its business operating requirements, Arc XP needed a more unified operational view. Investigation workflows and incident response were distributed across multiple systems, making it harder to correlate infrastructure signals, application behavior, and service impact in one place. Teams had strong data but needed better shared context and broader proactive alerting to diagnose issues faster across an increasingly intricate architecture.
Even as Arc XP consistently met its contractual three-nines SLA commitments, that complexity carried a real operating cost. Engineering time was being consumed by the overhead of managing multiple tools, and limited visibility made root cause analysis more time-intensive, leaving less time for roadmap execution and the innovation that would drive the business forward.
“We have a continuous improvement mindset at Arc XP,” says Croney. To move forward, the team needed a single, centralized view across their entire stack.
From growing complexity to unified incident management
To support the complexity of its growing platform, Arc XP standardized its observability approach with Datadog. With Infrastructure Monitoring, APM, Logs, Synthetic Monitoring, and Incident Management all in one place, every team could work from a shared source of operational data and see full incident context without switching tools.
“Datadog connected our observability, our workflows, and our on-call management, so we could turn every incident into an opportunity to build a stronger platform.”
The impact on incident management was immediate. Angelica Marinho, Site Reliability Engineer, who built and operationalized the Incident Management initiative, explains that the team previously had to pull together information from several tools at the start of an investigation. “When information lives in multiple places, it slows down the process and makes it harder to maintain shared context,” she says. Now, a single workflow connects alerting, investigation, and postmortems, streamlining incident response. When an alert fires, a Slack channel is synced to the incident timeline, notebooks capture the investigation in real time, and postmortems are generated directly from the incident record. “The notes, the timeline, and the postmortem are all captured automatically and accessible to the entire team,” says Marinho.
Beyond consolidating tooling, the team established a more disciplined and consistent incident workflow, giving responders shared context, clearer ownership, and a faster path from alert to action.
For Jason Taylor, Head of Cybersecurity, the key to driving meaningful improvement was building the right foundation of visibility first. By leveraging Datadog to better understand incident patterns, underlying causes, and service-impacting events, the team could move from reactive response to more informed operational decision-making. The ability to trace requests across Arc XP’s 20-plus services meant teams could move from symptom to root cause with greater confidence and address issues earlier in the lifecycle.
Standardized cross-platform dashboards gave every incident commander a consistent view of platform health across all components, and a structured on-call model with dedicated incident commanders ensured the right people were paged at the right time. “In a less standardized process, you can end up pulling in more people than necessary,” says Taylor. “With a disciplined model, you can quickly identify the right two or three people to engage.”
Engineers could stay focused on high-value work, while the incident process engaged the right expertise more efficiently. With that clarity in place, the team could identify systemic issues more effectively: not just what went wrong in a single component, but where the platform as a whole would benefit from continued investment. Those insights were elevated to leadership and translated directly into roadmap impact, creating a continuous loop of visibility, learning, and improvement.
“We reduced our customer-impacting incident volume by 86% year-over-year. That result gave our engineering team the room to do what they do best: build and innovate.”
Jason Taylor
Head of Cybersecurity
Raising the bar: from reliability to innovation
The investment in visibility and process paid off in measurable ways. Arc XP reduced customer-impacting incident volume by 86% year-over-year, with complex investigations that had previously consumed hours of multi-tool work now resolved in 15 to 20 minutes. For issues impacting customer experience, faster identification of root cause has improved response times and clarity in remediation. “We improved reliability by giving every team, from infrastructure to security to DevOps, the same view of the platform and clear ownership of their part of it,” Taylor says. For customers, this translated directly into more consistent publishing experiences and faster resolution when issues did arise.
The decision to unify observability and security under one roof paid dividends beyond operational efficiency. With operational overhead reduced, engineering time was recovered and redirected toward the product roadmap. “Partnering with Datadog gave us back the time and focus we needed to lead with innovation,” says Croney. “That’s what allowed us to ship the experiences we’d been wanting to build.”
Arc XP has since shipped new AI-powered experiences, including Ask the News and the Agentic Editor and Composer. Both are monitored by Datadog LLM Observability to support model quality, reliability, and oversight, and both bring AI agents and human journalists together to collaborate on richer, more compelling storytelling. That momentum is only accelerating.
Arc XP is at the forefront of bringing AI and agentic experiences to the media industry. Looking ahead, the team is already exploring Datadog Bits AI to take the next step: extending its unified operational model with AI-assisted detection and response, where automation can support initial investigation, accelerate incident workflows, and help teams remediate issues more efficiently. With a more resilient platform foundation and more engineering time available for innovation, Arc XP sees a significant opportunity to help media companies amplify the stories they tell, connect with new audiences through AI, and build sustainable businesses in a rapidly changing industry. Reliability is what makes innovation possible, and for Arc XP, that work is only beginning.