Reducing Operational Overhead and Building Value Through Improved Visibility | Datadog
CASE STUDY

Reducing operational overhead and building value through improved visibility

Learn how Datadog's monitoring capabilities helped Arc XP manage the complexity of its environment as it scaled to support rapid growth

About Arc XP

An independent division of the Washington Post with 300 employees, Arc XP is a cloud-native digital experience platform that enables large organizations to create and distribute content, drive digital commerce, and deliver powerful multi-channel experiences.


Key Results

Faster Diagnoses

Datadog’s proactive alerting and anomaly detection help prevent problems before they appear, leading to a better customer experience.

A Return to Core Competencies

Datadog lowered Arc XP’s operational overhead and enabled engineers to focus on building features.

15% Drop in MTTR in One Year

Engineering groups that adopted Datadog APM saw a 15% drop in mean time to resolution (MTTR) in 2021 alone.


The Challenge

As Arc XP grew, operational overhead increased, and it became more challenging for the Arc XP team to keep track of their environment and meet SLOs. The team needed to find a monitoring tool that could work in their complex environment and allow the organization to focus on building value for its customers.


Why Datadog?

Datadog’s ease of use, along with its support for Arc XP’s complex, polyglot codebase and key features such as auto instrumentation, proactive alerting, and anomaly detection, allowed engineering and support teams to perform their work with more confidence and efficiency.


Built to solve the complexities of digital publishing

As newspaper publishers shifted focus to address changing consumer behaviors and meet growing digital content demands, many were challenged by the complexities of creating, curating, and distributing content in what’s become a 24-hour news cycle. For the Washington Post, the answer was to develop an in-house content management system (CMS) known as Arc XP. Built to support the creation of tens of thousands of pieces of content daily, Arc XP is now an independent division of the Washington Post, offering an end-to-end digital experience platform that supports the content, digital commerce, and front-end website experiences of media companies, brands, and enterprise organizations across the globe. Since its launch in 2014, Arc XP has experienced explosive growth, and now powers eight billion monthly page views across more than 1,900 sites in over 25 countries.

Growing pains due to lack of observability

Despite this success, the expanding customer base and technical footprint of Arc XP initially brought challenges that hampered the organization’s core mission. Nitin Alreja, Arc XP’s VP of engineering, explained that in the team’s early years, engineers wanted to improve their capacity to observe the health and performance of feature deployments across the platform. “As we grew and added more features, it became extremely important to keep track of everything 24/7 and make sure we were keeping up with our SLOs,” said Nitin. The company’s cloud infrastructure was expanding to multiple regions around the globe, and adequate visibility was vital to ensure a high-quality experience for its customers. Furthermore, the engineering teams wanted a better tool to help them with technical support and IT operations. Engineers had neither a universal, proactive alerting system that could inform them as soon as issues arose nor a solution that could help them diagnose issues quickly.

These technical obstacles, as significant as they were, ultimately presented the organization with an even more fundamental business challenge. The more time the Arc XP team spent bogged down in operations and support, the less it was using its strengths to pursue its organization’s founding mission. “Our core competency is not building operations. It’s building our experience management platform,” said Nitin.

“Some of the other tools that we had looked at were focused on a single language or stack. We recognized the value that Datadog provided for the polyglot environment.”

Nitin Alreja
VP of Engineering, Arc XP

Datadog solves complex monitoring challenges

The Arc XP team knew that a powerful and centralized monitoring system could help them overcome these early growing pains. But finding a monitoring tool that could work within their particular environment would not be easy. The platform had evolved as a collection of interconnected microservices that were originally written as separate tools by different business units at the Washington Post. These tools were built with various languages, frameworks, and database types, including Go, Java, Node.js, React, Python, Amazon Aurora, and MongoDB. To this day, different engineering teams are responsible for developing and maintaining the platform’s growing and diverse set of microservices.

It was clear to the Arc XP team that any monitoring tool would need to be compatible with the complex codebase of its digital publishing platform. But the need to gain visibility over this diverse environment and reduce operational overhead drove additional feature requirements as well. For example, the solution would have to be easy to use to prevent unnecessary friction in adoption, and also be highly customizable. “It was critical for teams to be able to create custom dashboards relevant to them, with the right API metrics, infrastructure metrics, or custom metrics,” said Nitin. To help the team stay focused on its mission, the monitoring solution would also have to be 100% cloud-based, just as Arc XP itself was. “Whatever tool we chose would have to be on the cloud from the get-go, just to reduce the operational costs.” Finally, the solution chosen would have to offer proactive alerting to further cut down on the time needed to resolve issues.

To meet these complex monitoring challenges, Datadog offered a clear solution.

Ease of use drives organic Datadog adoption

Seeing that Datadog was based in the cloud and that it supported the platform’s polyglot environment, the Arc XP team took the tool for a trial spin. “Some of the other tools that we had looked at were focused on a single language or stack,” said Nitin. “We recognized the value that Datadog provided for our polyglot environment.”

During the trial period, Datadog’s ease of use stood out. The team’s first impression was that the tool was approachable and provided a shared single source of truth for everyone, including developers, QA teams, performance teams, and product teams. The enthusiastic response from the support team was especially important, given the organization’s goal of reducing issue resolution times. “For the support folks, it was easy to navigate the tool, to understand the data points, and to understand the metrics it was collecting,” said Nitin. “They saw they could be self-sufficient with Datadog, as opposed to requiring training. So the key driver there was the ease of use.”

Arc XP began its Datadog journey with custom metrics. With time, the organization adopted additional Datadog products, including Infrastructure Monitoring, Application Performance Monitoring (APM), and Synthetic Monitoring. As Nitin explained, “It started as a free trial, which turned into a small project, which turned into a bigger project. It just kept growing organically.”

For APM in particular—which the team now uses heavily—the ease of its auto instrumentation feature was key to adoption. “We are very microservice-focused, which means we have a lot of codebases around the organization. Having us instrument each one of those codebases could get expensive and take us away from actual feature development,” said Nitin. “But it’s trivial to instrument your code with Datadog. That makes it easy for us to focus instead on building features for our customers.”

“It’s trivial to instrument your code with Datadog. That makes it easy for us to focus on building features for our customers.”

Nitin Alreja
VP of Engineering, Arc XP

Better visibility reduces issues and resolution times

Since adopting Datadog, Arc XP has seen huge improvements across all business units in system visibility and in the ability to detect and fix issues.

As one sign of this improved visibility, Datadog dashboards have been embraced throughout the organization. Engineering teams, for example, all use dashboards customized for their business units. “The ability to have dashboards to show the health of the organization is critical,” said Nitin. “It’s one slide that tells you the story of the day.”

Datadog has also helped Arc XP engineers and support staff reduce the turnaround time for finding and resolving production issues. In the year 2021 alone, the mean time to resolution (MTTR) for teams that use Datadog APM improved by 15%. Even more significantly, the number of issues reported decreased by 33% thanks to Datadog features such as proactive alerting and anomaly detection. “For us to know early on if we’ve broken something in production—and then be able to revert those changes immediately—that’s where features like proactive alerting and anomaly detection provide a big value for us,” said Nitin.
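For readers less familiar with the metric, the sketch below shows one common way to compute MTTR and a year-over-year percentage improvement. The case study does not describe how Arc XP measured these figures, so the `(opened, resolved)` record shape, the function names, and the sample timestamps are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
from datetime import datetime, timedelta
from statistics import mean


def mttr(incidents):
    """Mean time to resolution: the average of (resolved - opened).

    `incidents` is a list of (opened, resolved) datetime pairs. This record
    shape is an assumption made for illustration.
    """
    durations = [(resolved - opened).total_seconds() for opened, resolved in incidents]
    return timedelta(seconds=mean(durations))


def percent_improvement(before, after):
    """Percentage by which `after` is lower than `before` (both timedeltas)."""
    return 100 * (before - after) / before


# Hypothetical incident records, not data from the case study.
start = datetime(2021, 1, 1, 9, 0)
previous_year = [
    (start, start + timedelta(minutes=100)),
    (start, start + timedelta(minutes=140)),
]
current_year = [
    (start, start + timedelta(minutes=92)),
    (start, start + timedelta(minutes=112)),
]

before = mttr(previous_year)  # mean of 100 and 140 minutes
after = mttr(current_year)    # mean of 92 and 112 minutes
improvement = percent_improvement(before, after)
print(f"MTTR went from {before} to {after}, a {improvement:.0f}% improvement")
```

The same percentage calculation applies to counts, such as the number of issues reported in each period.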

A return to the core mission

Most significantly, Datadog solved a crucial business problem for Arc XP by allowing its engineers to focus on building out the company’s digital experience platform. And as Arc XP continues its steep growth trajectory today, quickly adding high-quality features that deliver value to its customers, Datadog remains an essential asset. “Thanks to Datadog, we have significantly improved our software development lifecycle by making sure the right metrics are being monitored, observed, and alerted on,” said Nitin. “Datadog is an indispensable tool in our tool belt. It’s been almost like a big brother, watching out for us and keeping an eye on our operations.”
