Mondelēz International unifies observability across hybrid environments and accelerates incident resolution with Datadog | Datadog

case study

About Mondelēz International

Mondelēz International is a global snack and food company known for brands like Oreo, Cadbury, Ritz and Trident. The company was founded in 2012 and operates in over 150 countries.

Consumer Packaged Goods
~90,000 Employees
Chicago
“The major value here is that Datadog is not an AWS-only solution. It combines AWS with all of our other legacy cloud systems and our on-premises environment. It becomes that central platform and clearinghouse for all of our observability work when it comes to infrastructure.”
Sean Tibor, Director, Global Cloud Engineering, Mondelēz

Why Datadog?

  • Unified observability across AWS, on-premises, and multi-cloud environments
  • Simple deployment with immediate visibility and time to value
  • Database Monitoring delivers deep visibility into SQL Server performance for critical workloads
  • Infrastructure Monitoring enables EC2 rightsizing and cost savings across the entire enterprise environment
  • Event Management capabilities and composite monitors reduce alert noise
  • On-Call streamlines incident response, replacing manual escalation processes

Challenge

Mondelēz's existing observability tools struggled to provide comprehensive visibility across its hybrid environment and produced hard-to-prioritize alerts, slowing detection and resolution of critical incidents.

Key results

↓ 76% reduction in incidents

Significantly reduced the number of incidents requiring investigation

↓ MTTR 50%+

From over 21 hours average to nine hours for P3 incidents

$150K annual savings

Through rightsizing EC2 instances for supply chain applications

Fragmented visibility and noisy alerts slow incident response during major cloud migration

Mondelēz International is a Fortune 500 company and maker of well-known snack foods such as Oreos, Sour Patch Kids, Wheat Thins, and Clif Bars. Its products are in billions of households around the world. “We make a little over 40 billion Oreo biscuits a year—enough for everybody in the world to have a sleeve all to themselves,” says Sean Tibor, Director, Global Cloud Engineering.

Technology plays a vital role in keeping Mondelēz’s products flowing around the world. The company increasingly relies on its IT services and digital solutions to both create demand for its products and ensure that product availability keeps pace with customer demand. “The average person probably doesn’t think much about how the packet of Oreo biscuits gets to the shelf at their local store,” says Tibor. “But there are a lot of people and systems working behind the scenes to make that happen.”

Mondelēz has a complex IT environment, much of which has been in use since before the company’s split with Kraft in 2012. Given its increasing reliance on technology, the company recently launched a large modernization project. As part of that effort, it is working to decommission its on-premises data centers, migrate to AWS as its strategic cloud provider, simplify IT operations, and bring resources in-house.

As this effort began, observability challenges emerged. The company’s existing observability tool struggled to provide seamless visibility across its complex on-premises, AWS, and multi-cloud environments. Making matters worse, noisy alerts made it hard to distinguish P1 and P2 incidents from low-priority events. Teams struggled to quickly identify and respond to critical issues that could impact operations, and both mean time to detect (MTTD) and mean time to resolve (MTTR) increased.

Mondelēz needed to improve observability across its entire environment, reduce alert noise, and implement a tool that could help simplify and automate manual processes as it continued to migrate critical workloads to AWS.

Unified observability to support cloud migration and critical business operations

Mondelēz chose Datadog for its ease of migration and ability to deliver value quickly. The company deployed Datadog to support three primary use cases.

First, Mondelēz is using Datadog to monitor its multi-year, billion-dollar initiative to modernize enterprise ERP and planning tools. The company is migrating its ERP landscape to SAP RISE, with over 160 engineering and operations users currently using Datadog. “One of the things we’ve done with our supply and demand planning project is use Datadog monitoring during the very intense qualification and build-out phase to ensure that the infrastructure is very reliable, robust, and well optimized,” says Tibor.

The platform has proven particularly valuable for monitoring one of Mondelēz’s larger Kubernetes workloads. “Datadog has some really powerful capabilities in terms of being able to analyze and understand patterns of use and orchestration of that platform,” says Tibor. “We’ve been able to propose changes and improvements to the product team to further optimize it.”

Second, Mondelēz uses Datadog Database Monitoring (DBM) to oversee one of the largest SAP instances running on SQL Server. This business-critical system must be available at all times to ensure supply chains function properly across entire regions. “Before Datadog, we continued to encounter stability issues, and every time one of these systems would become unavailable or have problems, it meant the products stopped flowing. And nobody wants that to happen,” says Tibor.

Previously, teams wrote scripts to capture metrics every five minutes into text files, with no historical data for analysis. “After we had Datadog, one of the first things we did was put the Datadog Agent on the database servers for this, and we started getting all of that information,” says Tibor.

“DBM is giving us that extra layer of visibility. It’s unlocking insights that we lacked for years,” adds Gedi Muraska, Cloud Operations Lead.

Third, Mondelēz became an early adopter of Datadog On-Call, which is transforming its incident management approach. Previously, the company relied on manual processes to determine who was on call. “Datadog On-Call has been a game changer for us. It’s already made life a lot easier, and we’re not even using it fully yet,” says Edina O’Gradney, Operations Lead.

The company expects that 80 to 100 people across its engineering, cloud, and operations teams will eventually use On-Call.

Dramatically improving signal-to-noise ratio

One of the most significant improvements has come from Datadog’s ability to reduce alert noise and improve incident response. The team is using composite monitors and predictive alerts to create more granular, targeted alerting. “Simply doing a lift and shift from the previous vendor to Datadog revealed why some of the incidents were getting unexpectedly stuck,” says Muraska. “With Datadog, we realized how much more specific we can go with our monitoring setup. We have much more complex and robust rules. We are starting to use predictive alerts.”
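The idea behind a composite monitor can be sketched in a few lines of Python. This is a conceptual illustration only, not Datadog's implementation or Mondelēz's configuration; the `MonitorState` type, the monitor names, and the choice of a logical AND are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class MonitorState:
    """Current state of a single (hypothetical) monitor."""
    name: str
    alerting: bool


def composite_alert(first: MonitorState, second: MonitorState) -> bool:
    """Fire only when both underlying monitors are alerting at the same time."""
    return first.alerting and second.alerting


# Hypothetical monitors: high CPU alone is noise, high CPU plus a backed-up
# job queue is worth paging someone for.
cpu_high = MonitorState(name="cpu_high", alerting=True)
queue_backed_up = MonitorState(name="queue_backed_up", alerting=False)

should_page = composite_alert(cpu_high, queue_backed_up)
```

Combining conditions this way is one reason composite rules tend to produce fewer, more specific alerts than single-threshold monitors.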

Mondelēz has also used Datadog Event Management to correlate and deduplicate events, shifting away from creating an incident every time an alert fires. The result was a 76% reduction in incidents, from roughly 1,000 monitoring events down to about 250 incidents. “It means we don’t have to dig through hundreds of identical incidents,” says Tibor.
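As a rough illustration of what correlating and deduplicating events means, the sketch below groups events that share a key into a single incident. It is a simplified stand-in, not how Datadog Event Management works internally; the event fields (`host`, `check`, `ts`) and the grouping key are hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def correlate(events: Iterable[dict]) -> dict:
    """Group events that share the same (host, check) key into one incident."""
    incidents = defaultdict(list)
    for event in events:
        key = (event["host"], event["check"])
        incidents[key].append(event)
    return dict(incidents)


# Hypothetical sample: two identical disk alerts and one latency alert.
events = [
    {"host": "db-01", "check": "disk_full", "ts": 1},
    {"host": "db-01", "check": "disk_full", "ts": 2},
    {"host": "web-02", "check": "high_latency", "ts": 3},
]

incidents = correlate(events)
print(len(events), "events ->", len(incidents), "incidents")
```

Here three raw events collapse into two incidents, because the repeated disk alert on the same host is attached to an existing incident instead of opening a new one.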

Datadog is now becoming the center of Mondelēz’s incident response processes. Through integration with ServiceNow, the company’s enterprise-wide system of record, teams can work within Datadog while maintaining synchronized data across both platforms. “Datadog is becoming more and more the system of engagement,” says Tibor. “This is where people are doing the work, and they’re engaging with the data, and they’re tracking their work. The greatest value for us is that we don’t have to go to different places and screens and systems to figure out what’s going on. We go to Datadog, and it’s all pulled into one place there.”

Saving costs, preventing downtime, and scaling observability across the organization

As Mondelēz continues migrating to AWS as its primary strategic cloud provider, Datadog has provided critical visibility across its entire hybrid environment. “There are a lot of benefits and connections and synergies between AWS and Datadog that are really great,” says Tibor. “For us as a global organization that has a complex infrastructure, the major value here is that it’s not an AWS-only solution. It combines AWS with all of our other legacy cloud systems and our on-premises environment. It becomes that central platform and clearinghouse for all of our observability work when it comes to infrastructure.”

The company has already realized significant cost savings by using Datadog to optimize its infrastructure. By examining utilization patterns and rightsizing instances, Mondelēz is now saving ~$150,000 annually simply through better visibility into resource usage.
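One simple way to turn utilization data into rightsizing candidates is sketched below. It assumes CPU utilization samples have already been exported per instance; the instance IDs, the sample values, and the 40% peak threshold are hypothetical and are not taken from Mondelēz's environment.

```python
from statistics import mean


def rightsizing_candidates(cpu_by_instance, peak_threshold=40.0):
    """Return (instance_id, average_cpu) for instances whose peak CPU
    utilization never reaches the threshold."""
    candidates = []
    for instance_id, samples in cpu_by_instance.items():
        if samples and max(samples) < peak_threshold:
            candidates.append((instance_id, round(mean(samples), 1)))
    return candidates


# Hypothetical CPU utilization samples, in percent.
usage = {
    "i-aaa": [5.0, 8.0, 12.0],    # consistently idle: candidate for a smaller size
    "i-bbb": [35.0, 70.0, 90.0],  # busy at peak: leave as is
}

candidates = rightsizing_candidates(usage)
```

Filtering on peak rather than average utilization is a conservative choice: an instance is only flagged if it stays well below capacity even at its busiest.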

Perhaps most critically, Datadog has helped Mondelēz avoid downtime on business-critical systems. The SAP Database Monitoring implementation delivered immediate value. Within seven days of installing the Datadog Agent, the team detected and prevented two incidents that could have disrupted operations. “That saves the company potentially millions of dollars in lost productivity,” says Tibor.

The improvements in incident management have also been significant. MTTR for P3 incidents dropped from more than 21 hours to 9 hours, a reduction of more than 50%. Meanwhile, a 76% reduction in incident volume means teams spend less time sorting through duplicate alerts and more time solving real problems.
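The percentages can be checked with a short calculation. The helper below is a plain arithmetic sketch; the inputs are the rounded figures quoted in this case study (21 hours to 9 hours from the key results, and 1,000 events to 250 incidents), so the outputs are approximations of the reported numbers.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage drop from `before` to `after`, rounded to one decimal place."""
    return round((before - after) / before * 100, 1)


mttr_drop = percent_reduction(21, 9)          # P3 MTTR, in hours
incident_drop = percent_reduction(1000, 250)  # monitoring events vs. incidents

print(mttr_drop, incident_drop)
```

With these rounded inputs the MTTR reduction comes to about 57%, consistent with the "50%+" headline, and the incident reduction comes to 75%, close to the reported 76%.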

Finally, teams at Mondelēz have used Datadog to optimize telemetry collection from its AWS environments across host-based and cloud-based metrics, reducing latency from 10 to 2 minutes and costs by 70%. “We keep finding these ways in which Datadog supports us in becoming more agile and connected with the AWS platform,” says Tibor.

Looking ahead, Mondelēz plans to expand Datadog’s use across the organization. “We’re looking at Datadog as a platform for growth in capability across the organization regardless of which cloud, data center, or office location it sits in,” says Tibor.

“Nobody thinks of the company that makes Oreos, Sour Patch Kids and Clif Bars as being a technology-driven company,” adds Tibor. “They’re just enjoying the snack that we created. But a surprising amount of technology is powering all of this. There are some really talented people who are working every day to make that happen, and they’re relying heavily on Datadog’s capabilities to do that. When you peek under the covers of what it takes to run our environment and deliver for the business every day, we’re taking the capabilities and flexibility that we have with Datadog and applying it to our complex technology environment in a way that is a breakthrough in the consumer packaged goods space.”
