Fragmented systems undermine digital-first customer experiences
Thrivent is the original purpose-based company. It is member-owned, founded more than a century ago when neighbors came together to insure each other. Today, Thrivent serves over 2 million clients, manages more than $200 billion in assets and holds superior financial health ratings from Moody’s and S&P.
In recent years, Thrivent has been shifting from an insurance-agent model to one built on purpose-driven financial advisors who help individuals and families plan for the future, establish legacies for their loved ones, and strengthen their communities. The goal was twofold: to enhance client satisfaction and loyalty with financial plans that reflect clients' values and goals, and to foster a more rewarding, sustainable career path for advisors. To bring this vision to life, Thrivent increasingly relied on its website and digital channels to deliver seamless, client-centric experiences. At the heart of this transformation was a critical question: How do we build the kind of digital infrastructure that supports our mission, and how do we know when something breaks?
But the shift came with a complex set of challenges: legacy systems, siloed teams, inconsistent observability practices, and outdated tooling. Thousands of virtual machines, on-prem Kubernetes clusters, mainframes, and 70-year-old policy systems ran alongside new cloud-native applications. Teams were spread across more than a hundred product units, many of them operating independently with their own tools and methods. Monitoring was reactive. When something failed, such as a customer login error, the first sign was often a call to the help desk. From there, resolving the issue could take hours or even days as engineers manually sifted through disconnected logs, dashboards, and systems.
“Previously, we were relying on call center tickets as our first indicator that something was wrong,” says Eric Hartmann, Engineering Manager at Thrivent. “That meant we were already behind by the time we started investigating.”
Building a culture of observability to unite them all
To modernize its observability approach, Thrivent turned to Datadog, not just as a tool but as a strategic partner in overcoming the challenges that were blocking its enterprise-wide digital transformation. The platform's ability to bridge legacy systems and cloud-native applications made it a natural fit for Thrivent's hybrid environment of decades-old mainframes, thousands of virtual machines, and AWS-based Kubernetes workloads.
The rollout began with Datadog Log Management and quickly expanded to include Application Performance Monitoring (APM), Real User Monitoring (RUM), Product Analytics, Session Replay, and Software Catalog.
Flex Logs was a foundational enabler. Thrivent's previous logging infrastructure was costly and cumbersome. With Flex Logs, the team gained scalable log ingestion at nearly half the cost, which made Datadog viable for enterprise-wide use. Logs remained searchable and were retained at a lower management cost, so teams could keep critical telemetry data instead of trading it away to control spending.
APM and distributed tracing were instrumental in untangling complex service dependencies, especially across hybrid workloads. Flame graphs, service maps, and trace correlation gave engineers the clarity they needed to pinpoint issues quickly, trace latency or failures to their root cause, and resolve them before they escalated. This was a major leap from their previous diagnostics, which relied on log dumps.
On the front end, RUM and Session Replay provided real-time insight into customer behavior. Teams could now see where users were dropping off, hitting errors, or encountering performance issues, and connect that experience to backend metrics.
Crucially, these observability capabilities didn’t just offer technical benefits. They became the linchpin for transforming how teams collaborated, prioritized, and responded, supporting a cultural shift toward shared ownership, faster decision-making, and proactive problem-solving.
For example, Datadog dashboards, with their simplified interface and intuitive charts, helped close the gap between technical and non-technical teams. Product owners could self-serve insights through Product Analytics and view application performance in one place, analyzing usage trends, conversion rates, and user friction without relying on analyst support.
The Software Catalog likewise became a central hub for team ownership and accountability. Instead of relying on outdated documentation or tribal knowledge, teams could instantly find who owned a service, tag alerts accordingly, and escalate issues through integrated tools like Slack. This significantly shortened triage time and cut unnecessary delays in cross-team coordination.
Impact at scale
Thrivent's engineers, product managers, and business stakeholders gained a unified view of application performance, infrastructure health, and customer experience. Results include:
- 66% improvement in Mean Time to Recovery (MTTR), from nearly 10 hours to just over 3 hours, when issues were detected through Datadog monitors instead of traditional call center-based detection.
- 76 support tickets per month now handled directly by Datadog, reducing the burden on Thrivent's internal engineering staff.
- 50% cost savings on logging from consolidating logging platforms, compared with the previous setup.
- 82 Service Level Objectives (SLOs), up from zero, reflecting a cultural shift in observability that will help teams make more stable and better-informed decisions.
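The MTTR figure is a simple percentage reduction, (before − after) / before × 100. As a rough sanity check, the Python sketch below computes it for hypothetical inputs. The case study states only "nearly 10 hours" and "just over 3 hours," so the hour values used here are illustrative assumptions, not Thrivent's measured numbers, and the helper name `pct_reduction` is likewise hypothetical.

```python
def pct_reduction(before: float, after: float) -> float:
    """Return the percentage reduction going from `before` to `after`."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return (before - after) / before * 100


# Hypothetical inputs in hours; the exact MTTR values are not given in the text.
print(round(pct_reduction(10, 3), 1))     # 70.0 with round figures
print(round(pct_reduction(9.8, 3.3), 1))  # 66.3 with "nearly 10" and "just over 3"
```

With round figures of 10 and 3 hours the reduction would be 70%; a baseline slightly under 10 hours and a recovery time slightly over 3 hours land near the reported 66%.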
Datadog’s ease of use, combined with strong partnership support from Datadog staff, allowed Thrivent’s core technology team to scale training and adoption quickly. “We needed a tool that would help us change culture. Datadog made that shift easy and attractive—it gave people something they actually wanted to use,” says Jordan Winters, Engineering Manager at Thrivent.
Looking forward
Thrivent's journey is far from over. With more than 1,000 engineers and over 100 product teams, scaling observability remains a key priority. But with Datadog in place, Thrivent now has the tools and the momentum to build more resilient systems and deliver better digital experiences.
“Five minutes of downtime still kills me,” says Hartmann. “But now we can see it, act on it, and learn from it—and that’s the shift we needed.”