Itaú Unibanco modernizes its observability platform with Datadog | Datadog
case study

About Itaú Unibanco

With 101 years of history, Itaú Unibanco is the largest bank in Latin America. With more than 70 million customers, including individuals and businesses of all sizes, the bank offers a broad portfolio of financial products and services reaching customers in 18 countries.

Financial Services
90,000+ Employees
São Paulo, Brazil
“The modernization of our observability platform with Datadog was essential to make our observability practice more efficient, simplify engineering workflows, and ensure sustainable operations across critical systems.”
Thiago Morais, Associate Director, Itaú Unibanco

Why Datadog?

  • Centralizes logs, metrics, traces, and user experience in a single integrated platform.
  • Provides full visibility across cloud and on-premises environments.
  • Delivers business SLIs and SLOs connected to customers’ financial journeys.
  • Offers petabyte-scale telemetry with governance, security, and cost control.
  • Enables fast detection and response with AI-driven insights.

Challenge

Itaú is undergoing a broad modernization of its technology platform, including a full migration to the cloud by 2028. This initiative addresses growing demand for high availability, performance, and resilience in digital services, while meeting evolving customer expectations. At the same time, the bank evolved its observability strategy by integrating previously siloed processes. This transformation increased reliability, predictability, and operational efficiency at scale, while enabling digital products and services to be designed and operated with a stronger customer focus.

Key results

  • 100% observability coverage in the cloud environment
  • 35% reduction in time to resolve issues
  • 40% reduction in incident rate

From observability to experience: building customer-centric digital services

Itaú Unibanco operates in 18 countries and serves more than 70 million customers with a broad portfolio of products and services. With over a century of history and decades of accumulated technological legacy, the bank operates in a highly complex environment involving thousands of applications and more than 90,000 employees—17,000 of whom are dedicated to technology.

To sustain its evolution and scale efficiently, Itaú is undergoing a major modernization of its technology platform, aiming to fully migrate its infrastructure to the cloud by 2028.

This journey is transforming how the bank develops, operates, and observes its systems, creating more resilient, integrated foundations and preparing them for continuous evolution. In a rapidly digitizing financial services landscape, this modernization is essential. Today, 97% of customer interactions occur through digital channels, requiring high availability, real-time responsiveness, and continuous reliability.

In this context, modernizing the observability platform has become strategic to address challenges of scale, speed, and availability—enabling efficient operations and supporting the delivery of customer-centric digital services.

Opportunity: managing scale, speed, and multiple signals

As Itaú modernized its platform, complexity increased across all layers. The bank operates thousands of services across multiple cloud providers, hybrid environments, and on-premises systems, which significantly increased telemetry data volume.

Previously, logs and tracing were spread across different tools, requiring additional effort to correlate signals during incident analysis. The growing volume of logs and alerts created challenges in filtering relevant information, especially in critical systems.

“Maintaining high availability at our scale requires full visibility. It’s essential to have a platform that continuously helps us understand system behavior, customer impact, and risks in real time,” says Thiago Morais.

To ensure operational reliability at scale, Itaú consolidated its monitoring tools into a single platform, replacing fragmented systems with a unified approach capable of handling speed, volume, and system criticality. Alerts are now efficiently routed to responsible teams, ensuring secure and fast operation of essential systems—improving customer experience and supporting operational excellence.
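The routing idea described above can be illustrated with a minimal Python sketch. This is not Itaú's or Datadog's implementation: the `Alert` shape, the ownership map, the team names, and the channel names are all hypothetical, chosen only to show how an alert might be mapped to a responsible team.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class Alert:
    service: str   # service that raised the alert
    severity: str  # assumed values: "critical", "warning", "info"
    message: str

# Hypothetical ownership map; a real one would come from a service catalog.
OWNERS: Dict[str, str] = {
    "payments-api": "team-payments",
    "login-web": "team-identity",
}
FALLBACK_TEAM = "team-observability"  # assumed catch-all for unowned services

def route(alert: Alert, owners: Dict[str, str] = OWNERS) -> str:
    """Return a 'team:channel' destination for the alert."""
    team = owners.get(alert.service, FALLBACK_TEAM)
    # Assumption: only critical alerts page someone; the rest go to chat.
    channel = "pager" if alert.severity == "critical" else "chat"
    return f"{team}:{channel}"
```

Under these assumptions, a critical alert from `payments-api` would page `team-payments`, while an alert from a service with no registered owner would fall back to the central team.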

Why Datadog: centralized visibility, AI-driven analysis, and platform modernization support

Adopting Datadog as an integrated observability platform gave the bank a unified view of its infrastructure, with applications, logs, and alerts tied to user experience and integrated with Amazon Web Services (AWS), the bank’s primary cloud provider. Teams gained immediate visibility into services like Amazon EC2, AWS Lambda, and managed databases, accelerating setup and eliminating monitoring gaps in production environments.

In partnership with Datadog, Itaú also created a centralized team to define standards for data ingestion, identification, and tagging—improving consistency, alert quality, and cost predictability.
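As a rough illustration of what a tagging standard might enforce, the Python sketch below validates a set of tags before telemetry is ingested. The required tag keys, the allowed environments, and the value pattern are assumptions made for this example; the case study does not describe the bank's actual rules.

```python
import re
from typing import Dict, List

# Assumed standard for illustration only; not taken from the case study.
REQUIRED_TAGS = ("env", "service", "version", "team")
ALLOWED_ENVS = {"dev", "staging", "prod"}
# Assumed value format: lowercase, starts with a letter or digit.
TAG_VALUE = re.compile(r"^[a-z0-9][a-z0-9_./-]*$")

def validate_tags(tags: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the tags pass."""
    problems: List[str] = []
    for key in REQUIRED_TAGS:
        if key not in tags:
            problems.append(f"missing required tag: {key}")
    for key, value in tags.items():
        if not TAG_VALUE.match(value):
            problems.append(f"invalid value for {key}: {value!r}")
    if "env" in tags and tags["env"] not in ALLOWED_ENVS:
        problems.append(f"unknown env: {tags['env']!r}")
    return problems
```

A check like this could run in CI or at deploy time, so that inconsistent tags are caught before they affect alert quality or cost attribution.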

Another key differentiator was the use of AI enabled by features like Datadog Watchdog and Bits AI. These capabilities help engineers move quickly from detection to understanding incidents by automatically highlighting the most relevant signals.

“Standardization allows teams to move fast without losing alignment. Combined with Datadog’s AI capabilities, it helps reduce guesswork and shorten investigation time,” says Morais.

Enhancing observability with operational efficiency

Operating real-time banking services requires fast investigation and clear alerting, especially in complex, distributed environments. Itaú addressed this by adopting an integrated operational view—connecting infrastructure, applications, logs, and user experience signals.

With Datadog, teams reduced noise and accelerated root cause analysis, clearly linking customer-reported issues to backend performance in critical services. This enables faster and more effective responses to customer-impacting incidents.

At Itaú’s scale—handling real-time payments and high-availability digital channels—comprehensive log collection is essential for compliance, threat detection, and incident response. However, modernization drove log volumes up to 8 petabytes per month, making cost control critical.

Using Datadog Log Management, teams correlate logs, metrics, and traces, improving context sharing and speeding up incident investigation. Flex Logs helps control ingestion costs while maintaining necessary retention for high-demand use cases.

Additionally, Itaú implemented Observability Pipelines to optimize log flow. These pipelines apply intelligent sampling, preserve critical logs, remove duplicate WARN and ERROR events, and filter low-value logs like routine health checks. This improves alert quality, protects sensitive data, and enables efficient large-scale log management.
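The pipeline behavior described above (sampling, preserving important logs, deduplicating WARN and ERROR events, and dropping routine health checks) can be sketched in plain Python. This is only a conceptual model, not the Observability Pipelines configuration or API: the record shape, the health-check paths, the default sample rate, and the function names are all hypothetical.

```python
import hashlib
import random
from typing import Dict, Iterable, Iterator, Optional, Set

# Assumed record shape: {"level", "service", "message", "path"}.
HEALTH_CHECK_PATHS = {"/health", "/healthz", "/ready"}  # assumed low-value endpoints

def _fingerprint(record: Dict[str, str]) -> str:
    """Hash the fields that define a 'duplicate' in this sketch."""
    key = f'{record.get("service")}|{record.get("level")}|{record.get("message")}'
    return hashlib.sha256(key.encode("utf-8")).hexdigest()

def process(
    records: Iterable[Dict[str, str]],
    sample_rate: float = 0.1,  # assumed: keep about 10% of routine logs
    rng: Optional[random.Random] = None,
) -> Iterator[Dict[str, str]]:
    rng = rng or random.Random()
    seen: Set[str] = set()
    for record in records:
        level = record.get("level", "INFO").upper()
        if level in {"WARN", "ERROR"}:
            # Always keep the first occurrence; drop exact repeats.
            fp = _fingerprint(record)
            if fp not in seen:
                seen.add(fp)
                yield record
        elif record.get("path") in HEALTH_CHECK_PATHS:
            continue  # routine health check: filtered out
        elif rng.random() < sample_rate:
            yield record  # sampled routine log
```

In this sketch the WARN/ERROR check runs before the health-check filter, so a failing health check is still preserved. The deduplication set grows without bound here; a real pipeline would presumably use a time window or a bounded cache.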

“At our scale, logging with Datadog is a strategic decision—not just a technical one,” says Morais.

Connecting frontend experience to backend performance

Delivering high-availability digital experiences at scale requires understanding how customer experience connects to backend performance. Itaú now prioritizes incidents based on customer impact, ensuring fast and effective responses.

To achieve this, Itaú uses Datadog APM and Real User Monitoring (RUM) to link frontend behavior with backend execution. APM provides end-to-end tracing across services, helping identify latency, errors, and dependencies. RUM tracks user interactions in critical journeys such as login and payments.

With RUM Without Limits, teams capture all sessions and control indexing without code changes—focusing on specific users, errors, or campaigns while managing costs.

“With frontend metrics, we can quickly identify issues and support backend teams to act fast, avoiding customer impact,” says Morais.

Building the future with Itaú Unibanco

Itaú’s modernization efforts are transforming how the bank develops and manages its technology, enabling faster delivery, greater resilience, and continuous service availability.

By adopting Datadog as a unified observability platform, Itaú turns large volumes of telemetry data into clear, actionable insights across thousands of services—driving more efficient and proactive operations.

As Itaú progresses toward a fully cloud-native architecture, Datadog serves as the foundation for observability at scale. This continuous visibility allows the bank to operate with reliability and predictability, creating room for innovation and delivering seamless, customer-centric digital experiences, even as digital services continue to grow.

“Datadog gives us the observability we need to scale securely and maintain customer trust,” concludes Morais.

Resources

  • Solutions: Financial Services
  • Blog: Accelerate investigations with AI-powered log parsing
  • Blog: Manage metric volume and tags in your environment with Observability Pipelines