Finding opportunities to strengthen reliability and customer service
Okta is a leading provider of identity and access management (IAM) services that helps enterprises across financial services, healthcare, retail, and government authenticate and secure workforce and customer logins. Auth0 is Okta’s IAM platform dedicated to helping developers secure customer-facing applications, manage logins, and administer access controls. Many customers also use Auth0 to support compliance with the identity, privacy, and security requirements of regulations and frameworks such as GDPR, HIPAA, SOC 2, ISO 27001, PCI DSS, and FedRAMP.
As a vital component of its customers’ security infrastructure, Auth0 maintains 99.99% availability of its core services and is always looking for ways to shorten the mean time to detect and resolve issues before they impact end-user logins. “If we are down, our customers and their users feel the impact immediately,” notes Matt Drozdz, Senior Engineering Manager for Observability and AI Productivity.
Managing technology effectively at Auth0 is critical given the scale, complexity, and security demands of its environment. In addition to being a prime target for threat actors, Auth0’s infrastructure must continuously adapt to support frequent feature updates, evolving industry standards, and growing customer demand. To sustain this pace, engineers need complete visibility across their systems to act quickly and confidently.
Supporting 99.99% uptime with unified observability
Auth0 proactively identifies ways to make its environment more efficient, resilient, and secure. Delivering on its 99.99% uptime SLA to customers, which allows only about 52 minutes of downtime per year or roughly 4 minutes per month, requires engineers to detect and resolve issues in seconds, not minutes. “To meet this SLA, it’s critical for engineers to identify issues and resolve them as fast as possible,” says Andy Puch, Senior Software Engineer. “Every second counts.”
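The downtime figures follow directly from the availability target: allowed downtime equals (1 − availability) multiplied by the length of the period. The minimal Python sketch below shows that arithmetic. It assumes an average Gregorian year of 365.2425 days and treats a month as one twelfth of a year. Both conventions are illustrative assumptions, not part of Auth0’s SLA definition, and the helper name `downtime_budget_minutes` is hypothetical.

```python
# Downtime budget implied by an availability target (illustrative sketch).
# Assumption: an average Gregorian year of 365.2425 days; a month is 1/12 of a year.
MINUTES_PER_YEAR = 365.2425 * 24 * 60      # about 525,949 minutes
MINUTES_PER_MONTH = MINUTES_PER_YEAR / 12  # about 43,829 minutes


def downtime_budget_minutes(availability: float, period_minutes: float) -> float:
    """Return the maximum downtime, in minutes, allowed over a period.

    Hypothetical helper: availability is a fraction, e.g. 0.9999 for 99.99%.
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    # The unavailable fraction of the period is the downtime budget.
    return (1.0 - availability) * period_minutes


if __name__ == "__main__":
    target = 0.9999  # 99.99% availability ("four nines")
    print(f"per year:  {downtime_budget_minutes(target, MINUTES_PER_YEAR):.1f} min")
    print(f"per month: {downtime_budget_minutes(target, MINUTES_PER_MONTH):.1f} min")
```

At 99.99%, this gives roughly 52.6 minutes per year and about 4.4 minutes per month. The text rounds these figures to 52 and 4 minutes.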
To achieve this, Auth0 launched a strategic, organization-wide initiative toward unified observability with Datadog. The first phase correlated trace data with infrastructure metrics. With metrics and traces integrated in a single platform, developers and other monitoring users saw faster query times, which reduced monitoring toil and gave them back valuable time in their day. The team refocused that time on innovation and other business-critical work.
Building on this initial success, the next phase was to bring in all of their logs without increasing costs. The team ingests between five and ten billion logs per month. Previously, to keep systems running, the team either sampled the data or spread it across more than 30 clusters. With Flex Logs, they now retain 100% of their log volume at no additional cost.
“For the first time with Flex Logs, we have comprehensive and affordable log retention and can investigate incidents in minutes, not hours,” explains Puch.
According to Andrew Yu, Vice President of Engineering, migrating Auth0’s logs was a strategic decision to improve developer productivity and customer experience. “By bringing our logging together with our metrics and tracing, our RCA [root cause analysis] costs have decreased and we can deploy faster than before,” says Yu.
The initiative also transformed how Auth0’s engineers work. In just six weeks, the observability team led a coordinated adoption effort across global engineering groups: training users, hosting workshops, and tracking dashboard usage. Adoption was immediate and widespread, establishing unified observability as a shared practice across teams and maximizing the value of Auth0’s technical investment.
By consolidating all telemetry in one interface, Auth0 engineers dramatically improved operational efficiency:
- 94% faster complex log queries
- 2.5x faster incident detection and resolution
- 45% reduction in RCA and resolution costs
How democratization of observability data led to IT transformation
With all logs, metrics, and traces centralized, engineers no longer had to wonder if they were missing data. Every event could be retained, searched, and analyzed instantly. “We now have the confidence to keep everything we need, not just a subset,” says Drozdz. “It’s transformed how our teams work, collaborate, and secure customer trust.”
That visibility had ripple effects:
- Engineers across time zones could self-serve observability data without bottlenecks, supporting a 99.99% uptime record.
- Access to telemetry data was democratized, reducing silos and enabling faster product iterations.
- Compliance reporting became faster and simpler, as cost-effective log retention aligned with regulatory requirements.
Building resilience and trust at a global scale
With unified observability, Auth0 engineers are more agile. They are faster at detecting, mitigating, and preventing issues, and better equipped to build secure, high-performing identity products. As digital identity becomes more critical to every online interaction, that agility is essential.
“With AI agents acting autonomously, identity and observability are critical to secure trustworthy decision-making at scale,” says Okta CTO Bhawna Singh.
Okta and Auth0 are at the forefront of this shift, delivering secure identity management for a world where digital trust is essential. Auth0’s commitment to innovation and resilience continues to drive its success. With unified observability at its foundation, the company is shaping the future of secure, intelligent, and reliable digital identity.