The Monitor

Introducing Updog.ai: Real-time provider status from Datadog

Brianne Bujnowski

Hugo Puceat

When external SaaS providers or cloud services degrade or go down, engineers often find themselves wondering if the issue they're encountering is local or more widespread. The answers they find are usually slow to surface, limited in detail, or entirely dependent on the provider's updates. Vendor-controlled status pages and third-party aggregators don’t provide the timely, independent visibility that's necessary to quickly and accurately identify the root cause of slowdowns.

Introducing Updog.ai, a free public-facing web page from Datadog that shows the live health status of 30+ popular SaaS providers (such as OpenAI, Zoom, and GitHub) and 13 AWS services. Instead of depending on provider updates, Updog.ai is powered by aggregated, anonymized observability data and AI models. Now anyone—not just Datadog customers—can access independent, real-time visibility into the status of the services they depend on, all in one place.

What’s Updog.ai?

Updog.ai is a public web page that provides a single dashboard for monitoring the near real-time health of major SaaS APIs and AWS services. Coverage includes widely used platforms like OpenAI, GitHub, Slack, Stripe, ServiceNow, Zendesk, and Zoom, as well as AWS services such as Amazon S3, AWS Lambda, Amazon DynamoDB, and Amazon RDS.

Updog.ai turns anonymized telemetry data from thousands of environments into near real-time status updates, highlighting performance issues or outages as they emerge. Engineers can immediately verify whether a problem is part of a broader incident or confined to their own systems, without waiting on vendor-maintained status pages.

Updog.ai showing live status of major SaaS providers and AWS services.

Updog.ai also offers historical views with up to 90 days of degradation history, making it easy to spot recurring reliability issues, such as API disruptions that consistently affect customer checkouts. Teams can use these insights to make informed architectural decisions and improve fault tolerance.
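As a hedged illustration of what "recurring" can mean in practice, the Python sketch below buckets past degradation start times by weekday and UTC hour and reports buckets that repeat. It assumes you already have those start times as a list of timezone-aware UTC timestamps; this is not an Updog.ai API, and the function name `recurring_degradations`, the weekday/hour bucketing, and the sample data are all hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def recurring_degradations(starts, now, days=90, min_occurrences=3):
    """Hypothetical helper: group degradation start times from the last `days`
    days by (weekday, UTC hour) and return buckets that recur at least
    `min_occurrences` times, most frequent first."""
    cutoff = now - timedelta(days=days)
    # Keep only incidents inside the history window (all datetimes must be UTC-aware).
    recent = [t for t in starts if cutoff <= t <= now]
    buckets = Counter((WEEKDAYS[t.weekday()], t.hour) for t in recent)
    return [(day, hour, n) for (day, hour), n in buckets.most_common()
            if n >= min_occurrences]


utc = timezone.utc
history = [
    datetime(2025, 3, 3, 14, 10, tzinfo=utc),   # older than 90 days: ignored
    datetime(2025, 6, 2, 14, 5, tzinfo=utc),
    datetime(2025, 6, 9, 14, 20, tzinfo=utc),
    datetime(2025, 6, 11, 9, 30, tzinfo=utc),   # one-off incident
    datetime(2025, 6, 16, 14, 2, tzinfo=utc),
    datetime(2025, 6, 23, 14, 45, tzinfo=utc),
]
print(recurring_degradations(history, now=datetime(2025, 6, 30, tzinfo=utc)))
# [('Monday', 14, 4)]
```

A pattern like "Mondays around 14:00 UTC" is the kind of signal that might justify adding retries, caching, or a fallback provider for the affected code path.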

Extending observability beyond customer environments

Observability has traditionally been bound by the walls of individual systems, with teams focused on what they could measure within their own environments. Datadog is redefining that boundary by collecting and correlating telemetry data across the entire breadth of our products and customer base. With one of the world’s largest and most diverse streams of telemetry data, we can apply AI models that identify patterns and risks that no single organization can see on its own. This represents a shift from simply helping customers manage their environments to creating shared intelligence.

Updog.ai is an expression of this shift. By analyzing Application Performance Monitoring (APM) data across thousands of organizations, it surfaces systemic error signals that individual teams cannot detect in isolation. In doing so, Updog.ai not only serves engineers in their own environments but also supports the broader community in navigating provider reliability.

Example of health coverage for Twilio.

Real-time updates powered by telemetry data and AI

Updog.ai builds on the foundation of Datadog’s External Provider Status in-app feature by using:

  • Aggregated, anonymized APM telemetry data from thousands of organizations
  • A Bayesian model that infers abnormal error rates across independent customer environments
  • Correlation across customers and regions to confirm whether degradations are systemic
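Datadog does not publish the details of its model, so the Python snippet below is only a minimal sketch of the general idea, under stated assumptions. Each anonymized environment gets a simple two-hypothesis Bayesian test (normal error rate versus a rate several times its own baseline, with a binomial likelihood). A degradation is then called systemic only if it is flagged independently across enough organizations and regions. The `EnvWindow` schema, the function names, the 5x multiplier, the 1% prior, and all thresholds are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class EnvWindow:
    """One anonymized environment's calls to a provider in a time window (hypothetical schema)."""
    org_id: str           # anonymized organization identifier
    region: str           # region the calls originated from
    requests: int         # calls made to the provider
    errors: int           # calls that failed
    baseline_rate: float  # this environment's usual error rate


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def degraded_probability(w: EnvWindow, multiplier: float = 5.0,
                         prior_degraded: float = 0.01) -> float:
    """Posterior probability that this environment sees an abnormal error rate.

    Compares two hypotheses with a binomial likelihood:
      normal:   error rate = baseline
      degraded: error rate = multiplier * baseline
    """
    p0 = min(max(w.baseline_rate, 1e-6), 0.5)
    p1 = min(p0 * multiplier, 0.99)
    k, n = w.errors, w.requests
    # Log-likelihood ratio; the binomial coefficient cancels out.
    llr = k * math.log(p1 / p0) + (n - k) * math.log((1 - p1) / (1 - p0))
    log_prior_odds = math.log(prior_degraded / (1 - prior_degraded))
    return _sigmoid(log_prior_odds + llr)


def is_systemic(windows: list[EnvWindow], threshold: float = 0.95,
                min_orgs: int = 3, min_regions: int = 2,
                min_fraction: float = 0.3) -> bool:
    """Treat a degradation as provider-wide only if it shows up independently
    in enough organizations and regions (the correlation step)."""
    flagged = [w for w in windows if degraded_probability(w) >= threshold]
    flagged_orgs = {w.org_id for w in flagged}
    flagged_regions = {w.region for w in flagged}
    all_orgs = {w.org_id for w in windows}
    return (len(flagged_orgs) >= min_orgs
            and len(flagged_regions) >= min_regions
            and len(flagged_orgs) / max(len(all_orgs), 1) >= min_fraction)


# Example: three organizations in two regions see ~6% errors against a 1% baseline.
windows = [
    EnvWindow("org-a", "us-east-1", 1000, 60, 0.01),
    EnvWindow("org-b", "eu-west-1", 1000, 60, 0.01),
    EnvWindow("org-c", "us-east-1", 1000, 60, 0.01),
    EnvWindow("org-d", "us-east-1", 1000, 10, 0.01),
]
print(is_systemic(windows))      # True: independent orgs and regions agree
print(is_systemic(windows[:1]))  # False: a single environment looks local
```

Scoring each environment against its own baseline keeps one noisy service from setting the bar for everyone. Requiring agreement across organizations and regions is what separates a provider-wide incident from a problem confined to a single customer's systems.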

This approach enables Datadog to detect issues before they appear on vendor-controlled status pages. For example, Updog.ai recently surfaced an Amazon DynamoDB degradation 32 minutes before AWS updated its own status page. The result is a reliable, AI-driven signal that reflects the real-world experience of users around the globe.

Example of DynamoDB degradation detected by Updog.ai before AWS updates.

What’s next: GPU availability monitoring and beyond

This iteration of Updog.ai is just the first step. Over time, its scope will expand beyond availability to include real-time updates for systemic risks, including:

  • GPU availability monitoring, which will help AI infrastructure teams plan their workloads
  • Spot interruption monitoring, which will help infrastructure teams anticipate Spot Instance interruptions and run workloads more resiliently
  • Cyberattack and attack vector monitoring, which will provide a view of malicious actors worldwide and the most frequently used attack vectors

Built on anonymized observability data and AI at internet scale, Updog.ai is a comprehensive public resource for real-time service transparency.

Get started with Updog.ai today

Visit Updog.ai today to check the live status of major providers for free. No Datadog account is required. To gain visibility into how these outages impact your own services, explore External Provider Status within Datadog.
