---
title: "Monitor Temporal Cloud with Datadog"
description: "Use Datadog’s Temporal Cloud integration to monitor Workflow health, Worker polling, service latency, and account limits in one place."
author: "Bowen Chen, David Pointeau, Gustavo Gutierrez"
date: 2025-04-24
tags: ["infrastructure monitoring", "temporal"]
blog_type_id: the-monitor
locale: en
---

[Temporal Cloud](https://temporal.io/cloud) is the managed service for the Temporal durable execution platform. Teams use it to delegate infrastructure management and focus on building reliable, resilient applications. As Temporal adoption grows, monitoring Workflow health, Task execution, and service performance becomes critical to keeping your durable applications running smoothly.

Datadog’s [Temporal Cloud OpenMetrics integration](https://docs.datadoghq.com/integrations/temporal-cloud-openmetrics.md) collects metrics from Temporal’s [OpenMetrics endpoint](https://docs.temporal.io/cloud/metrics/openmetrics), which exposes operational data in the industry-standard Prometheus format. We worked with Temporal’s engineering team to co-design the metric schema and validate the integration end to end. The integration ingests more than 30 metrics without requiring you to install an Agent or manage any infrastructure. Datadog classifies these metrics as standard metrics, so there are no custom metrics charges. Because these metrics live alongside your existing infrastructure, APM, and log data in Datadog, you can correlate Temporal Workflow performance with the services that depend on those Workflows. As a result, you get more context than Temporal’s built-in metrics UI or a stand-alone Prometheus or Grafana setup can offer on its own.
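If you're curious what the OpenMetrics endpoint returns, the Prometheus text exposition format is line-oriented. It consists of `# HELP` and `# TYPE` comments followed by one sample per line in the form `name{label="value",...} value`. The Python sketch below parses that simplified shape. It is purely illustrative, because the integration handles collection for you. It assumes label values contain no escaped quotes or `}` characters, and it ignores optional timestamps. The function name `parse_exposition` is hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Simplified sample line: metric_name{label="value",...} value [timestamp]
# Assumption: label values contain no escaped quotes or "}" characters.
_SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
)
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_exposition(text: str) -> List[Sample]:
    """Parse Prometheus text-format samples, skipping # HELP / # TYPE comments."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue  # skip lines this simplified parser does not understand
        labels = dict(_LABEL_RE.findall(match.group("labels") or ""))
        # float() also accepts the format's special values such as "NaN" and "+Inf"
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples
```

For example, `parse_exposition('up 1')` returns `[('up', {}, 1.0)]`. Metric names you see on the real endpoint are defined by Temporal Cloud, not by this sketch.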

In this post, we’ll explore how to use the integration’s [preconfigured dashboard](https://app.datadoghq.com/dash/integration/32242/temporal-cloud-openmetrics---overview) to:

- [Track usage against Temporal Cloud account limits](#track-usage-against-account-limits)
- [Visualize the health and performance of Temporal Cloud services](#visualize-the-health-and-performance-of-temporal-cloud-services)
- [Monitor Temporal Workers’ Task polling](#monitor-temporal-workers-task-polling)
- [Quickly identify errors in Temporal Cloud Workflows](#quickly-identify-errors-in-temporal-cloud-workflows)

![The Temporal Cloud overview dashboard, which shows a Monitors Summary and a Task Polling Overview.](https://web-assets.dd-static.net/42588/1778503115-temporal-cloud-overview-dashboard.png)

> [!NOTE]
> This integration replaces the previous Temporal Cloud integration tile. If you’re using the older tile, we recommend that you migrate to the [OpenMetrics-based integration](https://app.datadoghq.com/integrations/temporal-cloud-openmetrics). The previous tile will be deprecated in a future release.

## Track usage against account limits

The Usage and Limits section of the dashboard overlays your actions per second, operations per second, and requests per second against their respective account limits. By comparing usage to these limits, you can see how much headroom you have before Temporal starts throttling.

When you do reach limits, the corresponding throttled metrics show exactly when and how often throttling occurs. You can set [Datadog monitors](https://docs.datadoghq.com/monitors.md) on the usage metrics to alert you before you reach capacity, and on the throttled metrics to alert you as soon as throttling begins. You can then request limit increases from Temporal or optimize Workflow patterns to reduce action consumption.
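The headroom comparison comes down to simple arithmetic. The following Python sketch assumes you already have a usage rate and its matching account limit. The `UsageSample` type and the 80% warning threshold are hypothetical choices and are not part of the integration.

```python
from dataclasses import dataclass


@dataclass
class UsageSample:
    """Hypothetical container pairing an observed rate with its account limit."""
    name: str     # e.g. "actions per second"
    usage: float  # observed rate over the window
    limit: float  # account limit for the same dimension


def headroom_pct(sample: UsageSample) -> float:
    """Percentage of the limit still unused; negative once usage exceeds the limit."""
    if sample.limit <= 0:
        raise ValueError("limit must be positive")
    return 100.0 * (sample.limit - sample.usage) / sample.limit


def should_warn(sample: UsageSample, warn_at_pct_used: float = 80.0) -> bool:
    """True once usage reaches warn_at_pct_used percent of the limit (hypothetical threshold)."""
    return 100.0 - headroom_pct(sample) >= warn_at_pct_used
```

For example, 400 actions per second against a limit of 500 leaves 20% headroom, which reaches the 80% warning threshold. In practice, you would express the same threshold as a Datadog monitor on the integration's metrics. The code shows only the math behind the comparison.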

![Average actions per second as compared to the Temporal Cloud account limit.](https://web-assets.dd-static.net/42588/1778503276-temporal-cloud-actions-per-second.png)

## Visualize the health and performance of Temporal Cloud services

Temporal Cloud manages the backend service that accepts and processes API requests from your Workers. When this service is under heavy load, it can cause delays in Workflow execution. The dashboard gives you real-time visibility into this layer.

You can track current load by monitoring gRPC request rates over time and observe how frequently the service throttles incoming requests or encounters errors. The Service Request Details section shows top operations by request rate, top operations by error rate, and the root causes of rate-limited requests. When gRPC Error Rate spikes occur, you can investigate your Temporal SDK logs to determine whether issues are network-related, result from SDK misconfiguration, or stem from Workflow rates exceeding quotas.
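Ranking operations by error rate means dividing each operation's errors by its requests over the same window. The following Python sketch does this under the assumption that you have per-operation request and error counts. The dictionaries and function names are hypothetical inputs and do not correspond to the integration's metric names.

```python
from typing import Dict, List, Tuple


def error_rates(requests: Dict[str, float], errors: Dict[str, float]) -> Dict[str, float]:
    """Per-operation error rate: errors divided by requests over the same window."""
    return {
        operation: errors.get(operation, 0.0) / count
        for operation, count in requests.items()
        if count > 0  # operations with no traffic have no meaningful error rate
    }


def top_by_error_rate(
    requests: Dict[str, float], errors: Dict[str, float], n: int = 3
) -> List[Tuple[str, float]]:
    """Operations ranked by error rate, highest first."""
    rates = error_rates(requests, errors)
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)[:n]
```

A low-traffic operation with a few errors can outrank a busy one, so it helps to read error rate alongside request volume, as the dashboard's side-by-side tables allow.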

![Average gRPC error rate, average gRPC request rate, and average rate-limited requests rate in the Service Request Details section.](https://web-assets.dd-static.net/42588/1778503368-temporal-cloud-service-request-details.png)

### Monitor service latency

The dashboard tracks Temporal Cloud’s service latency at the p50, p95, and p99 percentiles across key operations like `StartWorkflowExecution` and `SignalWorkflowExecution`. By comparing latency trends against your Workflow throughput, you can identify when Temporal becomes a bottleneck. The top-level Service Latency p95 widget gives you an at-a-glance health check. You can explore the full Service Latency Overview section for per-operation breakdowns.
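The p95 value is the latency that 95% of requests stay at or below. The following Python sketch computes nearest-rank percentiles from raw samples. It is a simplification for illustration only. Temporal Cloud and Datadog may derive their reported percentiles differently (for example, from histogram buckets), and you don't need to compute them yourself. The same interpretation applies to the replication lag percentiles.

```python
import math
from typing import Dict, Sequence


def nearest_rank_percentile(values: Sequence[float], pct: float) -> float:
    """Smallest sample such that at least pct percent of samples are <= it."""
    if not values:
        raise ValueError("need at least one value")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based nearest rank
    return ordered[rank - 1]


def latency_summary(latencies_ms: Sequence[float]) -> Dict[str, float]:
    """p50/p95/p99 for a set of latency samples (hypothetical helper)."""
    return {f"p{p}": nearest_rank_percentile(latencies_ms, p) for p in (50, 95, 99)}
```

For latencies of 1 through 100 ms, this returns p50 = 50, p95 = 95, and p99 = 99. A p99 value that climbs while p50 stays flat usually points to a slow tail rather than a uniform slowdown.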

For multi-region namespaces, the dashboard also tracks replication lag at the p50, p95, and p99 percentiles so that you can monitor cross-region consistency.

## Monitor Temporal Workers’ Task polling

Temporal Cloud uses Task polling to load balance Tasks across available Workers. When Workers actively poll Task Queues, the service assigns Tasks from memory (synchronous matching). When no Workers are available, Tasks move to the persistence layer, and the service must reload them when Workers resume polling (asynchronous matching). Retrieving Tasks from the persistence layer instead of memory increases latency.

The Sync Match Rate on the dashboard indicates matching efficiency. You should target 99% or higher. The dashboard breaks down poll success, sync success, and timeouts by Task type in sunburst charts so that you can see which Task types are the bottleneck.
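The following Python sketch applies the 99% target per Task type. It assumes that the sync match rate equals sync successes divided by successful polls for the same window. The function names and Task type keys are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

SYNC_MATCH_TARGET = 0.99  # the 99% target discussed above


def sync_match_rate(sync_success: float, poll_success: float) -> Optional[float]:
    """Share of successful polls that were matched from memory (sync)."""
    if poll_success <= 0:
        return None  # no successful polls in the window, so the rate is undefined
    return sync_success / poll_success


def task_types_below_target(
    counts: Dict[str, Tuple[float, float]], target: float = SYNC_MATCH_TARGET
) -> List[str]:
    """Task types whose sync match rate falls below the target.

    counts maps a Task type to (sync_success, poll_success) for the same window.
    """
    below = []
    for task_type, (sync_success, poll_success) in counts.items():
        rate = sync_match_rate(sync_success, poll_success)
        if rate is not None and rate < target:
            below.append(task_type)
    return sorted(below)
```

The sketch skips Task types with no successful polls instead of reporting them as 0%. An idle Task Queue is not the same as one with poor matching.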

The integration also tracks the Task Queue backlog, which reports an approximate count of Tasks waiting to be picked up. A growing Task Queue backlog is a leading indicator of Worker capacity issues. If the backlog is growing while the Sync Match Rate is dropping, you need more Workers or more pollers per Worker. The top-level Current Task Queue Backlog widget gives you this number at a glance.
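The "growing backlog plus falling Sync Match Rate" signal can be expressed as two trend checks. The following Python sketch fits a least-squares slope to each series and flags the combination. The function names, and the choice of a linear slope as the trend measure, are hypothetical. A Datadog monitor on the same metrics is the practical way to alert on this.

```python
from typing import Sequence


def slope(values: Sequence[float]) -> float:
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def needs_more_worker_capacity(
    backlog: Sequence[float], sync_match_rates: Sequence[float]
) -> bool:
    """Flag when the backlog trends up while the sync match rate trends down."""
    return slope(backlog) > 0 and slope(sync_match_rates) < 0
```

A fitted slope smooths out single noisy data points better than a first-versus-last comparison does. However, a short window can still produce false positives, so compare several windows before you scale Workers.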

![Sync Match Rate alongside Task Poll Success Rate over Time in the Task Polling Overview section.](https://web-assets.dd-static.net/42588/1778503503-temporal-cloud-task-polling-overview.png)

## Quickly identify errors in Temporal Cloud Workflows

Temporal Workflows are the foundation of Temporal’s programming model. Monitoring different Workflow end states (cancellations, terminations, failures, and timeouts) enables rapid error identification.

Workflow end states carry distinct implications. Cancellations are user-initiated graceful exits. Terminations forcibly end execution without standard cleanup, potentially leaving in-progress Activities unfinished or external operations only partially completed. High failure rates indicate unhandled exceptions or improper error handling. High timeout rates can point to missing pollers, unprovisioned Workers, or retry logic errors.

The Workflow Overview section shows success, failure, cancellation, termination, timeout, and continued-as-new rates over time, broken down by Workflow type. The dashboard also includes tables of successful and failed Workflows by type so that you can pinpoint which Workflow definitions are problematic.
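The per-type breakdown amounts to grouping closed executions by Workflow type and end state. The following Python sketch computes each end state's share per type and flags types where failures, terminations, and timeouts together exceed a threshold. Cancellations are excluded because they are graceful exits. The state labels, the 5% threshold, and the Workflow type names are hypothetical, and your metric tags may use different names.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical state labels; your metric tags may name these differently.
END_STATES = ("completed", "failed", "canceled", "terminated", "timed_out", "continued_as_new")
ERROR_STATES = ("failed", "terminated", "timed_out")


def end_state_rates(events: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Map each Workflow type to the share of its closed executions in each end state."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for workflow_type, state in events:
        counts[workflow_type][state] += 1
    rates = {}
    for workflow_type, by_state in counts.items():
        total = sum(by_state.values())
        rates[workflow_type] = {state: by_state[state] / total for state in END_STATES}
    return rates


def problematic_types(
    rates: Dict[str, Dict[str, float]], max_error_share: float = 0.05
) -> List[str]:
    """Workflow types whose failed + terminated + timed-out share exceeds the threshold."""
    return sorted(
        workflow_type
        for workflow_type, shares in rates.items()
        if sum(shares[state] for state in ERROR_STATES) > max_error_share
    )
```

For example, 9 completions and 1 failure for a hypothetical `OrderWorkflow` produce a 10% failure share, so that type is flagged.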

The integration’s Open Workflow Count metric gives you visibility into how many Workflow executions are running across your namespaces. A growing Open Workflow Count without a corresponding increase in completions can signal stuck or long-running Workflows that need investigation.
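The following Python sketch checks for stuck or long-running Workflows. It compares how much the open count grew over a window with whether completions kept pace. The function name, the inputs, and the "completions did not increase" rule are hypothetical simplifications.

```python
from typing import Sequence


def possibly_stuck(
    open_counts: Sequence[int],
    completions_per_interval: Sequence[int],
    min_growth: int = 1,
) -> bool:
    """Flag when open executions keep climbing while completions stay flat or fall."""
    if len(open_counts) < 2 or len(completions_per_interval) < 2:
        return False  # not enough data points to see a trend
    open_growth = open_counts[-1] - open_counts[0]
    completion_change = completions_per_interval[-1] - completions_per_interval[0]
    return open_growth >= min_growth and completion_change <= 0
```

A rising open count alone is normal during a traffic increase. The flag fires only when completions fail to rise along with it.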

![Average rates and rates over time for Workflow success, Workflow cancellation, and continued-as-new Workflows in the Workflow Overview section.](https://web-assets.dd-static.net/42588/1778503555-temporal-cloud-workflow-overview.png)

## Get started monitoring Temporal Cloud with Datadog

Datadog’s Temporal Cloud OpenMetrics integration helps you monitor Workflow execution, track service performance, identify Worker capacity issues, and stay within account limits. All metrics are classified as standard metrics, so they don’t incur custom metrics charges. To get started, check out the setup instructions in our [Temporal Cloud integration documentation](https://docs.datadoghq.com/integrations/temporal-cloud-openmetrics.md).

For organizations that self-host Temporal services, see our blog post on [monitoring Temporal Server with Datadog](https://www.datadoghq.com/blog/temporal-server-integration.md).

If you don’t already have a Datadog account, you can sign up for a free 14-day trial to start monitoring Temporal Cloud.