Google Cloud Run is a managed platform for the deployment, management, and scaling of workloads using serverless containers. Datadog’s integrations for Google Cloud Run and Google Cloud Platform enable you to collect and visualize metrics from your containerized workloads as well as audit logs from your Google Cloud Run resources.
We’re excited to announce that you can now deploy the Datadog Agent alongside your serverless containers in Google Cloud Run to collect traces, logs, and custom metrics directly from your managed Cloud Run services. By instrumenting your services with Datadog, you can easily collect telemetry from your Cloud Run workloads, no matter where they are running. Datadog also generates new enhanced metrics for additional insights into your containers’ cold starts and shutdowns.
In this post, we’ll explore how to instrument your Cloud Run services to send traces and logs directly to Datadog. We’ll also look at how you can collect and visualize custom metrics, and begin generating new enhanced metrics from your serverless containers.
Collect traces and logs from your container-based application
Collecting request traces and logs from your serverless containers helps you better understand their performance, quickly locate issues, and identify root causes. After you’ve instrumented your Cloud Run application with Datadog, you can begin viewing your traces in Datadog APM. By tagging your Cloud Run traces by service, environment, and version, you can quickly filter down to the workloads you need and view key performance data, including request throughput, latency, and error rates, as you deploy new workloads or update existing ones.
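As a sketch of what instrumentation can look like, Datadog’s serverless-init approach wraps your container’s entrypoint so the Agent runs alongside your application; the base image, service name, and version values below are illustrative, so substitute your own:

```dockerfile
# Example Dockerfile for a Node.js Cloud Run service (names and versions are illustrative)
FROM node:20-slim

# Copy the Datadog serverless-init wrapper into the image
COPY --from=datadog/serverless-init:1 /datadog-init /app/datadog-init

WORKDIR /app
COPY . .
RUN npm ci --omit=dev

# Unified service tags let you filter traces by service, environment, and version
ENV DD_SERVICE=web-store DD_ENV=production DD_VERSION=1.4.0
ENV DD_TRACE_ENABLED=true

# serverless-init starts the Datadog Agent, then launches your application
ENTRYPOINT ["/app/datadog-init"]
CMD ["node", "server.js"]
```

You would also supply your Datadog API key (for example, via a Secret Manager-backed environment variable) when deploying the service.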
If, for example, you identify a service with an elevated error rate, you can drill into a relevant trace and inspect its flame graph to begin troubleshooting. From there, you can immediately view service metrics for the span that experienced the error and spot issues such as a corresponding latency spike.
If you’ve instrumented your Cloud Run application with Datadog, you can also collect logs from your workloads to get more context for easier troubleshooting. From a trace containing an error, you can easily pivot to its associated logs to view error messages and gather additional service context from around the time of the span. In the example below, we’ve navigated from the erroring trace above to troubleshoot using its associated logs. The error message in our log reveals that our request hit an endpoint, responsible for processing customer orders, that was not correctly implemented in this version of our application.
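Emitting logs as single-line JSON makes them easy for Datadog to parse and correlate with traces. The minimal formatter below is an illustrative sketch in Python, not Datadog’s own library code; the logger name is hypothetical:

```python
import json
import logging
import sys

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line that Datadog can parse."""
    def format(self, record):
        return json.dumps({
            "status": record.levelname.lower(),
            "message": record.getMessage(),
            "logger": {"name": record.name},
            # With the ddtrace library installed and DD_LOGS_INJECTION=true,
            # the tracer adds dd.trace_id/dd.span_id to records automatically,
            # which is what links a log line to its trace.
        })

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("web-store")  # hypothetical service logger
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.error("order endpoint not implemented for this version")
```

Writing logs to stdout like this also fits Cloud Run’s model, since the platform captures container output streams.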
Visualize enhanced and custom Cloud Run metrics
Cold starts and shutdowns are two key events that can affect your overall workload performance and resource consumption. Datadog generates enhanced Cloud Run metrics for cold starts and shutdowns, enabling you to track their prevalence within your workloads. Cold starts occur when there are no provisioned containers available to handle a request, so Cloud Run must create a new container to execute it, introducing startup delay. Depending on the memory allocated and the runtime (among other variables), a cold start can add significant latency to your invocations and ultimately degrade customer experience. Conversely, shutdowns result from Cloud Run terminating containers that are idle, that encounter an application error, or that exceed their memory limits.
Datadog automatically tags traces of Cloud Run invocations that experience a cold start or shutdown. This enables you to track whether their volume is appropriate for your application’s request throughput, since high-throughput applications should have a steady supply of available containers. If you notice that your application is experiencing a high volume of cold starts, you may need to scale up the number of running containers. Alternatively, a high volume of shutdowns may indicate an application error causing forced exits, or that your provisioned workloads are exceeding their containers’ memory limits.
Enhanced metrics can be used like any other standard metric: you can configure anomaly monitors (shown below) to alert you to abnormal volumes of cold starts and shutdowns, which can indicate issues with workload provisioning. Datadog’s enhanced Cloud Run metrics can also be used to set SLO targets, enabling you to track and fine-tune your application’s instance count and concurrency settings over time by balancing cost with performance. For example, Cloud Run doesn’t charge for default containers while they are idle, so it may prove valuable to provision a few extra if you notice a high rate of cold starts, especially if your application experiences frequent spikes in traffic.
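As an illustration, an anomaly monitor on cold start volume might use a query shaped like the one below. The metric name, service tag, and thresholds here are assumptions for the sake of example; check the enhanced metric names your account actually reports before creating a monitor:

```
avg(last_4h):anomalies(sum:gcp.run.enhanced.cold_start{service:web-store}.as_count(), 'basic', 2) >= 1
```

The `'basic'` anomaly algorithm compares current values against recent history, which suits cold start counts that lack strong seasonal patterns.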
You can also instrument your Cloud Run services to send custom metrics to Datadog. With custom metrics, you can monitor KPIs specific to your business, such as the value of each order processed by your web store or the volume of seasonal coupons applied at checkout. Once Datadog begins receiving a custom metric, you can use it as you would any standard metric to set alerts and SLO targets, or visualize it in dashboard widgets. In the example below, we’ve instrumented our web store application to track the cart value for each checkout. By graphing the cart values from our top merchants, we can visualize the impact of our sales platform and customer purchase trends.
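Custom metrics like this are typically submitted through DogStatsD. In practice you would use an official Datadog client library, but the sketch below shows the underlying datagram format the client sends to the Agent; the metric name and tag are hypothetical:

```python
def dogstatsd_datagram(name, value, metric_type="d", tags=None):
    """Build a DogStatsD datagram of the form 'metric:value|type|#tag1,tag2'.

    This only illustrates the wire format; a real service would send the
    datagram over UDP to the Agent (or call a Datadog client library)
    instead of constructing it by hand.
    """
    datagram = f"{name}:{value}|{metric_type}"
    if tags:
        datagram += "|#" + ",".join(tags)
    return datagram

# A hypothetical checkout metric, submitted as a distribution ("d")
# and tagged by merchant so we can graph top merchants separately
packet = dogstatsd_datagram(
    "web_store.checkout.cart_value", 129.99, "d", ["merchant:acme"]
)
print(packet)  # web_store.checkout.cart_value:129.99|d|#merchant:acme
```

Submitting cart values as a distribution lets you graph percentiles per merchant rather than a single average.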
Get deep insights into your Google Cloud Run workloads
With enhanced invocation metrics and the ability to collect custom metrics, traces, and logs, Datadog provides even deeper insights into your managed Cloud Run workloads. To get started, instrument your Cloud Run application with Datadog.
If you aren’t already a Datadog customer, sign up for a free 14-day trial today.