
Stream your Google Cloud logs to Datadog with Dataflow

By Addie Beach and Sri Raman

Published: October 25, 2023

IT environments can produce billions of log events each day from a variety of hosts and applications. Collecting this data can be costly: processing inefficiencies increase network overhead, and ingestion can become inconsistent during major system events. Google Cloud Dataflow is a serverless, fully managed framework that enables you to automate and autoscale data processing. By using Dataflow to execute your log pipelines directly from within Google Cloud, you can easily consolidate data from a variety of sources and route it to a variety of sinks. And with Dataflow templates, you can leverage pre-built pipelines for even easier setup.

The Pub/Sub-to-Datadog Dataflow template enables you to efficiently route logs from across your Google Cloud ecosystem to Datadog. Using this template, you can quickly configure Dataflow jobs to pull processed logs and send them to Datadog Log Management. Additionally, the template provides support for collecting your logs when running a virtual private cloud (VPC). Once your logs have been ingested, you can then leverage Datadog for visualizing these logs, using them to build alerts and dashboards, and correlating them with metrics from across your stack.

In this post, we’ll cover how you can use the Pub/Sub-to-Datadog Dataflow template to:

- Route, enrich, and transform your logs
- Easily ingest logs into Datadog

The Pub/Sub-to-Datadog template in Dataflow, including the streaming pipeline diagram displayed alongside.

Route, enrich, and transform your logs with Dataflow

Dataflow pipelines enable you to collect data—including logs—from any source, transform and enrich it, and then send it to external data sinks. Dataflow runs on Google Compute Engine instances, so your processing can easily scale with the size of your workload. Additionally, Dataflow supports batching events in addition to streaming them individually—without batching, each log event is sent to your sink as a separate network request, leading to increased network overhead.
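To make the batching behavior concrete, here is a minimal Apache Beam sketch in Python of a pipeline like the one the template runs. This is an illustration only, not the template itself (which is a pre-built pipeline maintained by Google); the subscription name, batch sizes, and send_batch helper are all hypothetical.

```python
import apache_beam as beam
import requests
from apache_beam.options.pipeline_options import PipelineOptions

# Datadog logs intake endpoint (v2 API).
DD_URL = "https://http-intake.logs.datadoghq.com/api/v2/logs"

def send_batch(batch, api_key="<DATADOG_API_KEY>"):
    # One network request per batch of events, rather than one per event.
    requests.post(
        DD_URL,
        headers={"DD-API-KEY": api_key},
        json=[{"message": message} for message in batch],
    )

options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "Read" >> beam.io.ReadFromPubSub(
            subscription="projects/my-project/subscriptions/export-logs-sub")
        | "Decode" >> beam.Map(lambda data: data.decode("utf-8"))
        | "Batch" >> beam.BatchElements(min_batch_size=100, max_batch_size=1000)
        | "Send" >> beam.Map(send_batch)
    )
```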

With the Pub/Sub-to-Datadog Dataflow template, you can run Dataflow jobs to automatically batch and compress up to 1,000 messages before sending them to Datadog for processing. To create a Dataflow job from the template, you first need to set up an input Pub/Sub topic-subscription pair and a corresponding log export. You should also establish a topic-subscription pair to serve as a dead-letter queue in the event of a failure. From here, you can navigate to the Dataflow section of the Google Cloud Console, select the “Pub/Sub to Datadog” option, and configure the necessary parameters to optimize the pipeline. Then, you can simply click “Run” to start the Dataflow job.
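If you prefer to script this setup rather than use the console, here is a sketch of the same steps in Python using the google-cloud-pubsub, google-cloud-logging, and google-api-python-client libraries. The project ID, resource names, and log filter are hypothetical, and in production you would typically supply the Datadog API key through Secret Manager instead of as a plaintext parameter.

```python
from google.cloud import logging as gcp_logging
from google.cloud import pubsub_v1
from googleapiclient.discovery import build

project = "my-project"   # hypothetical project ID
region = "us-central1"   # hypothetical region

# 1. Create the input topic-subscription pair the job will pull from,
#    plus a second pair to serve as the dead-letter queue.
publisher = pubsub_v1.PublisherClient()
subscriber = pubsub_v1.SubscriberClient()
for name in ("export-logs", "export-logs-dlq"):
    topic = publisher.create_topic(
        request={"name": publisher.topic_path(project, name)})
    subscriber.create_subscription(
        request={
            "name": subscriber.subscription_path(project, f"{name}-sub"),
            "topic": topic.name,
        })

# 2. Create a log export (sink) that routes logs into the input topic.
logging_client = gcp_logging.Client(project=project)
sink = logging_client.sink(
    "datadog-export",
    filter_='resource.type="gce_instance"',  # hypothetical filter
    destination=f"pubsub.googleapis.com/projects/{project}/topics/export-logs",
)
sink.create()

# 3. Launch a Dataflow job from the Pub/Sub-to-Datadog template.
dataflow = build("dataflow", "v1b3")
response = dataflow.projects().locations().templates().launch(
    projectId=project,
    location=region,
    gcsPath="gs://dataflow-templates/latest/Cloud_PubSub_to_Datadog",
    body={
        "jobName": "pubsub-to-datadog",
        "parameters": {
            "inputSubscription": f"projects/{project}/subscriptions/export-logs-sub",
            "url": "https://http-intake.logs.datadoghq.com",
            "apiKey": "<DATADOG_API_KEY>",
            "outputDeadletterTopic": f"projects/{project}/topics/export-logs-dlq",
            "batchCount": "1000",
        },
    },
).execute()
print(response["job"]["id"])
```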

Leverage Dataflow to easily ingest logs into Datadog

With the Pub/Sub-to-Datadog Dataflow template, you have more options for ingesting logs into Datadog, depending on what works best for your system. For example, Google Cloud enables you to host VPCs but places significant restrictions on push subscriptions within them—push subscriptions generally can’t access endpoints outside of the VPC perimeter. Because the Pub/Sub-to-Datadog Dataflow template uses a pull-based subscription model instead, customers running VPCs can still deliver logs to endpoints outside their VPC perimeter. This makes the Pub/Sub-to-Datadog Dataflow template the recommended method for sending Google Cloud logs to Datadog for analysis.
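The difference is visible in how the subscriptions are created. Here is a short sketch using the google-cloud-pubsub Python client (project and resource names are hypothetical): a pull subscription simply omits a push endpoint, so the Dataflow workers inside the VPC initiate the connection and fetch messages themselves.

```python
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
project = "my-project"  # hypothetical project ID

# Pull subscription (what the template uses): no push endpoint is
# configured, so consumers reach out to Pub/Sub to fetch messages.
subscriber.create_subscription(
    request={
        "name": subscriber.subscription_path(project, "export-logs-sub"),
        "topic": subscriber.topic_path(project, "export-logs"),
    })

# Push subscription, for contrast: Pub/Sub must be able to reach the
# endpoint, which generally fails for endpoints outside a VPC perimeter.
subscriber.create_subscription(
    request={
        "name": subscriber.subscription_path(project, "export-logs-push"),
        "topic": subscriber.topic_path(project, "export-logs"),
        "push_config": {"push_endpoint": "https://example.com/ingest"},
    })
```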

The library of Google Cloud pipelines in Datadog, with the Logging pipeline expanded to show every step.

Once you’ve ingested your logs into Datadog using the template, you can view them alongside the rest of your logs in the platform. Datadog’s log pipelines automatically parse incoming logs from third-party integrations—including the Datadog Google Cloud integration—so your Google Cloud logs arrive structured and ready to search. Viewing these logs within the Log Explorer enables you to easily pinpoint trends, detect anomalies, and access usage metrics. You can also use log data to create effective troubleshooting tools, such as dashboards and alerts, that enable you to respond quickly when an issue arises.
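If you also want to query these logs programmatically, here is a minimal sketch using the official datadog-api-client Python library. The query string is just an example, and the sketch assumes DD_API_KEY and DD_APP_KEY are set in the environment.

```python
from datadog_api_client import ApiClient, Configuration
from datadog_api_client.v2.api.logs_api import LogsApi
from datadog_api_client.v2.model.logs_list_request import LogsListRequest
from datadog_api_client.v2.model.logs_query_filter import LogsQueryFilter

configuration = Configuration()  # reads DD_API_KEY and DD_APP_KEY from the environment
with ApiClient(configuration) as api_client:
    api = LogsApi(api_client)
    body = LogsListRequest(
        filter=LogsQueryFilter(
            query="source:gcp.pubsub",  # example query: recent GCP Pub/Sub logs
            _from="now-15m",
            to="now",
        ),
    )
    response = api.list_logs(body=body)
    for log in response.data:
        print(log.attributes.message)
```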

The Datadog Log Explorer filtered to GCP logs.

Start ingesting logs via Dataflow today

By using the Pub/Sub-to-Datadog Dataflow template, you can seamlessly collect and analyze logs with fewer network calls, giving you scalable ingestion while potentially lowering excess network egress costs. And because it uses pull-based subscriptions, the Pub/Sub-to-Datadog template is the recommended solution for sending logs to Datadog from within a VPC.

You can use our documentation to set up log forwarding from Google Cloud services to Datadog via Dataflow. If you’re not yet a Datadog user, you can sign up for a 14-day free trial today.