The Monitor

Rehydrate archived logs in any SIEM or logging vendor with Observability Pipelines

Zara Boddula

Pranay Kamat
Security and observability teams generate terabytes of log data every day—from firewalls, identity systems, and cloud infrastructure, in addition to application and access logs. To control SIEM costs and meet long-term retention requirements, many organizations archive a significant portion of this data in cost-optimized object storage such as Amazon S3, Google Cloud Storage, and Azure Blob Storage.

These archives help reduce ingestion spend and meet compliance mandates, but once data is archived, it can be difficult and costly to retrieve: pulling logs back from cold storage can take hours or even days, during which investigations, audits, and the testing of new routing or enrichment logic can stall.

Rehydration for Observability Pipelines, now available in Preview, bridges this gap. It enables teams to quickly retrieve archived logs from external storage and replay them through existing Observability Pipelines using the same parsing, enrichment, and routing logic applied to live data.

In this post, we’ll explain how Observability Pipelines and Rehydration help you:

  • Store logs cost-effectively in object storage such as S3, GCS, or Azure Blob Storage
  • Retrieve archived logs on demand to support investigations, audits, and testing
  • Apply parsing, enrichment, and routing logic to log rehydrations

Store logs cost-effectively in object storage such as S3, GCS, or Azure Blob Storage

As log volumes continue to grow, teams need to balance visibility with cost. Security engineers often face ingestion limits in their SIEM, while DevOps teams need to preserve full-fidelity data for investigations and compliance.

With Observability Pipelines and ready-to-use Packs, you can easily control which data stays in your analytics tools and which is archived in low-cost object storage. Packs are ready-made, source-specific configurations for Observability Pipelines that apply Datadog-recommended best practices: filtering low-value events, parsing important fields, normalizing formats, and routing logs consistently. This way, teams don’t have to recreate common processing logic from scratch.

For example, a network security team managing Palo Alto firewalls can start from the Pack that automatically parses firewall logs, deduplicates and filters low-value logs, and extracts key attributes like src_ip, dest_ip, action, and rule. From there, the team can add two simple routes (sketched in code after the screenshot below):

  • Filtered logs (e.g., critical or denied actions) flow into their SIEM tool for immediate analysis.
  • Full-fidelity logs are archived in Amazon S3 for long-term retention and compliance.
Quickly configure processing logic with ready-to-use Packs.
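To make that dual routing concrete, here is a minimal Python sketch of the decision each parsed firewall log goes through. The field names and destination labels are illustrative assumptions, not the actual Pack schema; in practice, you configure this routing in the Observability Pipelines UI rather than in code.

    # Hypothetical routing decision: every log is archived, but only
    # critical or denied events are also indexed in the SIEM.
    def route_firewall_log(log: dict) -> list[str]:
        destinations = ["s3_archive"]  # full-fidelity copy for retention
        if log.get("action") == "deny" or log.get("severity") == "critical":
            destinations.append("siem")  # high-value events for analysis
        return destinations

    # A denied connection is routed to both destinations.
    event = {"src_ip": "10.0.0.5", "dest_ip": "203.0.113.9",
             "action": "deny", "rule": "block-egress"}
    print(route_firewall_log(event))  # ['s3_archive', 'siem']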

This dual-destination setup helps teams stay under ingestion quotas while maintaining a full record of events. When new threats emerge or an audit requires historical context, the archived logs in S3, Google Cloud Storage, or Azure Blob Storage are ready to be rehydrated through Observability Pipelines without reconfiguring sources or jeopardizing compliance.

Retrieve archived logs on demand to support investigations, audits, and testing

Teams archive data for many reasons: to meet retention policies, control storage costs, or preserve long-term historical context. But when an incident, audit, or test arises, those archives often contain the critical details that active systems no longer retain.

Rehydration for Observability Pipelines gives you an automated way to access that historical data when you need it. Rehydration pulls logs directly from object storage and lets you target the exact time range of the events you want to analyze. Once retrieved, the logs flow through the same parsing, enrichment, normalization, and routing logic you use for live data, including any Packs. This maintains consistent formatting and context whether you’re debugging an outage, investigating suspicious activity, validating new configurations, or preparing for an audit. You can then route the processed historical data to your preferred destinations, such as Splunk, CrowdStrike, Sumo Logic, data lakes, and more.
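Conceptually, rehydration amounts to selecting the archive objects that cover your time window and streaming them back through the pipeline. The sketch below shows the general idea against a date-partitioned S3 layout; the bucket name and prefix scheme are assumptions for illustration, and Observability Pipelines performs this selection for you.

    # Assumed archive layout: <source>/YYYY/MM/DD/<object> in a
    # hypothetical bucket named "example-log-archive".
    from datetime import date, timedelta

    import boto3

    def archived_keys(bucket: str, source: str, start: date, end: date):
        """Yield archive object keys whose date prefix falls within [start, end]."""
        s3 = boto3.client("s3")
        paginator = s3.get_paginator("list_objects_v2")
        day = start
        while day <= end:
            prefix = f"{source}/{day:%Y/%m/%d}/"
            for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
                for obj in page.get("Contents", []):
                    yield obj["Key"]
            day += timedelta(days=1)

    # Example: list three days of archived Okta logs.
    for key in archived_keys("example-log-archive", "okta",
                             date(2024, 6, 10), date(2024, 6, 12)):
        print(key)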

Let’s look at how different teams use Rehydration for Observability Pipelines in practice.

Identity and access analysis with Okta logs

A security team monitoring authentication activity uses Observability Pipelines to collect logs from Okta. Using the Okta Pack, the team is able to reduce their indexed log volume by retaining only high-value logs such as failed login attempts, policy violations, and a select few sampled successful logins in their SIEM tool. All other successful login events are routed to long-term storage in Amazon S3.
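The retain-versus-archive decision in that setup boils down to a predicate like the one below. The event fields loosely follow the shape of Okta's system log, but the exact event types and the one-percent sample rate are assumptions for illustration.

    import random

    SAMPLE_RATE = 0.01  # keep roughly 1% of successful logins in the SIEM

    def keep_in_siem(event: dict) -> bool:
        """Return True if an Okta event should be indexed, not just archived."""
        event_type = event.get("eventType", "")
        outcome = event.get("outcome", {}).get("result")
        if event_type == "user.session.start":
            if outcome == "FAILURE":
                return True  # failed logins are always high-value
            return random.random() < SAMPLE_RATE  # sample successful logins
        if event_type.startswith("policy."):
            return True  # policy violations are always retained
        return False  # everything else goes only to the S3 archive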

When an alert indicates suspicious logins from an overseas location, the analyst opens the Rehydration tab in Observability Pipelines, selects the S3 bucket that Okta logs were archived to, and rehydrates three days’ worth of successful login events from affected users into their SIEM tool.

Rehydrate Okta logs with Observability Pipelines.

Network troubleshooting with Palo Alto Firewall logs

A network operations team uses the Palo Alto Firewall Pack to parse and enrich firewall logs. To reduce noise, they filter and deduplicate high-frequency TRAFFIC events before sending full-fidelity copies to an Azure bucket for storage. They keep THREAT and SYSTEM logs in their observability tool for real-time visibility.
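A minimal sketch of that deduplication step follows, assuming a flow-tuple key and a 60-second window (both illustrative choices, not the Pack's actual defaults):

    import time

    WINDOW_SECONDS = 60
    _last_seen: dict[tuple, float] = {}

    def is_duplicate(event: dict, now: float | None = None) -> bool:
        """Drop repeated TRAFFIC events sharing a flow key within the window."""
        if event.get("log_type") != "TRAFFIC":
            return False  # THREAT and SYSTEM logs are never deduplicated
        now = time.time() if now is None else now
        key = (event.get("src_ip"), event.get("dest_ip"),
               event.get("dest_port"), event.get("action"))
        last = _last_seen.get(key)
        _last_seen[key] = now
        return last is not None and (now - last) < WINDOW_SECONDS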

When users in one data center report latency, the team opens the Rehydration tab in Observability Pipelines, selects the bucket that the TRAFFIC logs are stored in, and retrieves six hours’ worth of archived TRAFFIC logs for the affected subnet. These logs flow through the same parsing and enrichment logic used for live data, which helps engineers confirm that the issue was caused by a misconfigured NAT rule.

Rehydrate archived Palo Alto Firewall Traffic logs with Observability Pipelines.

Compliance and configuration audits with CloudTrail

A DevOps team relies on AWS CloudTrail archives to maintain a full record of infrastructure changes. Routine Get* and Describe* events are filtered from live pipelines, but all events are stored in Amazon S3 and readily retrievable via rehydration. When a quarterly compliance review requires evidence of IAM policy changes, the team rehydrates only the relevant Update* and Attach* events from the past quarter. Observability Pipelines processes and routes these logs to their SIEM for review and omits all other logs from the given time frame. Whether it’s used for performance investigations or compliance validation, rehydration lets teams retrieve the specific data they need from cold storage without the need for scripts, decompression, or manual file handling.
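The event selection in this example can be expressed as shell-style patterns against CloudTrail's eventName field. The sketch below mirrors the patterns named above; it is not a complete audit policy.

    from fnmatch import fnmatch

    FILTERED_LIVE = ("Get*", "Describe*")   # routine reads: archive only
    REHYDRATE = ("Update*", "Attach*")      # IAM policy changes for the audit

    def matches(event: dict, patterns) -> bool:
        """Check a CloudTrail event's name against shell-style patterns."""
        name = event.get("eventName", "")
        return any(fnmatch(name, p) for p in patterns)

    sample = [{"eventName": "DescribeInstances"},
              {"eventName": "UpdateAssumeRolePolicy"},
              {"eventName": "AttachRolePolicy"}]
    print([e["eventName"] for e in sample if matches(e, REHYDRATE)])
    # ['UpdateAssumeRolePolicy', 'AttachRolePolicy']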

Rehydrate AWS CloudTrail events with Observability Pipelines.

Apply parsing, enrichment, and routing logic to log rehydrations

As illustrated in the examples above, once you’ve selected your archived data and time frame, Observability Pipelines handles the rest. Rehydration automatically streams logs from storage through a pipeline, letting you apply Observability Pipelines processors to parse, enrich, and route rehydrated logs just as you would any live ingested data. Based on the previous examples, your engineering teams can process archived logs as follows (the first case is sketched in code after this list):

  • Security teams can apply field extractions and tagging from the Okta Pack to enrich rehydrated authentication logs with user and geo-IP context before sending them to their investigation workspace.
  • Network engineers can run rehydrated Palo Alto Firewall TRAFFIC logs through the parsing and enrichment Pack rules used for live data, ensuring that historical logs share the same schema and context as real-time logs.
  • DevOps teams can validate CloudTrail policy-change events by routing rehydrated logs through their enrichment processors before pushing them into their observability tool.
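Here is a hedged sketch of that first case: tagging a rehydrated Okta login with user and geo-IP context before routing it onward. The lookup tables are stand-ins for a real geo-IP database and identity directory, and the field names are illustrative.

    # Stand-in enrichment sources; a real pipeline would consult a
    # geo-IP database and a user directory instead of static dicts.
    GEOIP = {"203.0.113.7": {"country": "NL", "city": "Amsterdam"}}
    DIRECTORY = {"jdoe@example.com": {"team": "payments", "manager": "asmith"}}

    def enrich(event: dict) -> dict:
        """Attach geo and user context to a rehydrated authentication log."""
        enriched = dict(event)
        enriched["geo"] = GEOIP.get(event.get("client_ip", ""), {})
        enriched["user_context"] = DIRECTORY.get(event.get("actor", ""), {})
        return enriched

    login = {"actor": "jdoe@example.com", "client_ip": "203.0.113.7",
             "eventType": "user.session.start"}
    print(enrich(login))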

Try Rehydration in Preview

Rehydration for Observability Pipelines is now available in Preview. Whether you’re responding to an incident or performing a configuration audit, Rehydration gives you on-demand access to archived logs with the same consistency and control as live data.

To learn more, visit the Observability Pipelines product page. If you’re not a Datadog customer, you can sign up for a free trial to experience how Datadog helps you manage log retention and reprocessing.
