Many organizations manage applications that are supported by a large number of services in multiple environments, ranging from the cloud to their own data centers across the globe. As these organizations scale and accelerate service adoption, the volume of telemetry data in their environments multiplies every year. Consequently, teams are tasked with managing and routing large volumes of metrics, traces, and logs from a wide variety of sources to their appropriate but often siloed destinations, such as log management tools, archives, or SIEM solutions. This complexity not only risks exposing sensitive data but also leads to vendor lock-in, poor data quality, and an increase in overall management costs.
Datadog Observability Pipelines addresses these problems by giving you more flexibility and control over your data. Pipelines are built on an open source project that enterprises already rely on to manage petabytes of telemetry data every month. Now you can leverage the same highly scalable platform for collecting, transforming, and routing data in your own environment, regardless of its volume, source, or destination.
In this post, we’ll look at how Observability Pipelines helps you improve data visibility by:
- Controlling the costs and volume of data as you scale your application
- Decoupling data sources from their destination to simplify migrations
- Standardizing and improving data quality through organization-wide schemas
- Redacting sensitive data in your environment to maintain compliance
Modern environments can generate terabytes, or even petabytes, of observability data per day as organizations continue to grow and adopt new technologies. This volume makes network, ingestion, and indexing fees harder to manage and forces tradeoffs between cost and visibility. On top of that, the tools you use to ingest, process, and route data can also be expensive and complicated to operate.
Datadog Observability Pipelines helps you make value-based decisions on all of your observability data before it leaves your environment, allowing you to manage operational costs while maintaining visibility. For example, you can improve your data’s signal-to-noise ratio by automatically re-routing all INFO-level logs to low-cost storage or converting critical logs to metrics. You can also drop unnecessary fields in specific payloads before sending them downstream to third-party logging platforms.
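To make these cost controls concrete, here is a minimal Python sketch of the three ideas: routing by log level, dropping fields, and collapsing logs into a metric. It illustrates the logic only and is not the Observability Pipelines configuration format; the destination names, the `NOISY_FIELDS` set, and the `level` and `status` field names are all assumptions made for the example.

```python
from collections import Counter
from typing import Any

# Hypothetical destination names; a real pipeline would map these to configured sinks.
ARCHIVE = "low_cost_archive"
LOG_PLATFORM = "log_management"

# Hypothetical fields assumed to add cost without adding value downstream.
NOISY_FIELDS = frozenset({"debug_context", "raw_headers"})


def choose_destination(event: dict[str, Any]) -> str:
    """Send INFO-level logs to cheap storage and everything else to the log platform."""
    level = str(event.get("level", "")).upper()
    return ARCHIVE if level == "INFO" else LOG_PLATFORM


def drop_fields(event: dict[str, Any], fields: frozenset = NOISY_FIELDS) -> dict[str, Any]:
    """Return a copy of the event without the listed fields."""
    return {key: value for key, value in event.items() if key not in fields}


def count_by_status(events: list[dict[str, Any]]) -> dict[str, int]:
    """Collapse a batch of logs into a counter metric keyed by the 'status' field."""
    return dict(Counter(str(event.get("status", "unknown")) for event in events))
```

Each function is independent, so a pipeline could apply only the steps that match its cost goals, for example dropping fields for one destination while archiving INFO logs for another.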
Each pipeline you create can be monitored directly on the Datadog platform, allowing you to visualize every processing step and get better insights into performance or bottlenecks.
For example, tracking the average load for an Amazon S3 bucket destination can help you determine if you are using the correct storage class; you may be able to use a cheaper storage class for infrequently accessed data. Insights like these help you provision enough resources to process and route business-critical data, even during a surge in traffic.
Organizations face difficult decisions when searching for new technologies that can support their observability goals. Making the wrong choice can result in expensive, multi-year migration projects, only to find that they need to start the search process all over again.
Datadog Observability Pipelines allows you to collect all of your observability data, process it in your own environment, and route it to any technology or platform of your choice. By decoupling data sources from their destination, you can significantly reduce the risks associated with adopting new solutions. For example, you can create a pipeline that temporarily routes all logs to a new log management platform so you can evaluate the new system at your own pace.
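The evaluation scenario above comes down to keeping a routing layer between sources and destinations. The following Python sketch shows that idea under stated assumptions: the `Router` class and its method names are hypothetical and are not part of any Datadog API.

```python
from typing import Any, Callable

Event = dict[str, Any]
Sink = Callable[[Event], None]


class Router:
    """Maps source names to sinks so that sources never reference sinks directly."""

    def __init__(self) -> None:
        self._routes: dict[str, list[Sink]] = {}

    def add_route(self, source: str, sink: Sink) -> None:
        """Start delivering events from the source to the sink."""
        self._routes.setdefault(source, []).append(sink)

    def remove_route(self, source: str, sink: Sink) -> None:
        """Stop delivering events from the source to the sink."""
        self._routes[source].remove(sink)

    def dispatch(self, source: str, event: Event) -> int:
        """Deliver the event to every sink for the source and return the delivery count."""
        sinks = self._routes.get(source, [])
        for sink in sinks:
            sink(event)
        return len(sinks)
```

During an evaluation you would add a second route for the same source so both platforms receive identical events, then remove the legacy route once you are satisfied. The code that emits the logs never changes.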
Decoupling sources from their destinations is especially useful during migrations from legacy systems, which can incorporate multiple platforms, geographical regions, and resources to support your applications. You can use Observability Pipelines to create dedicated workflows for on-premises, hybrid, or fully cloud-hosted deployments, allowing you to track and manage every stage of your migration.
Datadog will automatically highlight bottlenecks or errors in a particular pipeline so you can determine which newly deployed hosts are not able to process incoming data optimally.
Teams generate their own observability data, and each platform, tool, and service in an environment may structure information differently or lack valuable context for an event. This inconsistency makes it more difficult for teams to identify the root cause of an issue during an investigation, which can increase their time to detection and resolution. A piecemeal approach to data management is also difficult to maintain at scale and limits a team’s ability to own and manage its service data as needed.
Datadog Observability Pipelines includes out-of-the-box integrations that automatically parse, format, and enrich data in your own environment, enabling you to maintain data consistency across all your sources. For example, you can use Vector Remap Language to flexibly manipulate your data into any shape or format and enforce a data schema across all logs. You can also use the GeoIP transform to add important context to logs at their source before routing them to any destination.
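As a rough illustration of what enforcing a schema involves, the Python sketch below maps differently named fields onto one shared shape. It is not Vector Remap Language, and the alias table and target field names are assumptions chosen for the example.

```python
from datetime import datetime, timezone
from typing import Any

# Hypothetical aliases that different services might use for the same concept.
FIELD_ALIASES = {
    "message": ("message", "msg", "log"),
    "level": ("level", "severity", "lvl"),
    "service": ("service", "app", "service_name"),
}


def normalize(event: dict[str, Any]) -> dict[str, Any]:
    """Map an event onto a shared schema: message, level, service, timestamp."""
    normalized: dict[str, Any] = {}
    for target, aliases in FIELD_ALIASES.items():
        for alias in aliases:
            if alias in event:
                normalized[target] = event[alias]
                break
    # Use a single lowercase representation for severity.
    normalized["level"] = str(normalized.get("level", "unknown")).lower()
    # Convert numeric epoch seconds to ISO 8601 in UTC; keep other values as strings.
    if "timestamp" in event:
        ts = event["timestamp"]
        if isinstance(ts, (int, float)):
            normalized["timestamp"] = datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()
        else:
            normalized["timestamp"] = str(ts)
    return normalized
```

For instance, `normalize({"msg": "disk full", "severity": "WARN", "app": "billing"})` yields an event with `message`, `level` set to `"warn"`, and `service`, regardless of which names the originating service used.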
As your organization scales, you need to comply with data residency requirements while the risk of exposing or losing valuable business data grows. The loss of data is often the result of using pipelines that are not resilient to outages. Leaking sensitive data, on the other hand, can occur due to a lack of visibility into what your services log across your infrastructure.
Pipelines enable you to create specific deployments for each of your data centers and scrub any sensitive data (e.g., social security or credit card numbers) in your environment, allowing you to meet data residency requirements and mitigate other compliance risks. With the redact transform, for example, you can write custom rules to scan and redact any sensitive data in your logs before they are routed to various destinations.
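To show the kind of rule a redaction step applies, here is a minimal Python sketch that masks Social Security and credit card numbers with regular expressions. It is not the redact transform itself: the patterns and placeholder strings are assumptions, and the card pattern matches any run of 13 to 16 digits, so a production rule would likely add stricter validation such as a checksum.

```python
import re
from typing import Any

# Assumed pattern for US Social Security numbers written as 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# Assumed pattern for 13 to 16 digit card numbers with optional space or dash separators.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def redact(text: str) -> str:
    """Replace anything that looks like an SSN or card number with a placeholder."""
    text = SSN_PATTERN.sub("[REDACTED_SSN]", text)
    return CARD_PATTERN.sub("[REDACTED_CARD]", text)


def redact_event(event: dict[str, Any]) -> dict[str, Any]:
    """Redact every top-level string value in an event and leave other values untouched."""
    return {
        key: redact(value) if isinstance(value, str) else value
        for key, value in event.items()
    }
```

Running the scan before events leave your environment means downstream destinations only ever receive the placeholder values.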
Datadog Observability Pipelines is powered by an active and community-supported project that thousands of companies rely on to manage telemetry data in their production environments. Now you can leverage the same capabilities to build pipelines that enable you to manage data costs, safely adopt new technologies, improve compliance, and govern data quality under Datadog’s unified platform. Check out our documentation to get started. Or, if you don’t already have a Datadog account, you can sign up for a free 14-day trial.