Data Jobs Monitoring

Observe, troubleshoot, and cost-optimize your Spark and Databricks jobs across data pipelines.

1,000+ Turn-Key Integrations

Product Benefits

Detect Issues Anywhere in Your Data Pipelines

  • Receive immediate, out-of-the-box alerts when jobs fail or keep running beyond their expected completion time
  • Visually identify trends and anomalies in job performance to quickly analyze your data platform’s reliability and estimated costs
  • Prioritize issue resolution more efficiently with recommended filters that surface important issues such as failures, latency, and cost spikes

Resolve Failed and Long-Running Jobs Faster

  • Troubleshoot faster with full context by drilling down into the execution flow (job, stages, and tasks) to see exactly where it failed
  • Uncover the root cause of slow jobs by identifying inefficient Spark stages or SQL queries that may be affected by data skew, disk spill, or other common factors
  • Expedite root cause analysis by comparing recent runs of a job to spot trends in run duration, Spark performance metrics, cluster utilization, and configuration

Reduce Costs by Optimizing Jobs

  • Lower compute costs by identifying overprovisioned clusters and adjusting their number of worker nodes and instance types
  • Increase job efficiency at the application level by using Spark execution metrics to identify improvements to code or configuration
  • Surface the biggest savings opportunities by viewing idle compute for your largest jobs and cluster utilization over time

Centralize Data Pipeline Visibility with the Rest of Your Infrastructure

  • Gain complete data pipeline visibility in a unified dashboard: view data storage, warehouse, and orchestrator metrics in the same place as your job telemetry
  • Pivot seamlessly between key data pipeline metrics to quickly understand what influenced job failures or latency spikes, such as infrastructure metrics, Spark metrics, logs, and configuration
  • Accelerate incident response and debugging with flexible tagging that routes alerts for data pipeline issues to the right team

The Essential Monitoring and Security Platform for the Cloud Age

Datadog brings together end-to-end traces, metrics, and logs to make your applications, infrastructure, and third-party services entirely observable.

Loved & Trusted by Thousands

Washington Post, 21st Century Fox Home Entertainment, Peloton, Samsung, Comcast, Nginx