Hadoop & Spark Monitoring With Datadog

Hadoop & Spark monitoring with Datadog

Apache Hadoop began as an open source implementation of Google’s MapReduce algorithm and its supporting infrastructure. Over time Hadoop evolved to accommodate different execution engines in addition to MapReduce, such as Spark. Today Hadoop is the standard data processing platform for large data sets, and is used by very nearly all companies with large amounts of data.

Hadoop is great, but sometimes opaque

Hadoop is incredibly widely used because it does a good job solving an important and complex problem: distributed data processing. But it is a complex technology, and it can be quite difficult to know what’s happening, why jobs fail, if issues in Hadoop are related to the data, upstream / downstream issues in other parts of your stack, or Hadoop itself. As with most distributed systems running on many machines, it can be quite hard to collaboratively solve problems.

Datadog + Hadoop

So we decided to add the power of Datadog to Hadoop. After you turn on the integration, you can see hundreds of Hadoop metrics alongside your hosts’ system-level metrics, and can correlate what’s happening in Hadoop with what’s happening throughout your stack. You can set alerts when critical jobs don’t finish on time, on outliers or on pretty much any problematic Hadoop scenario of which you want to be made aware.

HDFS

Using the HDFS integration you can monitor:

the number of data nodes: live, dead, stale, decommissioning
number of blocks: under-replicated, pending replication, pending deletion
disk space remaining on each host, and on the cluster as a whole
namenode load and lock queue length
and more

MapReduce

The MapReduce integration includes metrics for:

map and reduce jobs pending, succeeded, and failed
map and reduce input / output records
bytes read by job or in total
any counter
and more

YARN

The YARN integration gives visibility to:

nodes: active, lost, unhealthy, rebooted, total
apps: submitted, pending, running, completed, failed, killed
cluster cores: allocated, available, reserved, total
cluster memory: in use, total available
and more

Spark

With the Spark integration you can see:

driver and executor: RDD blocks, memory used, disk used, duration, etc.
RDDs: partitions, memory used, disk used
tasks: active, skipped, failed, total
job stages: active, completed, skipped, failed
and more

Turn it on

If you already use Datadog and Hadoop, you can turn on the integrations right now, and add Hadoop to the long list of technologies you can monitor easily and collaboratively. If you’re new to Datadog, here’s the link to a free trial.

Want to work with us? We're hiring!

Hadoop & Spark monitoring with Datadog

Further Reading

Hadoop is great, but sometimes opaque

Datadog + Hadoop

HDFS

MapReduce

YARN

Spark

Turn it on

Further Reading

Start monitoring your metrics in minutes

Hadoop & Spark monitoring with Datadog

Further Reading

Related jobs at Datadog

Further Reading

Monitor your AWS ECS platform with Convox and Datadog

NGINX 502 Bad Gateway: Gunicorn

NGINX 502 Bad Gateway: PHP-FPM

3 clear trends in ECS adoption