---
title: "Monitor Apache Hive with Datadog"
description: "Use Datadog's Hive integration to get visibility into your big data queries."
author: "Paul Gottschling"
date: 2019-07-29
tags: ["database monitoring", "log management", "apache", "hive", "data analytics", "sql"]
blog_type_id: the-monitor
locale: en
---

[Apache Hive](https://hive.apache.org/) is an open source interface that allows users to query and analyze distributed datasets using SQL commands. Hive [compiles SQL commands](https://cwiki.apache.org/confluence/display/Hive/Design) into an execution plan, which it then runs against your Hadoop deployment. You can customize Hive by using a number of [pluggable components](https://cwiki.apache.org/confluence/display/HIVE#Home-ApacheHive) (e.g., HDFS and HBase for storage, Spark and MapReduce for execution). With our new integration, you can monitor Hive metrics and logs in context with the rest of your big data infrastructure.

![oob-dash](https://web-assets.dd-static.net/42588/1776356050-hive-oob-dash.png)

## Optimize Hive memory usage 

The more clients you expect to be using Hive at once, the [more heap memory](https://www.cloudera.com/documentation/enterprise/5-9-x/topics/admin_hive_tuning.html#id_umf_bmk_nw) you will need to allocate to ensure proper performance. Datadog's [out-of-the-box dashboard](https://app.datadoghq.com/screen/integration/30279/hive-integration-dashboard) allows you to track client sessions alongside memory usage from two Hive components:

- [**HiveServer2**](https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Overview), which processes client connections using an RPC framework and HTTP server
- the [**Metastore**](https://cwiki.apache.org/confluence/display/Hive/Design#Design-Metastore), which stores information about the structure of your Hadoop data for use in compiling and executing queries


You can use the out-of-the-box dashboard to determine when HiveServer2 and the Metastore are nearing their maximum heap size. You can then clone and customize the dashboard to see how many concurrent sessions correspond with high memory usage, and understand when demand is likely to be high.

![A custom dashboard compares HiveServer2 open client sessions to memory metrics. In the bottom graph, the blue line indicates the maximum total memory, purple indicates the total used memory, and yellow the memory use at initialization.](https://web-assets.dd-static.net/42588/1776356058-hive-sessions-mem.png)
*A custom dashboard compares HiveServer2 open client sessions to memory metrics. In the bottom graph, the blue line indicates the maximum total memory, purple indicates the total used memory, and yellow the memory use at initialization.*
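To make the idea of "nearing maximum heap size" concrete, here is a minimal Python sketch of the comparison the dashboard lets you do visually. It is not part of the integration: the `HeapSample` shape, its field names, the sample values, and the 85 percent threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class HeapSample:
    """One point-in-time reading for a Hive component (hypothetical shape)."""
    component: str      # e.g. "hiveserver2" or "metastore"
    open_sessions: int  # HiveServer2 client sessions open at sample time
    heap_used: float    # bytes of heap in use
    heap_max: float     # configured maximum heap, in bytes


def heap_utilization(sample: HeapSample) -> float:
    """Return used heap as a fraction of the maximum."""
    if sample.heap_max <= 0:
        raise ValueError("heap_max must be positive")
    return sample.heap_used / sample.heap_max


def samples_near_limit(samples, threshold=0.85):
    """Return the samples whose heap utilization meets or exceeds threshold."""
    return [s for s in samples if heap_utilization(s) >= threshold]


GIB = 1024 ** 3

# Made-up readings, not real measurements.
samples = [
    HeapSample("hiveserver2", open_sessions=12, heap_used=1.5 * GIB, heap_max=4 * GIB),
    HeapSample("hiveserver2", open_sessions=48, heap_used=3.6 * GIB, heap_max=4 * GIB),
    HeapSample("metastore", open_sessions=48, heap_used=1.0 * GIB, heap_max=2 * GIB),
]

for s in samples_near_limit(samples):
    print(f"{s.component}: {heap_utilization(s):.0%} of heap at {s.open_sessions} sessions")
```

With these sample values, only the second HiveServer2 reading crosses the threshold, which is the kind of session count you would want to plan heap allocation around.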

## Troubleshoot slow queries

SQL operations in Hive go through a [series of states](https://svn.apache.org/repos/infra/websites/production/hive/content/javadocs/r2.1.1/api/org/apache/hive/service/cli/OperationState.html) before they return results to the user, such as `INITIALIZED`, `PENDING`, and `RUNNING`. Once these operations reach the Hive [Driver](https://cwiki.apache.org/confluence/display/Hive/Design#Design-HiveArchitecture), Hive tracks their progress through another set of phases: submission, compilation, and execution. With Datadog's integration, you can track the time your SQL operations spend in different states, allowing you to identify bottlenecks and optimize performance.

![query-breakdown](https://web-assets.dd-static.net/42588/1776356063-hive-query-breakdown.png)
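If it helps to see what "time spent in different states" means numerically, the following Python sketch totals the seconds a single operation spends in each state. The transition history is hypothetical and the function is only an illustration of the calculation, not how Hive or Datadog computes these metrics.

```python
from collections import defaultdict
from datetime import datetime


def time_in_states(transitions):
    """Sum the seconds an operation spent in each state.

    `transitions` is a time-ordered list of (timestamp, state) pairs. The
    final pair marks the terminal state and contributes no duration.
    """
    totals = defaultdict(float)
    for (start, state), (end, _next_state) in zip(transitions, transitions[1:]):
        totals[state] += (end - start).total_seconds()
    return dict(totals)


# Hypothetical transition history for one SQL operation.
history = [
    (datetime(2019, 7, 29, 10, 0, 0), "INITIALIZED"),
    (datetime(2019, 7, 29, 10, 0, 1), "PENDING"),
    (datetime(2019, 7, 29, 10, 0, 9), "RUNNING"),
    (datetime(2019, 7, 29, 10, 0, 12), "FINISHED"),
]

durations = time_in_states(history)
slowest = max(durations, key=durations.get)
print(durations)
print(f"Longest state: {slowest}")
```

In this made-up history the operation waits longer in `PENDING` than it spends in `RUNNING`, which would point to a queuing bottleneck rather than slow execution.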

## Investigate execution errors in context

If your Hive queries fail to execute, it's important to get context from your logs to help you troubleshoot. Datadog's integration includes a log processing pipeline that makes it straightforward to troubleshoot Hive errors. The integration automatically parses your Hive logs for key information like the database operation and user, allowing you to find commonalities and discover erroneous commands. And for unhandled exceptions, Datadog's log parser can also capture stack traces, making it easier to pinpoint the causes of errors (e.g., in the situation below, an internal exception thrown by the Metastore).

![logs](https://web-assets.dd-static.net/42588/1776356072-hive-logs.png)
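As a rough illustration of what a log pipeline does with this kind of output, the Python sketch below splits log lines into fields and attaches stack trace lines to the event that precedes them. It assumes a log4j-style layout of timestamp, level, thread, logger, and message. The regular expression, the field names, and the sample log text are all invented for this example and do not reproduce Datadog's actual pipeline rules or real Hive output.

```python
import re

# Assumed layout: "<timestamp> <LEVEL> [<thread>] <logger>: <message>"
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}[,.]\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<thread>[^\]]*)\]\s+"
    r"(?P<logger>[\w.$]+):\s+"
    r"(?P<message>.*)$"
)


def parse_log(text):
    """Parse log text into events, folding continuation lines into stack traces."""
    events = []
    for line in text.splitlines():
        match = LOG_LINE.match(line)
        if match:
            event = match.groupdict()
            event["stack_trace"] = []
            events.append(event)
        elif events and line.strip():
            # A non-matching line belongs to the previous event's stack trace.
            events[-1]["stack_trace"].append(line.strip())
    return events


# Hypothetical log excerpt, written only to exercise the parser.
sample = """\
2019-07-29T10:15:30,123 ERROR [pool-7-thread-3] example.Handler: example failure
\tat com.example.Hypothetical.frame(Hypothetical.java:1)
2019-07-29T10:15:31,456  INFO [main] example.Driver: example command completed
"""

for event in parse_log(sample):
    print(event["level"], event["logger"], len(event["stack_trace"]))
```

Grouping continuation lines under their parent event is what keeps a multi-line stack trace attached to the error that produced it.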

You can use Datadog to identify issues with a particular phase of query completion, and then navigate to correlated logs to investigate possible root causes. For example, if the out-of-the-box dashboard shows an increase in `PENDING` SQL operations but not in `RUNNING` ones (or `RUNNING` operations have dropped off), there might be errors in the `PENDING` phase. You can click the graph to consult logs from when `RUNNING` operations declined, and see if (for example) there's been a [`HiveSQLException`](https://svn.apache.org/repos/infra/websites/production/hive/content/javadocs/r2.1.1/api/org/apache/hive/service/cli/HiveSQLException.html).

![pending-running-ops](https://web-assets.dd-static.net/42588/1776356079-hive-pending-running-ops.png)
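The pattern described above, `PENDING` operations rising while `RUNNING` operations stay flat or fall, can be expressed as a simple check over two series. This Python sketch is only an illustration: the function name, the `min_growth` parameter, and the sample counts are hypothetical, and it is not a Datadog monitor definition.

```python
def stalled_pending(pending, running, min_growth=1):
    """Return the indices where PENDING grew while RUNNING did not.

    `pending` and `running` are equally spaced counts of operations in each
    state. An index is flagged when PENDING rose by at least `min_growth`
    since the previous point and RUNNING stayed flat or dropped.
    """
    flagged = []
    for i in range(1, min(len(pending), len(running))):
        pending_delta = pending[i] - pending[i - 1]
        running_delta = running[i] - running[i - 1]
        if pending_delta >= min_growth and running_delta <= 0:
            flagged.append(i)
    return flagged


# Made-up counts sampled at regular intervals.
pending = [2, 3, 6, 9, 9]
running = [4, 5, 5, 3, 3]

print(stalled_pending(pending, running))
```

The flagged intervals are the ones worth lining up against your logs, since that is when operations stopped advancing out of `PENDING`.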

## Dogs, bees, and elephants—oh my! 

Datadog's Hive integration gives you even more visibility than before across your distributed big data architecture, including [HDFS](https://docs.datadoghq.com/integrations/hdfs.md), [YARN](https://docs.datadoghq.com/integrations/yarn.md), and [MapReduce](https://docs.datadoghq.com/integrations/mapreduce.md), as well as technologies that might be running alongside Hadoop, such as [AWS Elastic MapReduce](https://docs.datadoghq.com/integrations/amazon_emr.md) and [ZooKeeper](https://docs.datadoghq.com/integrations/zk.md). All told, Datadog supports 1,000 integrations and counting. You can try out Datadog for yourself with a free trial.