Gain visibility into your Cloudera clusters with Datadog

Author: Candace Shamieh

Published: May 26, 2023

Cloudera Data Platform (CDP) is a data analytics and management platform that enables users to centralize, visualize, and govern their data. While users may be accustomed to data analytics solutions that are completely siloed and difficult to scale, CDP is designed to be flexible, giving customers the ability to integrate with open source technologies and deploy in a hybrid, cloud-native, or multi-cloud environment.

If you’re using CDP Public Cloud, you can now use Datadog’s Cloudera integration. The integration gives you visibility into your CDP Public Cloud clusters, helping SREs and developers ensure that their workloads are running smoothly and performing well. Collecting and viewing your CDP metrics and logs with Datadog gives you the information you need to identify and troubleshoot issues before they impact your end users.

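Our integration has been successfully tested and certified to work with CDP, making Datadog a Cloudera Certified Technology Partner. In this post, we will discuss how to understand the health of your CDP Public Cloud clusters and hosts to optimize capacity planning, and how to view your entire Hadoop stack with OOTB Cloudera Powerpacks.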

Understand the health of your CDP Public Cloud clusters and hosts to optimize capacity planning

Once you install the integration, the Datadog Agent will automatically collect information from all of your CDP Public Cloud clusters. The Agent is configurable so that you can include or exclude specific clusters based on parameters that you define.
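To illustrate the include/exclude idea, here is a minimal Python sketch of pattern-based cluster selection. It is not the Agent's implementation: the function name `select_clusters`, the use of regular expressions, the rule that an empty include list means "everything", and the cluster names are all assumptions made for the example. Check the integration documentation for the actual configuration options.

```python
import re


def select_clusters(names, include=None, exclude=None):
    """Return cluster names that match any include pattern and no exclude pattern.

    Hypothetical helper: an empty or missing include list is treated as
    "include everything"; exclude patterns always take precedence.
    """
    include_patterns = [re.compile(p) for p in (include or [])]
    exclude_patterns = [re.compile(p) for p in (exclude or [])]

    selected = []
    for name in names:
        # Skip names that match none of the include patterns (if any were given).
        if include_patterns and not any(p.search(name) for p in include_patterns):
            continue
        # Skip names that match at least one exclude pattern.
        if any(p.search(name) for p in exclude_patterns):
            continue
        selected.append(name)
    return selected


# Made-up cluster names for illustration only.
clusters = ["prod-datahub", "prod-opdb", "staging-datahub", "tmp-scratch"]
monitored = select_clusters(
    clusters,
    include=[r"^prod-", r"^staging-"],
    exclude=[r"-opdb$"],
)
print(monitored)
```

In this sketch, exclusion wins over inclusion, so a cluster that matches both lists is not monitored.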

Within the Datadog platform, the out-of-the-box (OOTB) Cloudera Overview dashboard enables you to view and proactively monitor the performance and health of your CDP Public Cloud clusters and hosts. In the Overview section, the dashboard displays the status of the service checks for connectivity, cluster health, and host health. Datadog will verify that the Agent can connect to your Cloudera Manager API and alert you with a security event if there’s an issue, so that you can remediate it quickly. The Overview section also shows a detailed picture of your host health, indicating whether each host's health is good, concerning, bad, disabled, or unknown.

Cloudera Data Platform Overview dashboard
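The five host health states mentioned above can be thought of as mapping onto a smaller set of check statuses. The Python sketch below is an assumption-laden illustration, not the integration's actual mapping: the status levels are modeled on Datadog's standard service check levels (OK, WARNING, CRITICAL, UNKNOWN), while the mapping table, the function names, and the host names are hypothetical.

```python
from collections import Counter
from enum import IntEnum


class Status(IntEnum):
    """Status levels modeled on Datadog service check levels."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


# Assumed mapping from the health states named in this post to a status level.
HEALTH_TO_STATUS = {
    "GOOD": Status.OK,
    "CONCERNING": Status.WARNING,
    "BAD": Status.CRITICAL,
    "DISABLED": Status.UNKNOWN,
    "UNKNOWN": Status.UNKNOWN,
}


def to_status(health):
    """Translate a health label into a Status, defaulting to UNKNOWN."""
    return HEALTH_TO_STATUS.get(health.strip().upper(), Status.UNKNOWN)


def summarize(host_health):
    """Count how many hosts fall into each status level."""
    return Counter(to_status(h).name for h in host_health.values())


# Made-up hosts and health values for illustration only.
hosts = {
    "host-1": "good",
    "host-2": "concerning",
    "host-3": "bad",
    "host-4": "disabled",
    "host-5": "good",
}
summary = summarize(hosts)
print(dict(summary))
```

Treating unrecognized labels as UNKNOWN rather than OK keeps an unexpected value from silently hiding a problem.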

At a glance, our Cloudera Overview dashboard also displays CPU usage across all host entities within the entire cluster. You can review the disk read and write bytes within your clusters, and how your cluster is performing at a network level. You can set up custom alerts based on these metrics or others, such as memory usage, in order to be notified of any issues before they become critical.

By collecting and analyzing performance metrics from CDP, you can gain detailed insight into the resource usage of your clusters. This information can be used for capacity planning and optimization, such as identifying underutilized or overutilized hosts and optimizing resource allocation to improve overall cluster performance. Optimization can help you prevent issues at the source, ensuring that your clusters can handle the demands of your workloads.
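As a rough illustration of how utilization data can feed capacity planning, the following Python sketch labels hosts by their average CPU usage. The thresholds (20% and 80%), the function name `classify_hosts`, and the sample data are assumptions chosen for the example, not values recommended by Datadog or Cloudera.

```python
from statistics import mean


def classify_hosts(cpu_samples, low=20.0, high=80.0):
    """Label each host by its average CPU usage (percent).

    Hypothetical thresholds: below `low` is "underutilized", above `high`
    is "overutilized", anything in between is "balanced".
    """
    labels = {}
    for host, samples in cpu_samples.items():
        if not samples:
            # No measurements collected for this host in the window.
            labels[host] = "no data"
            continue
        average = mean(samples)
        if average < low:
            labels[host] = "underutilized"
        elif average > high:
            labels[host] = "overutilized"
        else:
            labels[host] = "balanced"
    return labels


# Made-up CPU usage samples (percent) for illustration only.
samples = {
    "host-a": [5.0, 10.0, 15.0],
    "host-b": [85.0, 90.0, 95.0],
    "host-c": [40.0, 50.0, 60.0],
    "host-d": [],
}
labels = classify_hosts(samples)
print(labels)
```

In practice you would likely look at more than one signal (memory, disk, and network as well as CPU) and at longer time windows before moving workloads or resizing hosts.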

View your entire Hadoop stack with OOTB Cloudera Powerpacks

If you use CDH (Cloudera's Distribution including Apache Hadoop), Cloudera’s fully open source platform and widely adopted Apache Hadoop distribution, Datadog Powerpacks will help you visualize and analyze your entire Hadoop stack. Along with our dashboard, Cloudera Powerpacks give you visibility into the Hadoop services that are running on your CDP Public Cloud clusters. You can add our OOTB Powerpacks to your dashboard and customize them as needed to help you detect and remediate performance issues early.

We have created two Cloudera Powerpacks for common CDH use cases: data engineering and operational database. With pre-selected key metrics that are grouped into relevant dashboard widgets, you can efficiently view and understand the health and performance of each service in your Hadoop stack. The Cloudera Data Engineering Powerpack is designed for the services in your data engineering clusters, while the Cloudera Operational Database Powerpack is designed for the services in your operational database clusters.

Cloudera Data Engineering Powerpack with ZooKeeper and HDFS

Once you install each of the individual integrations used in your stack, Datadog will begin collecting activity from your respective Hadoop core components, like HDFS DataNode, HDFS NameNode, and YARN. Depending on your use case, you will see the core components in your Powerpack alongside your data engineering or operational database services, such as Apache ZooKeeper and Spark.

The visibility that Cloudera Powerpacks provide can help you quickly identify and address the root cause of issues that would otherwise be extremely difficult to pinpoint. For example, if your data engineering workloads on CDP are containerized and run in completely isolated environments, viewing the most relevant metrics across all your Spark jobs in one place enables you to discover performance bottlenecks and plan how to remediate them.

Cloudera Data Engineering Powerpack with YARN

Start monitoring your Cloudera clusters today

Datadog’s Cloudera integration provides real-time visibility into your CDP Public Cloud clusters, so that you can ensure they’re available, appropriately provisioned, and able to support your workloads efficiently. For more information on how to get started with our Cloudera integration, check out our documentation. Or, if you’re new to Datadog, get started with a 14-day free trial.