Companies that successfully run large distributed systems commonly rely on Kubernetes to automatically manage scaling and application deployments within their containerized environments. Kubernetes makes deployments more agile and can help reduce operating costs, but it also adds layers of complexity. Teams need to oversee a dynamic mix of virtual machines, containers, and applications, which can make performance issues hard to predict and diagnose. Ensuring the performance and reliability of such an intricate system is crucial, but it can also carry a significant operational cost if teams don't have comprehensive visibility into their clusters.
Datadog offers total visibility into the health and performance of a Kubernetes environment. Managers, developers, and operations teams can use the Datadog Service Map to visualize and understand the architecture of their clusters. They can then get actionable insights with machine learning-driven analysis tools like forecasting and anomaly detection. Customizable dashboards help teams visualize Kubernetes data alongside metrics from Istio, Vault, and 400+ integrations. And since Datadog is designed to collect data efficiently from large-scale clusters, teams can easily keep tabs on their Kubernetes environments—whether they're running tens or thousands of nodes—and even extend Kubernetes to autoscale based on any Datadog metric.
Organizations use Kubernetes' built-in autoscaling feature to prepare their services for surges in user traffic and optimize costs during quieter periods. Out-of-the-box Kubernetes autoscaling can scale on simple metrics, but these metrics (which reflect the workload itself) may not capture external conditions that are also important to the business. Datadog extends Kubernetes to autoscale based on the real-time values of any of your custom metrics. Teams can fine-tune their autoscaling policies using key performance indicators like unique pageviews or completed purchases. This lets teams choose the most precise measure of demand, drawing on systems and metrics outside of Kubernetes, and run just enough infrastructure to deliver the best experience for their users at the lowest cost.
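To make the scaling decision concrete, here is a minimal sketch of the replica calculation that the Kubernetes Horizontal Pod Autoscaler documents: the desired replica count is the current count multiplied by the ratio of the observed metric to its target, rounded up. The function name, the default bounds, and the "completed purchases per minute" figures are illustrative assumptions, and the sketch does not show how a Datadog metric is actually delivered to the autoscaler.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Return a replica count using the documented HPA ratio formula.

    Hypothetical helper for illustration only; a real autoscaler also
    handles missing metrics, pod readiness, and stabilization windows.
    """
    if target_value <= 0:
        raise ValueError("target_value must be positive")

    ratio = current_value / target_value

    # Skip scaling when the metric is close enough to its target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas

    # Scale proportionally to demand, rounding up to whole replicas.
    desired = math.ceil(current_replicas * ratio)

    # Keep the result inside the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


# Assumed example: 4 replicas, a target of 100 completed purchases per
# minute per replica, and an observed value of 300.
print(desired_replicas(4, 300, 100, max_replicas=20))
```

Because the calculation depends only on the ratio between the observed value and the target, any metric that tracks demand can drive it, which is why a business metric can stand in for CPU or memory usage.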
As teams focus on building out new features and services, Kubernetes greatly reduces time to market by automating many aspects of the deployment process. But because Kubernetes can launch an application on any host in a cluster, it can be challenging for teams to ensure successful releases. Datadog automatically detects services running in Kubernetes clusters and monitors them no matter where they spin up. On top of that, Datadog's Watchdog feature uses machine learning algorithms to automatically surface unusual trends within infrastructure and application metrics. As a result, engineers can monitor new services without spending valuable time manually configuring checks or anticipating failure scenarios. With Datadog, teams can monitor every step of a feature rollout and quickly fix bugs to ensure a seamless customer experience.
Kubernetes clusters run on an increasingly diverse range of platforms. Some businesses opt for a fully managed platform, while others self-host on Rancher, OpenShift, or Anthos. Datadog can provide comprehensive visibility into any Kubernetes environment, along with all of the applications running on it. Teams can visualize data from 400+ integrations—including all major cloud providers—to track Kubernetes health and performance regardless of the underlying platform. Datadog automatically enriches data with tags from cloud providers and labels from Kubernetes itself. Teams can quickly determine where in their clusters a problem exists—and reduce mean time to resolution—by scoping an issue to a specific container image, pod name, or region.
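As a rough illustration of scoping an issue by tag, the sketch below groups a handful of made-up error events by a tag key and then narrows them to one combination of tags. The event records, tag keys, and helper names are all hypothetical; this models the idea in plain Python and is not Datadog's API or query syntax.

```python
from collections import Counter

# Hypothetical error events, each carrying tags of the kind described above.
events = [
    {"pod_name": "checkout-a", "image_name": "checkout:1.4.2", "region": "us-east-1"},
    {"pod_name": "checkout-b", "image_name": "checkout:1.4.2", "region": "us-east-1"},
    {"pod_name": "checkout-c", "image_name": "checkout:1.4.2", "region": "eu-west-1"},
    {"pod_name": "cart-a", "image_name": "cart:2.0.0", "region": "us-east-1"},
]


def count_by_tag(events, tag_key):
    """Count events per value of one tag to see where errors concentrate."""
    return Counter(event.get(tag_key, "untagged") for event in events)


def scope(events, **tags):
    """Keep only the events whose tags match every given key/value pair."""
    return [
        event for event in events
        if all(event.get(key) == value for key, value in tags.items())
    ]


# Most errors come from a single container image.
print(count_by_tag(events, "image_name").most_common(1))

# Narrow further to that image in one region.
for event in scope(events, image_name="checkout:1.4.2", region="us-east-1"):
    print(event["pod_name"])
```

Grouping first and filtering second mirrors the workflow in the text: find the tag value where failures cluster, then narrow to the affected pods.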