Optimize Kubernetes workload resourcing with StormForge and Datadog

Author: Jesse Mack

Published: February 28, 2023

StormForge Optimize Live is a machine learning-powered performance and resource optimization solution for Kubernetes workloads. Optimize Live ingests and analyzes production observability data and recommends specific actions to optimize CPU and memory utilization. You can take these actions manually or set them to occur automatically, making it easier to maintain a high level of application performance while minimizing cloud costs.

We’re excited to announce that you can now purchase StormForge’s software license in the Datadog Marketplace. By using this license with StormForge’s integration, you can access an out-of-the-box dashboard in Datadog that displays your key Kubernetes performance metrics, alongside recommended actions to optimize resource utilization.

Adjust cloud resourcing based on performance metrics

Once you enable the integration, StormForge Optimize Live will begin analyzing the telemetry from your Kubernetes environment that you’re collecting with Datadog and identify optimization opportunities for your deployments. These insights will populate your StormForge dashboard in Datadog, where you’ll be able to see key CPU and memory metrics—including total utilization, requests, and limits—across different objects within your environment. Tracking these metrics enables you to quickly pinpoint the most CPU- and memory-intensive deployments.
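To make the idea of pinpointing the heaviest deployments concrete, here is a minimal Python sketch that ranks deployments by observed CPU usage and compares usage with the configured request. The `DeploymentCpu` record, its field names, and the sample values are hypothetical illustrations. They are not part of the StormForge or Datadog APIs; in practice, these numbers would come from the utilization, request, and limit metrics already shown on the dashboard.

```python
from dataclasses import dataclass


@dataclass
class DeploymentCpu:
    """Hypothetical per-deployment CPU snapshot, in millicores (not a real API type)."""
    name: str
    usage_m: float    # observed CPU usage
    request_m: float  # configured CPU request
    limit_m: float    # configured CPU limit


def most_cpu_intensive(deployments: list[DeploymentCpu], top_n: int = 3) -> list[str]:
    """Return the names of the top_n deployments, ranked by absolute CPU usage."""
    ranked = sorted(deployments, key=lambda d: d.usage_m, reverse=True)
    return [d.name for d in ranked[:top_n]]


def usage_to_request_ratio(d: DeploymentCpu) -> float:
    """Observed usage divided by the request; above 1.0 means usage exceeds the request."""
    return d.usage_m / d.request_m


if __name__ == "__main__":
    # Illustrative sample data only.
    fleet = [
        DeploymentCpu("checkout", usage_m=900, request_m=500, limit_m=1000),
        DeploymentCpu("search", usage_m=300, request_m=1000, limit_m=2000),
        DeploymentCpu("auth", usage_m=450, request_m=500, limit_m=500),
    ]
    print(most_cpu_intensive(fleet, top_n=2))  # ['checkout', 'auth']
    for d in fleet:
        print(d.name, round(usage_to_request_ratio(d), 2))
```

Ranking by absolute usage surfaces the biggest consumers, while the usage-to-request ratio highlights deployments whose requests no longer match what they actually use.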

Each graph on the StormForge dashboard in Datadog also visualizes Optimize Live’s recommended resource configuration against current levels, helping you understand how to recalibrate resourcing to avoid both over- and under-provisioning.

For example, if actual resource utilization across your Kubernetes deployments is close to or exceeds the configured limits, you may not have allocated enough resources to them, which could lead to performance problems. In this case, Optimize Live will recommend that you increase requested CPU or memory. You can spot this quickly when, for example, the “Impact on CPU Requests” cell is red and displays a positive number, as shown below.

stormforge-image-1.png
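The under-provisioning signal described above, usage close to or above the configured limit, can be sketched in a few lines of Python. The 90 percent threshold and both function names are assumptions chosen for illustration; Optimize Live's own recommendations are machine learning-powered and are not reproduced here. The second helper mirrors the idea behind the “Impact on CPU Requests” cell: a positive value means the recommendation raises the request.

```python
def near_or_over_limit(usage_m: float, limit_m: float, threshold: float = 0.9) -> bool:
    """True when usage is at least `threshold` of the limit (or above the limit).

    The 0.9 default is an illustrative assumption, not a StormForge setting.
    """
    if limit_m <= 0:
        raise ValueError("limit_m must be positive")
    return usage_m / limit_m >= threshold


def impact_on_request(current_request_m: float, recommended_request_m: float) -> float:
    """Signed change a recommendation makes to a request; positive means an increase."""
    return recommended_request_m - current_request_m


if __name__ == "__main__":
    print(near_or_over_limit(usage_m=950, limit_m=1000))  # True: 95% of the limit
    print(near_or_over_limit(usage_m=400, limit_m=1000))  # False: 40% of the limit
    print(impact_on_request(500, 750))                    # 250: request would increase
```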

From here, you can open Optimize Live to see the exact CPU and memory levels StormForge recommends for each container in the service you’re monitoring, and accept these recommendations directly in the platform.

stormforge-image-2.png

By following these recommendations, you can bring resource allocations in line with your applications’ actual usage. For over-provisioned workloads, this frees up resources that would otherwise sit idle so other applications can use them, which helps optimize cloud spend.
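As a rough illustration of how much capacity right-sizing can free up, the following sketch totals the CPU request (in millicores) released by lowering over-provisioned containers to their recommended values. The container names, numbers, and dictionary layout are hypothetical. Containers whose recommendation is higher than their current request are tallied separately, because they need more capacity rather than less.

```python
def rightsizing_delta(current: dict[str, float],
                      recommended: dict[str, float]) -> tuple[float, float]:
    """Return (freed_m, added_m) in millicores across all containers.

    freed_m: total request released by over-provisioned containers.
    added_m: total request added for under-provisioned containers.
    """
    freed = added = 0.0
    for name, cur in current.items():
        rec = recommended[name]
        if rec < cur:
            freed += cur - rec   # recommendation lowers the request
        else:
            added += rec - cur   # recommendation raises (or keeps) the request
    return freed, added


if __name__ == "__main__":
    # Hypothetical current and recommended CPU requests, in millicores.
    current = {"api": 1000, "worker": 500, "cache": 250}
    recommended = {"api": 400, "worker": 600, "cache": 250}
    freed, added = rightsizing_delta(current, recommended)
    print(f"freed {freed:.0f}m, added {added:.0f}m, net {freed - added:.0f}m")
    # freed 600m, added 100m, net 500m
```

Reporting freed and added capacity separately keeps the net savings honest: raising an under-provisioned container's request offsets part of what the over-provisioned ones give back.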

Autoscale pods vertically and horizontally

One common method of managing resource usage across Kubernetes pods is to use the Kubernetes HorizontalPodAutoscaler (HPA) or VerticalPodAutoscaler (VPA) to adjust the resources allocated to your workloads as utilization increases or decreases. Using both the HPA and VPA to scale on CPU and memory, however, can result in thrashing: the HPA adds replicas to the target workload in order to reduce per-pod resource utilization, but the VPA then reduces CPU and memory allocations because it sees that utilization has decreased and determines that the additional resources are no longer needed. This cycle repeats, and neither the HPA nor the VPA achieves the goal of making your Kubernetes environment run more efficiently.
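The feedback loop described above can be reproduced with a toy simulation. The HPA step uses the core scaling formula from the Kubernetes documentation, `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, with its tolerance and stabilization behavior left out. The VPA step is a deliberate simplification (the request is reset to current per-pod usage plus 10 percent headroom); the real VPA recommender works from usage histograms, so treat the parameters and function names here as assumptions. Even though total load stays constant, replicas keep climbing until they hit the cap while each pod's request shrinks.

```python
import math


def hpa_desired_replicas(current_replicas: int, utilization: float, target: float) -> int:
    """Core Kubernetes HPA formula (tolerance and stabilization windows ignored)."""
    return math.ceil(current_replicas * utilization / target)


def simulate_thrashing(total_load_m: float, replicas: int, request_m: float,
                       hpa_target: float = 0.5, vpa_headroom: float = 1.1,
                       max_replicas: int = 100, rounds: int = 10) -> list[tuple[int, float]]:
    """Alternate the HPA and a simplified VPA acting on the same CPU signal.

    Returns (replicas, per-pod CPU request in millicores) after each round.
    """
    history = []
    for _ in range(rounds):
        per_pod_usage = total_load_m / replicas
        utilization = per_pod_usage / request_m
        # HPA: add replicas to push per-pod utilization down toward the target.
        replicas = min(hpa_desired_replicas(replicas, utilization, hpa_target), max_replicas)
        # Simplified VPA: shrink each request to the new per-pod usage plus headroom,
        # which pushes utilization back above the HPA target.
        request_m = (total_load_m / replicas) * vpa_headroom
        history.append((replicas, request_m))
    return history


if __name__ == "__main__":
    # Constant 2,000m of load, starting from 2 replicas requesting 1,000m each.
    history = simulate_thrashing(total_load_m=2000, replicas=2, request_m=1000)
    print([r for r, _ in history])  # [4, 8, 15, 28, 51, 93, 100, 100, 100, 100]
```

Because the simplified VPA always restores utilization to about 91 percent, above the 50 percent HPA target, the two controllers never reach an equilibrium on their own; only the replica cap stops the growth.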

To solve this problem, you can configure StormForge Optimize Live to automatically implement recommended resourcing levels across your pods, rather than requiring manual approval. In this setup, StormForge acts as a VPA, keeping pods right-sized as usage varies, and will automatically detect whether an HPA is running for a particular workload.

If one is running, StormForge will then recommend the best target utilization for the HPA to scale on, as well as the optimal pod-level resource settings. The screenshot below shows the StormForge dashboard in Datadog displaying current settings for two containers, with an average HPA target utilization of 40.8 percent. On the right, StormForge recommends a new HPA target utilization of 94.9 percent, along with new CPU and memory settings.

stormforge-image-3.png
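To see why the HPA target matters, note that with steady load each pod's utilization is total usage divided by (replicas × request), so the HPA formula settles on `ceil(total_usage / (request × target))` replicas. The sketch below applies that to the 40.8 and 94.9 percent targets from the screenshot. The 4,000-millicore load and 500-millicore request are hypothetical, and requests are held fixed for simplicity, even though StormForge also recommends new CPU and memory settings alongside the new target.

```python
import math


def steady_state_replicas(total_usage_m: float, request_m: float,
                          target_utilization: float) -> int:
    """Replicas the HPA settles on for a steady load (tolerance ignored).

    Per-pod utilization is total_usage / (replicas * request), so the HPA formula
    ceil(replicas * utilization / target) reduces to
    ceil(total_usage / (request * target)).
    """
    return math.ceil(total_usage_m / (request_m * target_utilization))


if __name__ == "__main__":
    load_m, request_m = 4000, 500  # hypothetical workload
    for target in (0.408, 0.949):  # current vs. recommended targets from the screenshot
        print(f"target {target:.1%}: {steady_state_replicas(load_m, request_m, target)} replicas")
    # target 40.8%: 20 replicas
    # target 94.9%: 9 replicas
```

A higher target also leaves less per-pod headroom before the HPA reacts to a load spike, so it is worth evaluating together with the new CPU and memory settings rather than in isolation.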

If you’ve enabled Optimize Live to automatically update your pods with the recommended CPU and memory settings, your HPA will work in concert with the vertical autoscaling provided by StormForge, helping you maximize efficiency both horizontally and vertically.

Start right-sizing Kubernetes resources

Datadog allows you to track and monitor performance metrics across your entire stack, including your Kubernetes clusters. With the StormForge offering, you can translate the observability data you’re already gathering in Datadog into specific, actionable recommendations that will optimize CPU and memory resourcing for your Kubernetes workloads. This helps you avoid performance issues, improve cloud cost-efficiency, and reduce the time and effort it takes to course-correct from over- or under-provisioning.

You can get started by signing up for a StormForge subscription through the Datadog Marketplace and installing the StormForge integration. If you’re not already a Datadog customer, sign up today with a free 14-day trial.

The ability to promote branded monitoring tools is a membership benefit offered through the Datadog Partner Network. Learn more about the Datadog Marketplace in this blog post. If you’re interested in developing an integration or application that you’d like to promote, you can contact us at marketplace@datadog.com.