State of Containers and Serverless


Cloud providers offer an increasing breadth of modern compute services—from virtual machines to container orchestration platforms, serverless offerings, and specialized hardware for high-performance tasks. These options enable organizations to deploy each workload on the technology best suited to optimize performance, cost, and operational simplicity.

For this report, we analyzed cloud usage data from tens of thousands of Datadog customers. We examined adoption trends across compute options, autoscaling practices, and approaches to efficiency and optimization. Our findings suggest that most organizations rely on an evolving mix of modern compute technologies, shaped by growing focus on cost optimization, shifting preference of autoscaling tools, and expanding Arm usage.


Fact 1

GPU adoption has increased to support growing AI and data-intensive workloads

As AI workloads such as training, inference, and data processing become more common, organizations are beginning to adopt GPU-powered instances to efficiently support those activities. Our data shows a steady rise in the uptake of GPUs over the last 2 years.

A line chart showing GPU adoption rising from about 4.5% to just over 6% of organizations between October 2023 and October 2025, reflecting steady growth in usage for AI and data workloads.

Adoption of GPUs is still limited to a small number of organizations running large-scale AI, data processing, and research computing workloads. But those early adopters are driving rapid growth, consuming three times as many GPU instance hours as they did 2 years ago. GPU instance hours still amount to less than 3% of the hours consumed by CPUs and other traditional compute, but GPU usage is growing at a much faster rate.

A line chart comparing growth in instance hours for GPUs versus CPUs and traditional instances. GPU usage grows roughly threefold over two years, outpacing CPUs and traditional compute.

We expect GPU adoption to expand with continued growth of AI and drive further increases in instance hours consumed. Early in this adoption curve, we’re seeing inference servers gain traction, with Triton, vLLM, and Ollama leading the way. However, the pace of future growth will depend on several factors, including chip supply, advances in workload efficiency, energy availability, and the ability of organizations to manage rising infrastructure costs.

Fact 2

AI joins the most popular workload categories for containers

Building on previous research into the most popular categories of containerized workloads, we saw some familiar categories this year as well as some new ones. The databases category remains in front, with Redis again the most widely used and its fork, Valkey, rising fast. As a category, web and app servers are nearly as prevalent as databases.

A horizontal bar chart comparing containerized workload categories. Databases lead at 45%, followed by web and app servers (42%) and CI/CD tools (27%). AI workloads account for 7%.

We also saw entrants in a new workflows category, led by Airflow. Additionally, AI has emerged as a notable new workload category, though it remains far less common than the long-standing leaders. This category is led by NVIDIA Data Center GPU Manager (DCGM) and includes inference servers such as vLLM and vector databases, including Qdrant. AI workloads still represent a small share compared to traditional compute and application use cases, but early GPU adoption suggests that they’re here to stay and may soon displace some of today’s top categories.

Fact 3

Most workloads use less than half of their requested resources

Across Azure Container Apps, Google Cloud Run, Amazon ECS Fargate, AWS Lambda, and Kubernetes, most workloads use less than half of their requested memory and less than 25% of their requested CPU. Historically, we’ve observed a pattern of underutilization in Kubernetes environments, and this data shows that it extends to other environments as well.

A bar chart showing memory utilization across platforms including Azure Container Apps, Google Cloud Run, ECS Fargate, AWS Lambda, and Kubernetes. The majority of workloads use under 50% of their allocated memory.
A bar chart comparing CPU utilization by platform. Most workloads consume less than 25% of their requested CPU, showing widespread underutilization across cloud compute services.

Several factors likely contribute to this underutilization. Many teams intentionally overprovision resources to prioritize stability and manage the risk of throttling or latency during traffic spikes. Platform design can also play a role. For example, in Lambda, CPU scales proportionally with memory, so sizing for one resource may leave the other mostly idle. Similarly, in Fargate, developers must choose from predefined bundles of CPU and memory that may not match workload needs.
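Because CPU is tied to memory in Lambda, memory is effectively the only sizing knob. As a minimal sketch (not from the report; the function name is hypothetical), adjusting a function's MemorySize with boto3 is also how you change its CPU share:

    import boto3

    # Minimal sketch: in Lambda, CPU scales proportionally with configured memory,
    # so raising MemorySize is also how you give a CPU-bound function more CPU.
    lambda_client = boto3.client("lambda")

    lambda_client.update_function_configuration(
        FunctionName="report-generator",  # hypothetical function name
        MemorySize=1024,                  # MB; CPU allocation grows with this value
    )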

Underutilized resources will always be present as organizations continue to fine-tune new services and build in headroom to ensure performance. The ideal amount of unused capacity varies by application, but the data shows that organizations have ample room to lower costs through more efficient use of cloud resources. Teams should monitor high-cost workloads and experiment with different compute models, shifting containerized services to serverless functions or the other way around to improve efficiency. When launching new services, teams should use profiling to reveal how workloads actually consume resources, and then apply that knowledge to make more accurate initial allocations.

Fact 4

Nearly two-thirds of Kubernetes organizations use HPA

Continuing a trend we noted in 2021 and again in 2023, a growing share of Kubernetes organizations—now over 64%—has adopted the Horizontal Pod Autoscaler (HPA). HPA enables administrators to improve the cost and resource efficiency of their clusters by automatically adjusting the number of replicas in their deployments to meet demand without overprovisioning. The sustained increase in its adoption shows that HPA’s benefits—simplified cluster management and automatic cost optimization—are well established and widely accepted.

A line chart showing Kubernetes Horizontal Pod Autoscaler (HPA) adoption rising from about 55% to over 64% between Q1 2024 and Q3 2025, demonstrating continued growth in horizontal scaling adoption.

We’ve also found that despite HPA’s pervasive adoption, cluster infrastructure is often overprovisioned, leading to idle container resources and wasted cloud spend.

Our data shows not just increasingly broad HPA adoption, but also deep use within organizations. Eighty-six percent of HPA users apply it in most of their clusters, and nearly half use it in every cluster. This suggests that HPA is a foundational part of Kubernetes for most users, not a feature they rely on for only a narrow range of workloads.

A stacked bar chart showing HPA usage depth among Kubernetes organizations. 47% use HPA in all clusters, 39% in most clusters, and 14% in less than half.

While horizontal scaling adoption continues to rise, many clusters that run workloads with volatile traffic don’t yet take advantage of it. We found that 46% of unscaled workloads experienced multiple significant CPU spikes per day, suggesting that they would be good candidates for horizontal scaling. We anticipate that horizontal scaling adoption will continue to grow as organizations seek to handle spikes like these more efficiently and cost-effectively.

Fact 5

Only 20% of HPA-enabled deployments use custom metrics for Kubernetes autoscaling

Of the deployments in our dataset that use HPA, only 20% use custom metrics to scale based on application characteristics such as queue depth or request rate. Four out of five deployments scale based on changes in CPU or memory utilization instead of custom metrics.

A Venn diagram showing the share of deployments using different scaling metrics. 80% use only CPU or memory metrics, 14% use only custom metrics, and 6% use both custom and standard metrics.

CPU and memory usually provide the most relevant signals for when to add or remove capacity, and most applications scale effectively based on these metrics alone. But for workloads that are not CPU- or memory-bound, scaling based on custom metrics instead is often advantageous. By providing indicators of a workload’s actual behavior, custom metrics enable cluster administrators to configure more precise autoscaling to achieve better performance and cost efficiency.
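For illustration only (the deployment name, metric name, and target value below are assumptions, not taken from the report's data), an autoscaling/v2 HPA can target a Pods-type custom metric such as queue depth instead of CPU or memory. A minimal sketch using the Python Kubernetes client, assuming a metrics adapter already exposes the metric:

    from kubernetes import client, config

    # Sketch: an HPA that scales a hypothetical "worker" Deployment on a custom
    # Pods metric (queue_depth) rather than CPU or memory utilization.
    config.load_kube_config()

    hpa = {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": "worker-hpa"},
        "spec": {
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": "worker"},
            "minReplicas": 2,
            "maxReplicas": 20,
            "metrics": [{
                "type": "Pods",
                "pods": {
                    "metric": {"name": "queue_depth"},                         # assumed metric name
                    "target": {"type": "AverageValue", "averageValue": "30"},  # add pods above ~30 items per pod
                },
            }],
        },
    }

    client.AutoscalingV2Api().create_namespaced_horizontal_pod_autoscaler(
        namespace="default", body=hpa
    )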

Fact 6

Karpenter adoption overtakes Cluster Autoscaler

Karpenter has replaced Cluster Autoscaler as the leading tool for autoscaling in Kubernetes. Our research shows that the share of nodes provisioned by Karpenter rose by 22 percentage points over the last 2 years, while the share provisioned by the Kubernetes Cluster Autoscaler declined by 17 percentage points. The share of nodes using either tool has held steady, and benefits like cost optimization and simplified administration likely contribute to autoscaling’s continued importance.

A line chart showing Karpenter’s share of Kubernetes node provisioning rising from about 11% to roughly 34% between October 2023 and October 2025, surpassing Cluster Autoscaler, whose share falls from around 40% to 25%.

The shifting adoption data suggests that organizations recognize Karpenter’s advantages, including greater flexibility in choosing instance types compared to Cluster Autoscaler. Karpenter was originally developed by AWS, but its presence is growing in Azure and other clouds, giving momentum to its increasing adoption.

Fact 7

Most cloud customers use one or more serverless offerings

Most customers in AWS, Google Cloud, and Azure use at least one serverless compute service. Lambda is AWS’s most popular serverless offering, used by 65% of AWS customers, and Cloud Run shows similar adoption on Google Cloud at 70%. On Azure, 56% of customers use App Service—more than any other serverless compute offering.

A grouped bar chart comparing usage of serverless services across AWS, Azure, and Google Cloud. AWS Lambda leads at 65%, Google Cloud Run at 70%, and Azure App Service at 56%, showing broad adoption across clouds.

Taken together, this data suggests that serverless is essential for most customers but not tied to a single dominant use case. The leading services excel across different workload types—Lambda for event-driven functions, Cloud Run for containerized services, and App Service for always-on apps. Rather than any single use case, we attribute serverless’s high adoption to its broad advantages—fast and transparent scaling, per-invocation pricing, and operational simplicity.

“Datadog's 2025 State of Containers and Serverless report highlights that serverless has become fundamental to how developers build modern applications in the cloud, driven by the automatic scaling, cost efficiency, and agility offered by services like AWS Lambda. We're excited to see organizations benefit from the serverless operational model as we continue to enhance the developer experience through innovations that make it easier to build emerging or increasingly complex workloads and architectural patterns.”

Shridhar Pandey
Principal PM for AWS Serverless Compute

Fact 8

Jobs and CronJobs run short, but Deployments and StatefulSets are long-lived

Most Kubernetes containers are short-lived: Almost two-thirds have an uptime under 10 minutes, and about one-third finish running in under 1 minute. Most short-lived containers are associated with Jobs, CronJobs, and standalone Pods. By contrast, containers running more than 10 minutes are most often part of a Deployment, DaemonSet, or StatefulSet, suggesting that those Kubernetes objects manage primarily long-lived containers.

A bar chart showing Kubernetes container uptime by workload type. Around two-thirds of containers run under 10 minutes—mostly Jobs and CronJobs—while long-lived containers are primarily managed by Deployments and StatefulSets.

Short-lived containers are useful for scheduled workflows and one-off operational or development tasks. But their prevalence exposes a gap: Kubernetes’ core autoscaling components can’t effectively scale these workloads. Jobs and CronJobs typically complete before the HPA can adjust replica counts or the Vertical Pod Autoscaler (VPA) can replace containers with new, right-sized ones.

In the absence of these autoscaling capabilities, organizations can more efficiently allocate resources for Jobs and CronJobs by:

  • Profiling them to determine initial resource needs, then manually applying VPA recommendations to right-size them based on their resource usage history across runs
  • Using the Job specification’s parallelism and completions fields to provide limited horizontal scaling (see the sketch after this list)
  • Using Kubernetes Event-Driven Autoscaler (KEDA) for event-driven horizontal autoscaling based on performance metrics like queue length or pipeline latency
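A minimal sketch of the parallelism and completions approach (the Job name, image, and values are hypothetical), using the Python Kubernetes client:

    from kubernetes import client, config

    # Sketch: a batch/v1 Job that runs a hypothetical task in 5 Pods at a time
    # until 20 completions succeed, a limited form of horizontal scaling for
    # workloads that typically finish before the HPA or VPA could react.
    config.load_kube_config()

    job = {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": "nightly-batch"},  # hypothetical name
        "spec": {
            "parallelism": 5,       # Pods running concurrently
            "completions": 20,      # total successful Pods required
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "worker",
                        "image": "busybox",
                        "command": ["sh", "-c", "echo processing one chunk"],
                    }],
                },
            },
        },
    }

    client.BatchV1Api().create_namespaced_job(namespace="default", body=job)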

Organizations may also choose to deliberately overprovision resources for these workloads and even isolate them in dedicated clusters. But these approaches sacrifice efficiency in favor of performance and availability. Instead, organizations can migrate these workloads to serverless function platforms to gain cost efficiency and operational simplicity. By using serverless functions, organizations don’t need to manage infrastructure or scaling configurations, and they keep costs aligned with actual usage by incurring charges only when a function is invoked.

Fact 9

Most organizations that use serverless functions also use containers

We found that 66% of organizations that use serverless functions also use at least one container orchestration service in the same cloud. Serverless functions and containers overlap in capability, and the use of both suggests that organizations selectively choose the technology that best fits each of their workloads. For example, organizations may use serverless functions for spiky workloads and containers for long-running workloads with complex infrastructure.

The combination of serverless functions and containers may also reflect organizations’ shift to modern development practices, which empower teams to choose their preferred tools rather than holding them to a standardized stack.

A donut chart showing that 66% of organizations using serverless functions also use containers, while 34% do not use a container orchestration service in the same cloud.

Fact 10

Arm usage continues to expand across Lambda functions and cloud instances

Our research shows that the share of AWS Lambda functions running on Arm—rather than x86—grew from 9% to 19% over the past 2 years. Arm-based cloud instances also showed a substantial increase, rising from 9% to 15%.

A line chart showing the share of workloads on Arm-based architecture rising over two years: Lambda functions increase from 9% to 19%, and cloud instances from 9% to 15%, indicating accelerating adoption of Arm for cost efficiency.

AWS advertises up to 34% better price performance for Lambda functions on Arm, and similar efficiency gains when Arm is used in services such as Amazon EC2, Amazon RDS, and Fargate. These advantages have already helped establish the trend toward Arm adoption for Lambda functions and containerized workloads. Given these efficiency improvements, we expect this momentum to continue as organizations deploy new workloads onto the more cost-efficient architecture.
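As a minimal illustration (the role ARN, names, and package are placeholders, not from the report), deploying a Lambda function on Arm with boto3 comes down to setting the Architectures parameter at creation time, provided the package is built for arm64:

    import boto3

    # Sketch: create a hypothetical function on Arm (Graviton) by requesting
    # the arm64 architecture; the default is x86_64. Native dependencies in the
    # deployment package must be compiled for arm64.
    lambda_client = boto3.client("lambda")

    with open("function.zip", "rb") as f:
        package = f.read()

    lambda_client.create_function(
        FunctionName="arm-example",                          # hypothetical
        Runtime="python3.12",
        Role="arn:aws:iam::123456789012:role/lambda-exec",   # placeholder role ARN
        Handler="app.handler",
        Code={"ZipFile": package},
        Architectures=["arm64"],
    )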

Methodology

Population

For this report, we compiled usage data from thousands of companies in Datadog's customer base. But while Datadog customers cover the spectrum of company size and industry, they do share some common traits. First, they tend to be serious about software infrastructure and application performance. And they skew toward adoption of cloud platforms and services more than the general population. All the results in this article are biased by the fact that the data comes from our customer base, a large but imperfect sample of the entire global market.

Fact 1

We measured usage of these GPU-based instances:

  • AWS: DL1, G2, G3, G4, G5, G6, GR6, P2, P3, P4, P5, P6
  • Azure: standard_nc, standard_nd, standard_ng, standard_nv
  • Google Cloud: a2, a3, a4, a4x, g2

To determine growth in instance hours, we measured GPU and CPU instance hours in Q4 2023 and calculated quarterly percentage changes compared to those baselines.

Fact 2

For this fact, we grouped workloads into the following categories, based on open source container image names:

  • AI: DCGM, Triton, Ollama, vLLM, PyTorch, Ray, MLflow, TensorFlow, TorchServe, TensorFlow Serving, Qdrant, Milvus, Weaviate, ChromaDB, Vespa, Kubeflow, GKE GPU maintenance-handler, KubeRay, Langfuse, LiteLLM, Open WebUI, pgvector
  • Analytics: Elasticsearch, Solr, Hadoop, Metabase, Hive, OpenSearch, Superset, dbt, Jupyter, Redash, JupyterHub, Trino, Druid, Presto, ClickHouse, Spark
  • CI/CD: Argo CD, Flux, Argo Rollouts, Jenkins, GitHub Actions, CircleCI, Tekton, Bamboo, TeamCity, GitHub Actions Runner Controller, GitLab Runner, Buildkite, Spinnaker, Drone, Codefresh, Azure Pipelines, Concourse
  • Databases: Redis, PostgreSQL, MongoDB, MySQL, Memcached, etcd, MariaDB, InfluxDB, Valkey, Oracle, MSSQL, Cassandra, HBase, CouchDB, CockroachDB, Couchbase, Microsoft SQL Server, Informix, Neo4j, TimescaleDB, Dragonfly, KeyDB, ScyllaDB, Dgraph, TiDB, ArangoDB, QuestDB
  • Internal developer platforms: Crossplane, Portainer, Upbound, Mia-Platform, Garden, Ketch, Qovery, OpsLevel, Shipa, Devtron, Backstage
  • Messaging and streaming: Kafka Streams, ksqlDB, Kafka, RabbitMQ, ActiveMQ, HiveMQ, NATS, Redpanda, Mosquitto, Pulsar, EMQX, NSQ, ActiveMQ Artemis, Solace, VerneMQ, IBM MQ, Flink, Storm, Beam, Materialize, Heron
  • Web and app servers: NGINX, OpenResty, JBoss, Django, Express, Next, WildFly
  • Workflows: Airflow, Temporal, n8n, Prefect, Flyte, Airbyte, Dagster, Kubeflow Pipelines (KFP), Argo Workflows, Luigi, Meltano

Fact 3

We measured CPU and memory utilization by using platform-specific metrics for each environment:

  • For ECS Fargate, we calculated utilization as the ratio of ecs.fargate.mem.usage to ecs.fargate.mem.limit for memory and used ecs.fargate.cpu.percent for CPU.
  • In Azure Container Apps, we used the native usage percentage metrics azure.app_containerapps.cpu_percentage and azure.app_containerapps.memory_percentage.
  • For Google Cloud Run, we relied on fractional utilization metrics gcp.run.container.cpu.utilizations.avg and gcp.run.container.memory.utilizations.avg.
  • For AWS Lambda, we calculated memory utilization as aws.lambda.enhanced.max_memory_used divided by aws.lambda.enhanced.memorysize and used aws.lambda.enhanced.cpu_total_utilization_pct for CPU.

We included only workloads with nonzero CPU and memory utilization values.
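As a rough sketch of this kind of ratio (a simplified illustration, not the exact pipeline behind this report; keys and scope are placeholders), the two ECS Fargate memory metrics can be combined in a single query against Datadog's v1 timeseries API:

    import time
    import requests

    # Sketch: average ecs.fargate.mem.usage / ecs.fargate.mem.limit over the last
    # 24 hours. Credentials and scope are placeholders; adjust the site if needed.
    now = int(time.time())

    resp = requests.get(
        "https://api.datadoghq.com/api/v1/query",
        headers={
            "DD-API-KEY": "<api_key>",          # placeholder
            "DD-APPLICATION-KEY": "<app_key>",  # placeholder
        },
        params={
            "from": now - 86400,
            "to": now,
            "query": "avg:ecs.fargate.mem.usage{*} / avg:ecs.fargate.mem.limit{*}",
        },
    )
    resp.raise_for_status()

    for series in resp.json().get("series", []):
        # pointlist entries are [timestamp_ms, value]; the value is a fraction
        values = [v for _, v in series["pointlist"] if v is not None]
        if values:
            print(series["scope"], f"{sum(values) / len(values):.1%} average utilization")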

Fact 4

We measured workloads' average and maximum CPU usage at 5-minute intervals and aggregated them hourly. We considered a workload to have multiple significant CPU spikes per day if those values differed by 100% or more in 2 or more hours within a 24-hour period.
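A minimal sketch of this rule (treating "differed by 100% or more" as the hourly maximum exceeding the hourly average by at least 100%, which is an interpretation of the text above):

    # Sketch: hourly (avg, max) CPU pairs in, True out if the maximum exceeded
    # the average by 100% or more during 2 or more hours of a 24-hour period.
    def has_multiple_daily_spikes(hourly_cpu, min_spike_hours=2):
        """hourly_cpu: list of (avg_cpu, max_cpu) tuples covering one day."""
        spike_hours = sum(
            1
            for avg_cpu, max_cpu in hourly_cpu
            if avg_cpu > 0 and (max_cpu - avg_cpu) / avg_cpu >= 1.0  # 100%+ difference
        )
        return spike_hours >= min_spike_hours

    # Example: a flat day versus a day with two pronounced spikes
    flat_day = [(0.20, 0.25)] * 24
    spiky_day = [(0.20, 0.25)] * 22 + [(0.20, 0.60), (0.30, 0.90)]
    print(has_multiple_daily_spikes(flat_day))   # False
    print(has_multiple_daily_spikes(spiky_day))  # True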

Fact 5

We looked at the configuration of HPA-enabled deployments and categorized them based on scaling metrics.

Fact 6

We considered a node to be managed by Karpenter or Cluster Autoscaler if it ran in a Kubernetes cluster with at least one active instance of that autoscaler.

Fact 7

We considered an organization to be a customer of a given cloud if they ran at least five hosts per month in that cloud. We considered them to be a customer of a particular serverless compute service if they ran at least five functions or one application per month on that service.

Fact 8

For this fact, we collected uptime metrics over 2 days and excluded containers that were still sending metrics in the final hour of the time frame (because we inferred that they were not finished running). We also excluded init containers, including sidecar containers implemented as init containers with a restartPolicy value of "Always".

Fact 10

We measured compute usage of Amazon EC2, Google Cloud Compute Engine, and Azure VM instances running on Arm-based architecture. We considered the following instance types to be Arm-based:

  • AWS: Graviton-based EC2 instances
  • Azure: Cobalt or Ampere Altra VMs with feature letter “p” in instance type
  • Google Cloud: C4A (Axion) and Tau T2A Arm-based VMs

Licensing

Report: CC BY-ND 4.0

Images: CC BY-ND 4.0