
Nicholas Thomson

Scott Gerring
"There are only two hard things in Computer Science: cache invalidation and naming things." —Phil Karlton
In the age of increasingly frequent deploys, edge caching, and Jamstack adoption, caching plays a key role across the software delivery life cycle. In build and CI pipelines, caching compiled assets or dependencies helps reduce compute costs, speed up job runtimes, and lower the environmental impact (regarding energy usage) of repeated builds. Separately, in deployment pipelines and runtime content delivery, caching is critical for delivering content quickly and reliably, as it reduces latency by serving assets from edge locations and content delivery networks (CDNs).
CI/CD pipelines require different approaches to caching and cache purging, each critical to delivering fast, reliable user experiences. In CI workflows, precise purging of build caches ensures that outdated dependencies or artifacts don’t interfere with new builds; this helps maintain reproducibility and reduces the risk of build failures while allowing you to benefit from the decrease in build time that build caching can provide. In CD and runtime environments, cache purging ensures that users receive the most up-to-date content by invalidating stale assets in edge caches and CDNs. Without timely purging in either context, teams risk bugs, broken experiences, or confusing behavior after a deployment. Automating cache management across both CI and CD pipelines improves reliability, maintains consistency, and enables smoother rollouts of new features and fixes.
In this post, we will cover:
- How things are cached in CI workflows
- How things are cached in application runtimes
- The risks of not purging caches
- Cache purging patterns that balance freshness, performance, and control
- Best practices for cache purging
Caching in CI
Caching in CI workflows is designed to preserve resources that are expensive to regenerate and that remain relatively stable between builds. This often includes package directories such as `node_modules` for Node.js, `.m2/` for Maven, or `.venv/` for Python, where dependency resolution steps like `apt install`, `pip install`, or `npm install` are cached to avoid redownloading the same packages every time. CI pipelines may also cache base images or intermediary layers for faster container builds, as well as build artifacts produced by compilers or build tools such as Gradle, Bazel, or webpack.
In some cases, portions of the build output, which are normally thought of as uncacheable, can also be reused if they don’t change frequently. This includes executable files (`.exe`), dynamic libraries (`.dll`, `.so`, `.dylib`), or package archives (`.jar`, `.war`, `.apk`) that are not produced from the source code of the build itself or that do not vary with every build. These might be tools or support binaries that the build system rebuilds every time, but which themselves are stable and version-pinned across many builds. Similarly, compiled outputs like `.class` files (Java) or `.o` object files (C/C++) may be good candidates for caching when they represent supporting libraries or modules that don't change as frequently as the main application code, especially in environments where these libraries are included in source form rather than precompiled. In these scenarios, caching helps avoid redundant compilation steps and speeds up feedback cycles, even when you're working with build outputs.
CI cache platforms
Many modern CI platforms offer built-in support for artifact caching, making it easier to speed up pipelines and reduce redundant work. These capabilities are especially useful when you're dealing with large dependency trees, Docker builds, or compiled assets that don’t change frequently between runs.
One of the most widely used caching strategies involves Docker image layers. Since many CI pipelines wrap a Docker build step, Docker's built-in caching system naturally fits into the workflow. Each instruction in a Dockerfile creates a new layer, and Docker will reuse unchanged layers across builds provided the layer cache is accessible. This makes it easy to cache dependencies or setup steps that don’t change often, such as installing packages or setting up the build environment.
GitHub Actions offers caching through the `actions/cache` action, which allows teams to store and retrieve files or directories between workflow runs. This is especially useful for caching directories like `.npm`, `.m2`, or `.venv`, but it also supports Docker layer caching when integrated with Docker's GitHub integration. Tools like Depot offer optimized caching mechanisms to make Docker builds even faster within GitHub-hosted runners.
Similarly, GitLab CI/CD allows developers to define specific paths to cache between jobs and pipelines by using the `cache:` keyword. Teams can control how caches are reused with `cache:key` and fine-tune when caches are saved or restored by using `cache:policy`. This flexibility makes it easy to reuse dependencies, compiled outputs, or downloaded artifacts across different jobs in a pipeline.
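As a sketch of how these keywords fit together, the following hypothetical GitLab CI snippet caches `node_modules` keyed on the lockfile, and uses `cache:policy` so that the test job restores the cache without re-uploading it (job names and stages are illustrative):

```yaml
build:
  stage: build
  cache:
    key:
      files:
        - package-lock.json   # cache is invalidated when the lockfile changes
    paths:
      - node_modules/
    policy: pull-push          # restore the cache, and save it after the job
  script:
    - npm ci
    - npm run build

test:
  stage: test
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - node_modules/
    policy: pull               # restore only; never save from this job
  script:
    - npm test
```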
CircleCI also supports caching through `save_cache` and `restore_cache` steps, giving teams granular control over what gets saved and when. Caches are referenced using key patterns that can include dynamic elements like checksums or environment variables. To avoid bloating storage, CircleCI automatically evicts the least recently used (LRU) caches, ensuring that high-priority caches remain available without requiring manual cleanup.
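A minimal CircleCI sketch of this pattern might look like the following (the image and key prefix are assumptions); note the partial-match fallback key, which restores the most recent cache when the checksum misses:

```yaml
jobs:
  build:
    docker:
      - image: cimg/node:20.11
    steps:
      - checkout
      - restore_cache:
          keys:
            - deps-v1-{{ checksum "package-lock.json" }}
            - deps-v1-   # fall back to the newest cache on a key miss
      - run: npm ci
      - save_cache:
          key: deps-v1-{{ checksum "package-lock.json" }}
          paths:
            - ~/.npm
```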
To see how these caching strategies pay off in real-world scenarios, consider a high-velocity microservices environment, like an ecommerce platform with dozens of independently deployed services. Each service likely has its own CI pipeline to install dependencies, run tests, and build Docker images. To speed things up, the team uses BuildKit with Docker layer caching. By structuring Dockerfiles to group slow-changing layers near the top and using `--cache-from` and `--cache-to` with a remote registry, they’re able to reuse base images and dependency layers across builds. This significantly cuts container build times—it’s not uncommon to see builds go from taking five to seven minutes, for example, to less than 90 seconds—which in turn shortens feedback loops and conserves CI compute resources.
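A buildx invocation following this pattern could look like the sketch below; the registry host, image name, and tags are placeholders, and `mode=max` exports all intermediate layers to the registry cache rather than only the final stage:

```shell
docker buildx build \
  --cache-from type=registry,ref=registry.example.com/shop/cart:buildcache \
  --cache-to type=registry,ref=registry.example.com/shop/cart:buildcache,mode=max \
  -t registry.example.com/shop/cart:latest \
  --push .
```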
Application runtime caching
While build-time caching plays a major role in speeding up CI/CD pipelines, caching responsibilities don’t end when the application is deployed. Once an application is live, caching becomes a runtime concern, one that often influences the delivery process itself. This shift in responsibility impacts what gets fetched, when it’s fetched, and from where. Runtime caching is less about speeding up builds and more about reducing latency, offloading origin infrastructure, and ensuring fast, consistent user experiences after deployment.
One of the most common examples of runtime caching involves static assets (e.g., HTML, CSS, JavaScript, images, and documentation pages) that are served after deployment. These files are often delivered via CDNs such as Cloudflare, Fastly, Akamai, or Amazon CloudFront. CDNs cache static content at edge locations globally, allowing applications to load quickly for users regardless of their geographic location. Additionally, HTTP caches, like browser or intermediate proxy caches, can play a role in minimizing repeated downloads.
Runtime caching also applies to application configuration and state, including environment variables, feature flags, and metadata. These can be stored in-memory or cached through external systems such as CDNs or configuration APIs. Two primary patterns govern how configuration is delivered:
Static injection at launch: Configuration values—such as environment variables—are injected into the application at startup. These values are effectively "cached" for the entire life cycle of the running process. This pattern is common in environments managed by infrastructure-as-code tools, where config changes often trigger a redeploy and restart of affected services. However, if you're not using automation, changes to config may not take effect until the application is manually restarted.
Dynamic retrieval at runtime: The application periodically fetches configuration from an external source, such as a remote config service or API. While this adds flexibility, it also introduces the need for careful caching strategies—either in memory or through shared stores—to avoid unnecessary latency or overloading the config backend.
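One common way to implement the dynamic-retrieval pattern is a small in-memory cache with a TTL in front of the config backend. The sketch below is illustrative, not tied to any particular config service: `fetch` is a stand-in for your remote lookup, and the 30-second default TTL is an assumption you would tune to your freshness requirements.

```python
import time
from typing import Any, Callable

class TTLConfigCache:
    """Cache remote config lookups in memory for a fixed TTL.

    `fetch` stands in for a call to a remote config service (hypothetical).
    """

    def __init__(self, fetch: Callable[[str], Any], ttl_seconds: float = 30.0):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._entries: dict[str, tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        now = time.monotonic()
        hit = self._entries.get(key)
        if hit and now - hit[0] < self._ttl:
            return hit[1]          # fresh enough: skip the backend entirely
        value = self._fetch(key)   # stale or missing: refresh from the source
        self._entries[key] = (now, value)
        return value
```

Within the TTL window, repeated reads never touch the backend, which bounds both lookup latency and the load placed on the config service.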
Another critical form of runtime caching occurs at the application logic and data layer. Responses from APIs can be cached using reverse proxies like NGINX, which sit in front of REST endpoints to serve repeated requests more efficiently. Similarly, expensive computed outputs, such as the results of complex database queries or machine learning inference (e.g., vector embeddings, recommendations), are often cached in memory stores like Redis or Memcached. These caches help absorb high traffic and reduce load on compute-intensive services.
Several modern hosting platforms also offer native support for runtime caching, making it easier to optimize delivery without deeply customizing infrastructure. These include Vercel, Netlify, Cloudflare Pages, and CDNs like Amazon CloudFront, Akamai, and Fastly. These platforms often support advanced cache-control mechanisms, like edge-based invalidation and stale-while-revalidate strategies. Additionally, these platforms offer support for the typical HTTP cache-control headers that an HTTP server uses to tell its clients how long particular content is valid for, to strike a balance between freshness and performance.
Runtime caching is an essential part of an application’s delivery strategy. To ensure efficient and resilient user experiences, it's critical to plan for how configurations, static assets, API responses, and computation-heavy results are cached and invalidated after deployment.
The risks of not purging caches
While caching can dramatically speed up builds and deployments, failing to purge stale or unnecessary cache entries carries real risks, both for development velocity and application dependability.
Risks of not purging in CI pipelines
In CI environments, outdated or corrupted artifacts can quietly break builds in ways that are hard to diagnose. For example, a lingering `.class` file or webpack bundle from a previous build might reflect outdated logic or dependency versions, leading to subtle bugs or unexpected behavior in deployed code. These kinds of errors are often difficult to reproduce locally and can waste valuable engineering time in debugging sessions.
Cache staleness can also lead to hidden dependency drift. Cached versions of `node_modules`, Python virtual environments, or Java `.jar` files may mask changes in `package.json`, `requirements.txt`, or `pom.xml`, respectively. This mismatch can result in inconsistencies between local and CI environments, causing flaky builds or behavior that’s hard to replicate across developer machines and automated pipelines.
Another common issue is performance degradation over time. As caches accumulate unused layers, such as Docker images from long-deleted branches or dependency directories from old builds, they can bloat disk usage, slow down CI jobs, or even exceed storage quotas on hosted CI platforms. In such cases, what was once a performance optimization becomes a bottleneck.
Stale caches can also introduce security risks. If vulnerable dependencies are left in the cache and not replaced by patched versions, teams may unknowingly continue deploying insecure artifacts. Without regular purging or validation, these risks may persist silently through subsequent builds.
Risks of not purging in application runtimes
On the CD side, runtime cache staleness can result in users receiving outdated content, even after a new deployment has succeeded. This is especially common with cached static assets or application state delivered via CDNs or edge networks. If purge steps are not correctly incorporated into the deployment process, bug fixes or new features may fail to appear for users, leading to confusion, inconsistent behavior across environments, and difficult-to-investigate support tickets.
Cache purging patterns for CI build artifacts
To avoid the many pitfalls of stale or bloated caches, teams should treat cache purging as a deliberate and proactive strategy in CI workflows. Fortunately, modern CI platforms support several common patterns that make cache invalidation predictable and maintainable.
Hashed key
One of the most effective approaches is to assign a content-based cache key to dependencies or build artifacts. When the CI system detects that the key has changed, typically due to a change in the source or dependency manifest, it invalidates the old cache and rebuilds the artifact.
A popular example of this is caching `node_modules` based on a hash of `yarn.lock` or `package-lock.json`. If any dependencies change, the hash changes too, triggering a fresh cache. GitHub Actions supports this via the `actions/cache` mechanism, which allows you to use file-based hashes as part of your cache key:
```yaml
- name: Cache Node.js modules
  uses: actions/cache@v3
  with:
    path: |
      **/node_modules
    key: node-modules-${{ hashFiles('**/yarn.lock') }}
    restore-keys: |
      node-modules-
```
This pattern is not limited to JavaScript. Languages like Rust (`Cargo.lock`) and Java/Kotlin (`build.gradle.kts`) can benefit from the same logic. The general idea is that when the source of truth for your dependencies changes, the cache should be treated as invalid.
Time to live (TTL) and least recently used (LRU)
Another common strategy is time-based cache expiration. In this model, cached items such as compiled binaries or dependency folders are purged automatically after a defined time period, regardless of whether their content has changed. This helps guard against cache bloat from long-abandoned branches or outdated builds.
GitHub Actions applies this automatically: it purges any cache entry that hasn’t been accessed in seven days, making this a lightweight, automated way to clear unused data over time. (Workflow artifacts, by contrast, support a configurable `retention-days` setting.)
Closely related is the LRU cache eviction policy, which removes the cache entries that haven’t been accessed in the longest time. Platforms like CircleCI use LRU prioritization to ensure that the most relevant caches are preserved while older, unused ones are cleared to conserve storage.
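To make the LRU policy concrete, here is a minimal eviction sketch, similar in spirit to what CI platforms do internally (the capacity here counts entries for simplicity; real platforms evict by total storage size):

```python
from collections import OrderedDict
from typing import Optional

class LRUCache:
    """Minimal LRU eviction sketch: least recently accessed entries go first."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data: OrderedDict[str, bytes] = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)   # mark as most recently used
        return self._data[key]

    def put(self, key: str, value: bytes) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)   # evict least recently used
```

Because `get` refreshes an entry's position, caches that are still being hit by active branches survive, while abandoned ones age out on their own.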
By combining content-based invalidation with time- or usage-based eviction, engineering teams can keep their CI caches lean, relevant, and high-performing, without relying on manual intervention. These patterns ensure that caches work for you rather than against you, reducing build times while maintaining correctness and security.
Cache purging patterns for runtime artifacts and CD workflows
Just like build-time caching in CI, cache purging in delivery pipelines plays a critical role in ensuring your users always get fresh, accurate content without compromising performance or putting strain on origin infrastructure.
Versioned assets + soft purging (the gold standard)
The most reliable approach combines versioned assets with soft purging of entry points. In this model, you generate content-addressed filenames for static assets during your build (e.g., `main.ab12c.js`) by using tools such as webpack, Vite, or Parcel. Since the filename changes with every build, browsers and CDNs treat it as a new resource and cache it aggressively.
At the same time, the root entry points, typically your HTML files like `/index.html`, should be served with `no-cache` headers. These entry points are always fetched fresh so they reference the latest versioned assets correctly. Without this step, users could receive a cached HTML page that points to outdated JavaScript or CSS, resulting in inconsistent behavior.
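In NGINX, for instance, this split might be expressed roughly as follows (the `/assets/` prefix is an assumption about where your build writes hashed files):

```nginx
location /assets/ {
    # Filenames contain a content hash, so they are safe to cache "forever"
    add_header Cache-Control "public, max-age=31536000, immutable";
}

location / {
    # Entry points must be revalidated so they reference the latest assets
    add_header Cache-Control "no-cache";
}
```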
For example, imagine a DevOps engineer at a retail company deploys a new product page template with updated pricing logic. While the new JavaScript is deployed, the team forgets to purge the CDN’s cache of the HTML entry point. As a result, some users continue to see outdated pricing data. The fix is to adopt a versioned asset and soft purge strategy. After deployment, the team aggressively caches versioned assets and soft-purges only the HTML entry points, ensuring consistent content delivery with minimal origin load.
Tag-based purging
If your content isn’t versioned—or if you need to purge groups of related assets at scale—tag-based purging is a powerful pattern. Platforms like Cloudflare, Akamai, Google Cloud, and Bunny.net support assigning custom tags (e.g., `product:123`, `lang:en`) to assets. After a deployment, your CI/CD pipeline can purge by tag, only invalidating what was changed.
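With Cloudflare, for example, a pipeline step could call the purge API with a list of tags; the zone ID and token below are placeholders, and note that purge-by-tag is available on Cloudflare's Enterprise plan:

```shell
curl -X POST "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/purge_cache" \
  -H "Authorization: Bearer $CF_API_TOKEN" \
  -H "Content-Type: application/json" \
  --data '{"tags": ["product:123", "lang:en"]}'
```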
Path-based purging
In cases where tagging or versioning isn’t feasible, path-based purging offers a more straightforward alternative. CDNs like Amazon CloudFront, Cloudflare, and Fastly support purging by exact path or wildcard (e.g., `/docs/*`, `/blog/post-123`). After deployment, your pipeline identifies what was changed and purges only those specific paths.
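With CloudFront, this is an invalidation request; the sketch below uses the AWS CLI with a placeholder distribution ID:

```shell
aws cloudfront create-invalidation \
  --distribution-id E1A2B3C4D5E6F7 \
  --paths "/docs/*" "/blog/post-123"
```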
Deploy-triggered purge hooks
You can also automate purging by using deploy-time hooks. Many CDNs and static hosts like Cloudflare Pages offer APIs to purge caches after a successful deployment. CI/CD tools like GitHub Actions, CircleCI, and GitLab support post-deploy jobs or steps, letting you purge only when the deployment has succeeded and reducing the risk of purging stale or rolled-back content.
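In GitHub Actions, for example, a purge job can be gated on the deploy job so it only runs after a successful deployment. This is a hypothetical sketch: the secret names, zone ID, and purged URL are placeholders you would replace with your own.

```yaml
purge-cdn:
  needs: deploy          # runs only if the deploy job succeeded
  runs-on: ubuntu-latest
  steps:
    - name: Purge CDN cache for HTML entry points
      run: |
        curl -fsS -X POST \
          "https://api.cloudflare.com/client/v4/zones/${{ secrets.CF_ZONE_ID }}/purge_cache" \
          -H "Authorization: Bearer ${{ secrets.CF_API_TOKEN }}" \
          -H "Content-Type: application/json" \
          --data '{"files": ["https://example.com/index.html"]}'
```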
Dependency-aware purging
For more complex applications, especially ones with interlinked content, dependency-aware purging can help prevent stale content from lingering. In this model, your build system generates a dependency graph that tracks how content relates (e.g., which product pages reference which category pages). When content or data changes, your pipeline can use this graph to identify and purge related paths or tags, ensuring that dependent assets are refreshed together.
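The core of dependency-aware purging is a graph traversal: starting from the changed paths, walk the graph and collect everything that references them, directly or transitively. A minimal sketch, assuming your build system emits a mapping from each path to its dependents:

```python
from collections import deque

def paths_to_purge(dependents: dict[str, set[str]], changed: set[str]) -> set[str]:
    """Collect every path affected by a change, transitively.

    `dependents` maps a path to the paths that reference it (a hypothetical
    graph your build system would emit alongside the build output).
    """
    to_purge = set(changed)
    queue = deque(changed)
    while queue:
        path = queue.popleft()
        for dependent in dependents.get(path, ()):
            if dependent not in to_purge:   # visited check prevents cycles
                to_purge.add(dependent)
                queue.append(dependent)
    return to_purge
```

The resulting set can then be fed into whichever purge mechanism you use, such as path- or tag-based invalidation.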
Best practices for cache invalidation
In addition to the right tooling, purging caches effectively requires clear operational discipline and observability. Here are some tips to ensure your cache purge strategy adheres to these ideals.
Use content-based hashing
Use content-based hashing to automatically invalidate stale assets without requiring explicit purges. By including a hash of a file’s contents in its filename (e.g., `main.ab12c.js`), any change results in a new URL that CDNs and browsers treat as a fresh resource. This ensures users always receive the latest version while avoiding the overhead and risk of manual purging.
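Bundlers like webpack and Vite do this for you, but the underlying idea is simple enough to sketch: hash the file's bytes and splice a digest prefix into the name (the eight-character digest length is an arbitrary assumption).

```python
import hashlib
from pathlib import Path

def hashed_name(path: Path, digest_len: int = 8) -> str:
    """Return a content-addressed filename like 'main.ab12c3d4.js'."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()[:digest_len]
    return f"{path.stem}.{digest}{path.suffix}"
```

Because the name is derived from the content, two builds that produce identical bytes yield the same URL (and stay cached), while any change yields a new one.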
Simulate before you purge
Use dry-run modes or simulation features where available. For example, tools like Ansible support `check_mode` to preview changes. This helps you validate purge scopes before you apply the changes, reducing the risk of accidental outages or performance hits.
Balance freshness with consistency
Understand the consistency requirements of your systems. Many CDNs, CI caches, or product listings can tolerate slightly stale data (eventual consistency: data that may not update immediately everywhere but will converge to the correct state over time). Others, like financial systems, alerting platforms, or real-time collaboration tools, require strong consistency. This distinction should guide whether you use aggressive purging, soft invalidation, or simply versioned assets.
Rate-limit and batch large purges
To avoid overloading your CDN or triggering rate limits, batch large purge jobs and apply rate limits. This protects against cache stampedes and ensures purges complete reliably, even during high-traffic deployments.
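The batching logic itself is straightforward; the sketch below splits a path list into fixed-size chunks and pauses between submissions. `purge` is a placeholder for your CDN client's bulk purge call, and the batch size and delay are assumptions you would match to your provider's rate limits.

```python
import time
from typing import Callable, Sequence

def purge_in_batches(paths: Sequence[str],
                     purge: Callable[[Sequence[str]], None],
                     batch_size: int = 30,
                     delay_seconds: float = 0.0) -> int:
    """Submit purge requests in fixed-size batches, pausing between calls.

    Returns the number of batches submitted.
    """
    batches = 0
    for start in range(0, len(paths), batch_size):
        purge(paths[start:start + batch_size])
        batches += 1
        if delay_seconds and start + batch_size < len(paths):
            time.sleep(delay_seconds)   # back off between API calls
    return batches
```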
Monitor and alert on purge failures
Always monitor purge execution and latency. Set up alerts for failures or delays so you can catch stale content issues early. If a purge silently fails, users may continue to see outdated content without any signal in your dashboards.
Match staging and production environments
Ensure that your staging and production environments have consistent caching behavior. Many bugs arise because staging uses a different CDN, cache config, or TTL policy, leading to false confidence during testing.
Avoid non-reproducible Docker commands
When building Docker images, avoid commands whose results change over time, like an unpinned `apt install -y curl`. These make your images non-reproducible and difficult to cache predictably across environments or time.
Log every purge
Maintain an audit trail of purge attempts and results, including errors. This helps with debugging post-deploy issues, tracking down stale content, and improving confidence in your CI/CD pipeline.
Roll back on failed purges
Implement logic to automatically roll back or redeploy if a purge fails. This protects the user experience and keeps your site in a consistent, functional state.
Deploy strategically to preserve cache efficiency
Frequent deployments can reduce your cache hit ratio, especially if you purge aggressively each time. Instead:
- Avoid full purges unless absolutely necessary.
- Use versioned assets.
- Only purge entry points like HTML templates, and let hashed filenames manage everything else.
Use reproducible builds and optimized Dockerfiles
To maximize reuse across jobs and environments, use reproducible builds. For Docker, structure your Dockerfile by using this workflow:
- Copy only your `package.json`/`requirements.txt`.
- Install dependencies.
- Add the rest of your source code.
- Run the build.
This keeps slow, stable steps (like dependency install) cacheable even when application code changes.
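The workflow above might look like this for a Node.js service (base image and build commands are illustrative):

```dockerfile
FROM node:20-slim
WORKDIR /app

# 1. Copy only the dependency manifests first
COPY package.json package-lock.json ./

# 2. Install dependencies (this layer is reused unless the manifests change)
RUN npm ci

# 3. Add the rest of the source code
COPY . .

# 4. Run the build
RUN npm run build
```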
Emit metrics and monitor purge health
Treat purging like a production system that needs to be comprehensively monitored. Emit custom metrics such as:
```
metric: cache.purge.success
tags: environment:prod, region:us-east-1, provider:cloudflare, method:by_path
```
Track purge success rate, latency, and volume with dashboards and alerts. You can even define service level objectives (SLOs) for purge reliability if caching is critical to your delivery model. For deeper visibility, use Datadog integrations with CI/CD and CDN tools, such as GitHub Actions, GitLab, and Jenkins.
Develop an aggressive and effective cache purging strategy
In this post, we've explained why caching is essential in modern CI/CD workflows, the importance of cache purging, different patterns for effective cache purging, and best practices. These cache purging best practices complement Datadog CI Visibility and Synthetic Testing by ensuring that freshly deployed content is reliably served and verified. When combined, CI Visibility can confirm whether a purge occurred after a successful deployment, while synthetic tests can immediately detect if stale or broken content is being served—helping teams catch issues caused by missed or delayed purges before users do. If you’re new to Datadog, sign up for a free trial to get started.