Why CopyFail Is Different Inside a Cluster
CVE-2026-31431 — nicknamed CopyFail — was first disclosed on April 29–30, 2026. On a standalone Linux host the bug is already serious: a race condition in the AF_ALG splice path lets an unprivileged process corrupt page-cache pages that back file contents, silently rewriting on-disk data without needing write permissions. CVSS rates it 7.8.
Inside Kubernetes, however, the blast radius multiplies by orders of magnitude. The Linux page cache is a node-wide resource — it does not respect container, namespace, or pod boundaries. Container runtimes such as containerd and CRI-O use overlayfs to share identical image layers across every container on the same node. When two pods pull the same base image, their lower layers point to the same inodes, which means the same page-cache pages back both containers’ views of those files.
This architectural shortcut — designed for storage efficiency and fast pod startup — is the hinge that turns a local privilege-escalation primitive into a cross-container lateral-movement technique. The ability to schedule a single unprivileged pod is all an attacker needs.
The Three Conditions Behind the Attack
CopyFail in Kubernetes relies on three conditions converging on the same node. Remove any one and the chain breaks — but in most production clusters, all three are present by default.
Condition 1 — Kernel page-cache corruption via AF_ALG splice race
The AF_ALG subsystem exposes kernel crypto algorithms to user space. When combined with splice(), a race between page-reference accounting and the crypto completion path allows a user-space process to modify pages that should be read-only. The corruption persists in the page cache and is visible to every process mapping or reading the affected inode — including processes in other containers.
Condition 2 — Image layer sharing via overlayfs
Container runtimes deduplicate identical image layers on a per-node basis. If two pods share a base image (for example, debian:bookworm or gcr.io/google-containers/pause), the corresponding lower directories in the overlay mount resolve to the same physical files and therefore the same page-cache entries. This means one container’s corruption of a file in a shared layer is immediately visible to every other container using that layer.
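The sharing mechanism is easy to observe at the filesystem level. The sketch below is deliberately not overlayfs-specific — it simply shows that two directory entries resolving to the same inode are backed by the same cached pages, which is exactly the relationship overlayfs creates between two containers that share an image layer. All paths are temporary and illustrative.

```shell
# Minimal illustration (not an exploit): two names for one inode share
# a single set of page-cache pages -- the same relationship overlayfs
# creates between containers that share an image layer.
tmp=$(mktemp -d)
echo 'shared layer content' > "$tmp/layer-as-seen-by-pod-a"
ln "$tmp/layer-as-seen-by-pod-a" "$tmp/layer-as-seen-by-pod-b"  # second view, same inode

ino_a=$(stat -c %i "$tmp/layer-as-seen-by-pod-a")
ino_b=$(stat -c %i "$tmp/layer-as-seen-by-pod-b")

# Same inode number => the kernel caches this file's pages exactly once,
# so corruption through one name is visible through the other.
[ "$ino_a" = "$ino_b" ] && echo "same inode: $ino_a"
```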
Condition 3 — Privileged DaemonSets execute from shared layers
Many clusters run elevated-privilege workloads as DaemonSets: kube-proxy, CNI plugins (calico-node, cilium-agent), storage CSI drivers, logging agents, and node monitors. These pods often execute binaries from base-image layers — such as /usr/sbin/ipset, /usr/sbin/iptables, or /usr/bin/nsenter — and they run with hostPID, hostNetwork, or privileged: true. When any of them load a binary whose page-cache pages have been corrupted, they execute the attacker’s code at the DaemonSet’s privilege level — typically full node root.
Attack Path: From Unprivileged Pod to Node Root
The following diagram shows the concrete sequence an attacker follows inside a Kubernetes cluster. Each step is achievable from a standard, non-root pod with no special capabilities.
CopyFail attack chain in Kubernetes: an unprivileged pod corrupts a shared image-layer binary via the page cache, a privileged DaemonSet executes it, and the attacker inherits full node-level access.
Why Kubernetes Multiplies the Severity
On a standalone server, CopyFail requires the attacker and the privileged victim process to share the same kernel page cache — which they do by definition, since there is only one. But the attacker also needs a privileged process to execute the corrupted binary, which might be an uncommon event on a hardened single-purpose host.
Kubernetes flips this equation. Clusters are designed to colocate heterogeneous workloads — including privileged infrastructure agents — on shared nodes. Consider what exists on a typical production node:
- kube-proxy runs as a DaemonSet on every node and periodically invokes `iptables` or `ipset` from its base image layer.
- CNI plugins (Calico, Cilium, Flannel) run privileged and execute network-configuration binaries from shared layers at pod-creation time.
- CSI node plugins for block-storage drivers run with `privileged: true` and call filesystem tools from base images.
- Logging & monitoring agents (Datadog, Fluentd, Prometheus node-exporter) often mount hostPath volumes and run elevated.
- Node-problem-detector and other diagnostic DaemonSets run with host PID and network access.
Each of these is a viable execution target. The attacker does not need to know which DaemonSet will trigger first — they can corrupt multiple binaries across shared layers and wait for any one to execute.
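Taking an inventory of these execution targets is easy to script. The helper below is a sketch, not an official tool: it assumes `kubectl` access and `jq`, and checks only the three most common elevation flags named above.

```shell
# Sketch: list DaemonSets that run with hostPID, hostNetwork, or a
# privileged container -- i.e. the viable execution targets described
# above. Assumes kubectl and jq are available; extend the filter for
# other risky settings (hostPath mounts, added capabilities) as needed.
audit_privileged_daemonsets() {
  kubectl get daemonsets --all-namespaces -o json | jq -r '
    .items[]
    | select(
        .spec.template.spec.hostPID == true
        or .spec.template.spec.hostNetwork == true
        or ([.spec.template.spec.containers[].securityContext.privileged]
            | any(. == true)))
    | "\(.metadata.namespace)/\(.metadata.name)"'
}
```

On a typical production cluster this will list at least `kube-system/kube-proxy` and the CNI agent — each one a binary-execution path an attacker can wait on.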
Affected Managed Kubernetes Services
The proof-of-concept for CopyFail has been validated on three major managed Kubernetes platforms. Because the vulnerability lives in the host kernel, not in the Kubernetes control plane, any managed service running affected kernel versions is exposed.
| Provider | Service | PoC validated | Kernel fix status |
|---|---|---|---|
| Amazon Web Services | EKS (Amazon Linux 2 / Bottlerocket) | Yes | Patches rolling out; verify AMI version |
| Google Cloud | GKE (Container-Optimized OS / Ubuntu) | Yes | Node auto-upgrade applies fix; check channel |
| Alibaba Cloud | ACK | Yes | Kernel updates available; manual node pool upgrade |
| Microsoft Azure | AKS (Ubuntu / Azure Linux) | Not yet confirmed | Affected kernel versions in use; patch timeline TBD |
| Self-managed | kubeadm, k3s, RKE2, etc. | Depends on host OS | Operator must update host kernel manually |
Patched Kernel Versions
The AF_ALG splice race was introduced in Linux 4.14 and persists through kernel versions released in early 2026. The following stable branches contain the fix:
| Kernel branch | Fixed in | Notes |
|---|---|---|
| 5.10.x (LTS) | 5.10.254+ | Common in older EKS AMIs and Debian-based nodes |
| 5.15.x (LTS) | 5.15.204+ | Ubuntu 22.04 HWE kernel |
| 6.1.x (LTS) | 6.1.170+ | Debian 12, newer EKS AMIs |
| 6.6.x (LTS) | 6.6.137+ | Recent Container-Optimized OS builds |
| 6.12.x (stable) | 6.12.85+ | Cutting-edge distributions |
Run `uname -r` on each node (or query your node-pool configuration) and compare against these thresholds. Any version below the listed fix is vulnerable.
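Checking a fleet by hand is error-prone, so the comparison is worth scripting. The helper below is a sketch that encodes the thresholds from the table above — keep them in sync with your distribution's advisories — and strips vendor suffixes such as `-aws` or `-generic` before comparing.

```shell
# Sketch: succeed (exit 0) if the given kernel version is below the
# CopyFail fix threshold for its stable branch (thresholds per the
# table above). Unknown branches return non-zero: check those manually.
kernel_is_vulnerable() {
  ver="${1%%-*}"          # drop vendor suffix: 6.1.140-aws -> 6.1.140
  case "$ver" in
    5.10.*) fixed=5.10.254 ;;
    5.15.*) fixed=5.15.204 ;;
    6.1.*)  fixed=6.1.170  ;;
    6.6.*)  fixed=6.6.137  ;;
    6.12.*) fixed=6.12.85  ;;
    *) return 2 ;;        # branch not in the table: verify manually
  esac
  # Vulnerable iff ver sorts strictly before the fixed version.
  [ "$ver" != "$fixed" ] &&
    [ "$(printf '%s\n%s\n' "$ver" "$fixed" | sort -V | head -n1)" = "$ver" ]
}

kernel_is_vulnerable "$(uname -r)" \
  && echo "VULNERABLE: patch this node" \
  || echo "patched, or branch unknown: verify manually"
```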
Kubernetes-Specific Mitigation Playbook
Patching the kernel is the definitive fix. But clusters rarely patch every node simultaneously, and many organizations need defense-in-depth measures for the gap window. The following mitigations are specific to the Kubernetes attack surface.
1. Patch node kernels immediately
For managed services, trigger a node-pool upgrade or enable automatic node upgrades and verify the target AMI/image contains a patched kernel. For self-managed clusters, update the host OS kernel and reboot nodes in a rolling fashion. Cordon and drain before rebooting to minimize workload disruption.
2. Enforce Pod Security Standards (PSS) — Apply the `restricted` profile cluster-wide. This blocks `privileged: true`, `hostPID`, and `hostNetwork` on tenant workloads, limiting which pods can become useful execution targets.
3. Isolate privileged workloads on dedicated node pools — Taint nodes that run DaemonSets like `kube-proxy` or CNI agents and prevent tenant pods from being scheduled there. This breaks the image-layer sharing condition.
4. Reduce image layer overlap — Use distroless or scratch-based images for privileged DaemonSets. If `kube-proxy` runs from a minimal image with no shared base layer, there is no overlapping inode for the attacker to target.
5. Disable AF_ALG where not needed — If your workloads do not require kernel crypto offload, blacklist the `af_alg` module (`modprobe -r af_alg && echo "blacklist af_alg" > /etc/modprobe.d/disable-af_alg.conf`). This removes the splice race entirely.
6. Enable seccomp profiles — A default seccomp profile that blocks `splice` or restricts `AF_ALG` socket creation can prevent the exploit from triggering. Use `RuntimeDefault` as a baseline.
7. Audit DaemonSet privileges — Review every DaemonSet in your cluster. Do they genuinely need `privileged: true`? Can specific capabilities replace blanket elevation? Shrinking the privilege surface reduces the pool of viable execution targets.
8. Control workload deployment with admission policies — Use OPA Gatekeeper, Kyverno, or built-in ValidatingAdmissionPolicies to restrict which images may be deployed and which security contexts are allowed. Preventing untrusted workloads from landing on production nodes is the first line of defense.
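The seccomp mitigation can be made concrete with a custom profile. The fragment below is a sketch in the Kubernetes/OCI seccomp format: it allows everything by default but returns `EPERM` for any `socket()` call requesting the `AF_ALG` domain (family value 38 on Linux). Validate it against any workloads that legitimately use kernel crypto before rolling it out.

```json
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "syscalls": [
    {
      "names": ["socket"],
      "action": "SCMP_ACT_ERRNO",
      "args": [
        { "index": 0, "value": 38, "op": "SCMP_CMP_EQ" }
      ]
    }
  ]
}
```

Place the file under the kubelet's seccomp directory (typically `/var/lib/kubelet/seccomp/`) and reference it from a pod's `securityContext` via `seccompProfile: {type: Localhost, localhostProfile: <path>}`.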
Detection Guidance
CopyFail exploitation is subtle because it does not require any container escape syscalls (`unshare`, `setns`, `mount`). Detection must focus on the precursor and post-exploitation indicators:
| Detection point | What to look for | Tooling |
|---|---|---|
| AF_ALG socket creation | `socket(AF_ALG, ...)` calls from non-crypto workloads | Falco, Tetragon, auditd |
| `splice()` on AF_ALG file descriptors | Unusual `splice` syscall patterns paired with AF_ALG | seccomp audit mode, eBPF tracing |
| Binary integrity drift | Hash mismatches on known binaries in lower overlay layers | File integrity monitoring (AIDE, osquery) |
| Unexpected process execution in DaemonSets | DaemonSet pods spawning shells, network tools, or unknown child processes | Runtime security (Falco rules, Sysdig) |
| Node-level lateral movement | Processes accessing kubelet credentials, container runtime sockets, or other pod namespaces | Tetragon process-tree policies |
A Falco rule targeting AF_ALG socket creation in non-privileged pods is the highest-signal, lowest-noise detector for the initial exploitation phase. Combine it with overlay-layer file integrity checks for defense in depth.
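As a sketch of what that high-signal rule could look like — the field syntax mirrors Falco's stock packet-socket rule, so validate the condition against your Falco version before deploying:

```yaml
- rule: AF_ALG Socket Created in Container
  desc: >
    An AF_ALG kernel-crypto socket was opened from inside a container.
    Most application workloads never do this; it is the precursor to
    CopyFail (CVE-2026-31431) exploitation.
  condition: >
    evt.type = socket and evt.dir = > and
    container.id != host and
    evt.arg[0] contains AF_ALG
  output: >
    AF_ALG socket created in container
    (command=%proc.cmdline container=%container.name
    pod=%k8s.pod.name ns=%k8s.ns.name)
  priority: WARNING
  tags: [kubernetes, syscall, copyfail]
```

Scope the rule to exclude any namespaces that legitimately use kernel crypto offload to keep the false-positive rate near zero.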
The Bigger Lesson: Workload Identity and Access Control Matter
CopyFail is a kernel bug, but its Kubernetes blast radius is an architectural problem. The fact that any pod can corrupt a binary that a privileged DaemonSet will execute reveals a gap that no amount of network policy alone can close. The real question is: who was allowed to deploy that pod in the first place?
Clusters that treat workload deployment as a low-trust operation — where any CI pipeline or developer can push a pod spec to any namespace — have the widest exposure. Clusters that enforce deployment-time identity verification, namespace-scoped admission control, and just-in-time access to sensitive operations (kubectl exec, node SSH, privilege escalation) have a meaningfully smaller attack surface.
This is not hypothetical defense. CopyFail proves that the blast radius of a kernel vulnerability scales with the density of privilege on a node. Controlling who can deploy workloads, who can access nodes, and who can exec into pods is the access-layer defense that complements kernel patching.
Control who deploys, accesses, and operates your Kubernetes nodes
OnePAM provides just-in-time access to infrastructure — including Kubernetes nodes, kubectl exec, and SSH sessions — with identity verification, approval workflows, and session recording. Reduce standing privileges across your clusters and produce audit trails that prove least-privilege compliance.
Timeline
| Date | Event |
|---|---|
| Apr 29–30, 2026 | CVE-2026-31431 first disclosed; kernel patches published for stable branches |
| Early May 2026 | PoC validated on EKS, GKE, and ACK demonstrating cross-container exploitation |
| Ongoing | Managed K8s providers rolling out patched node images; operators urged to upgrade |
Bottom Line
CVE-2026-31431 is not just another privilege escalation — in Kubernetes, it is a cross-container lateral-movement primitive that requires zero privileges to exploit. The combination of node-wide page cache, overlayfs layer sharing, and ubiquitous privileged DaemonSets creates an attack surface that is present by default on virtually every production cluster.
Patch your node kernels. Isolate privileged workloads. Reduce image-layer overlap. Enforce Pod Security Standards. And critically, control who can deploy workloads and access nodes in the first place — because the next kernel bug that weaponizes shared infrastructure is not a matter of if, but when.
CopyFail proves that in Kubernetes, the page cache is a trust boundary that nobody drew on the architecture diagram. It is time to draw it, defend it, and audit who crosses it.