CopyFail in Kubernetes: How CVE-2026-31431 Enables Container Escape Without Privileges

A kernel page-cache race condition turns shared container image layers into a lateral-movement highway. Any unprivileged pod can corrupt a binary in a shared overlayfs layer, wait for a privileged DaemonSet to execute it, and achieve full node-level code execution — no root, no CAP_SYS_ADMIN, no traditional escape technique required.

Why CopyFail Is Different Inside a Cluster

CVE-2026-31431 — nicknamed CopyFail — was first disclosed on April 29–30, 2026. On a standalone Linux host the bug is already serious: a race condition in the AF_ALG splice path lets an unprivileged process corrupt page-cache pages that back file contents, silently rewriting on-disk data without needing write permissions. CVSS rates it 7.8.

Inside Kubernetes, however, the blast radius multiplies by orders of magnitude. The Linux page cache is a node-wide resource — it does not respect container, namespace, or pod boundaries. Container runtimes such as containerd and CRI-O use overlayfs to share identical image layers across every container on the same node. When two pods pull the same base image, their lower layers point to the same inodes, which means the same page-cache pages back both containers’ views of those files.

This architectural shortcut — designed for storage efficiency and fast pod startup — is the hinge that turns a local privilege-escalation primitive into a cross-container lateral-movement technique. The ability to schedule a single unprivileged pod is all an attacker needs.

  • CVSS score: 7.8 (high severity, no privileges required)
  • Affected Linux kernels: 4.14–6.12 (through early 2026)
  • Managed K8s providers confirmed vulnerable: 3 (EKS, GKE, ACK)

The Three Conditions Behind the Attack

CopyFail in Kubernetes relies on three conditions converging on the same node. Remove any one and the chain breaks — but in most production clusters, all three are present by default.

Condition 1 — Kernel page-cache corruption via AF_ALG splice race

The AF_ALG subsystem exposes kernel crypto algorithms to user space. When combined with splice(), a race between page-reference accounting and the crypto completion path allows a user-space process to modify pages that should be read-only. The corruption persists in the page cache and is visible to every process mapping or reading the affected inode — including processes in other containers.
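To illustrate how little privilege the subsystem demands, the sketch below performs a legitimate AF_ALG hash operation from plain user space: no splice(), no race, just the interface the bug lives in. It runs only on Linux kernels with the user-space crypto API enabled.

```python
import hashlib
import socket

def afalg_sha256(data: bytes) -> bytes:
    """Hash data through the kernel crypto API via AF_ALG (Linux only)."""
    # Transform socket: bind() selects the algorithm type and name.
    with socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0) as alg:
        alg.bind(("hash", "sha256"))
        # accept() returns an operation socket tied to that transform;
        # data written to it is hashed inside the kernel.
        op, _ = alg.accept()
        with op:
            op.sendall(data)
            return op.recv(32)  # SHA-256 digests are 32 bytes

if __name__ == "__main__":
    assert afalg_sha256(b"abc") == hashlib.sha256(b"abc").digest()
```

The exploit pairs operations on this socket family with splice() to win the page-reference race; those details are deliberately omitted here.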

Condition 2 — Image layer sharing via overlayfs

Container runtimes deduplicate identical image layers on a per-node basis. If two pods share a base image (for example, debian:bookworm or gcr.io/google-containers/pause), the corresponding lower directories in the overlay mount resolve to the same physical files and therefore the same page-cache entries. This means one container’s corruption of a file in a shared layer is immediately visible to every other container using that layer.
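You can verify layer sharing on a node directly: if two paths resolve to the same (device, inode) pair, the kernel backs them with the same page-cache pages. A minimal sketch (locating the actual overlay lowerdir paths on a containerd node, typically under /var/lib/containerd, is left to the operator):

```python
import os

def same_cached_file(path_a: str, path_b: str) -> bool:
    """True when both paths resolve to the same inode on the same device,
    which means the same page-cache pages back both files."""
    a, b = os.stat(path_a), os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
```

Run it against the same binary as seen from two containers' lower layers; a True result means a corruption in one container's view is a corruption in both.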

Condition 3 — Privileged DaemonSets execute from shared layers

Many clusters run elevated-privilege workloads as DaemonSets: kube-proxy, CNI plugins (calico-node, cilium-agent), storage CSI drivers, logging agents, and node monitors. These pods often execute binaries from base-image layers — such as /usr/sbin/ipset, /usr/sbin/iptables, or /usr/bin/nsenter — and they run with hostPID, hostNetwork, or privileged: true. When any of them load a binary whose page-cache pages have been corrupted, they execute the attacker’s code at the DaemonSet’s privilege level — typically full node root.

Attack Path: From Unprivileged Pod to Node Root

The following diagram shows the concrete sequence an attacker follows inside a Kubernetes cluster. Each step is achievable from a standard, non-root pod with no special capabilities.

[Diagram: CopyFail attack path in Kubernetes. Step 1: attacker pod (unprivileged, no capabilities) triggers the AF_ALG splice race. Step 2: a binary in a shared overlayfs lower layer (e.g. /usr/sbin/ipset) is corrupted. Step 3: the poisoned pages persist in the node-wide page cache, shared across all containers on the node. Step 4: a privileged DaemonSet (kube-proxy, calico-node, etc.) executes the corrupted binary. Step 5: attacker code runs with full host privileges. Blast radius after node compromise: kubelet credentials (create/delete pods on the node), all pod secrets (service account tokens, env vars), the container runtime socket (spawn arbitrary containers), and network pivot (node subnet and pod CIDR).]

CopyFail attack chain in Kubernetes: an unprivileged pod corrupts a shared image-layer binary via the page cache, a privileged DaemonSet executes it, and the attacker inherits full node-level access.

Why Kubernetes Multiplies the Severity

On a standalone server, CopyFail requires the attacker and the privileged victim process to share the same kernel page cache — which they do by definition, since there is only one. But the attacker also needs a privileged process to execute the corrupted binary, which might be an uncommon event on a hardened single-purpose host.

Kubernetes flips this equation. Clusters are designed to colocate heterogeneous workloads — including privileged infrastructure agents — on shared nodes. Consider what exists on a typical production node:

  • kube-proxy runs as a DaemonSet on every node and periodically invokes iptables or ipset from its base image layer.
  • CNI plugins (Calico, Cilium, Flannel) run privileged and execute network-configuration binaries from shared layers at pod-creation time.
  • CSI node plugins for block-storage drivers run with privileged: true and call filesystem tools from base images.
  • Logging & monitoring agents (Datadog, Fluentd, Prometheus node-exporter) often mount hostPath volumes and run elevated.
  • Node-problem-detector and other diagnostic DaemonSets run with host PID and network access.

Each of these is a viable execution target. The attacker does not need to know which DaemonSet will trigger first — they can corrupt multiple binaries across shared layers and wait for any one to execute.

Affected Managed Kubernetes Services

The proof-of-concept for CopyFail has been validated on three major managed Kubernetes platforms. Because the vulnerability lives in the host kernel, not in the Kubernetes control plane, any managed service running affected kernel versions is exposed.

| Provider | Service | PoC validated | Kernel fix status |
| --- | --- | --- | --- |
| Amazon Web Services | EKS (Amazon Linux 2 / Bottlerocket) | Yes | Patches rolling out; verify AMI version |
| Google Cloud | GKE (Container-Optimized OS / Ubuntu) | Yes | Node auto-upgrade applies fix; check channel |
| Alibaba Cloud | ACK | Yes | Kernel updates available; manual node pool upgrade |
| Microsoft Azure | AKS (Ubuntu / Azure Linux) | Not yet confirmed | Affected kernel versions in use; patch timeline TBD |
| Self-managed | kubeadm, k3s, RKE2, etc. | Depends on host OS | Operator must update host kernel manually |

Patched Kernel Versions

The AF_ALG splice race was introduced in Linux 4.14 and persists through kernel versions released in early 2026. The following stable branches contain the fix:

| Kernel branch | Fixed in | Notes |
| --- | --- | --- |
| 5.10.x (LTS) | 5.10.254+ | Common in older EKS AMIs and Debian-based nodes |
| 5.15.x (LTS) | 5.15.204+ | Ubuntu 22.04 HWE kernel |
| 6.1.x (LTS) | 6.1.170+ | Debian 12, newer EKS AMIs |
| 6.6.x (LTS) | 6.6.137+ | Recent Container-Optimized OS builds |
| 6.12.x (stable) | 6.12.85+ | Cutting-edge distributions |
Run uname -r on each node (or query your node-pool configuration) and compare against these thresholds. Any version below the listed fix is vulnerable.
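That check can be scripted. The sketch below encodes the thresholds from the table above and conservatively flags unlisted branches inside the affected 4.14–6.12 range:

```python
# First patched release per stable branch, from the table above.
FIXED = {
    (5, 10): (5, 10, 254),
    (5, 15): (5, 15, 204),
    (6, 1):  (6, 1, 170),
    (6, 6):  (6, 6, 137),
    (6, 12): (6, 12, 85),
}

def parse_release(uname_r: str) -> tuple:
    """Reduce a `uname -r` string like '5.15.0-105-generic' to (5, 15, 0)."""
    base = uname_r.split("-", 1)[0]
    parts = [int(x) for x in base.split(".")[:3]]
    while len(parts) < 3:   # tolerate two-component versions
        parts.append(0)
    return tuple(parts)

def is_vulnerable(uname_r: str) -> bool:
    v = parse_release(uname_r)
    fixed = FIXED.get(v[:2])
    if fixed is None:
        # Branch not listed above: treat anything in the affected
        # 4.14-through-6.12 era as suspect until verified.
        return (4, 14, 0) <= v < (6, 13, 0)
    return v < fixed
```

Feed it the `uname -r` output from each node, or wire it into a node-inventory script.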

Kubernetes-Specific Mitigation Playbook

Patching the kernel is the definitive fix. But clusters rarely patch every node simultaneously, and many organizations need defense-in-depth measures for the gap window. The following mitigations are specific to the Kubernetes attack surface.

1. Patch node kernels immediately

For managed services, trigger a node-pool upgrade or enable automatic node upgrades and verify the target AMI/image contains a patched kernel. For self-managed clusters, update the host OS kernel and reboot nodes in a rolling fashion. Cordon and drain before rebooting to minimize workload disruption.

  2. Enforce Pod Security Standards (PSS) — Apply the restricted profile cluster-wide. This blocks privileged: true, hostPID, and hostNetwork on tenant workloads, limiting which pods can become useful execution targets.
  3. Isolate privileged workloads on dedicated node pools — Taint nodes that run DaemonSets like kube-proxy or CNI agents and prevent tenant pods from being scheduled there. This breaks the image-layer-sharing condition.
  4. Reduce image-layer overlap — Use distroless or scratch-based images for privileged DaemonSets. If kube-proxy runs from a minimal image with no shared base layer, there is no overlapping inode for the attacker to target.
  5. Disable AF_ALG where it is not needed — If your workloads do not require kernel crypto offload, block the af_alg module: run modprobe -r af_alg, then write install af_alg /bin/false to /etc/modprobe.d/disable-af_alg.conf. (A plain blacklist entry only prevents alias-based autoloading, and no modprobe directive helps if AF_ALG is compiled into the kernel.) This removes the splice race entirely.
  6. Enable seccomp profiles — A default seccomp profile that blocks splice or restricts AF_ALG socket creation can prevent the exploit from triggering. Use RuntimeDefault as a baseline.
  7. Audit DaemonSet privileges — Review every DaemonSet in your cluster. Does it genuinely need privileged: true? Can specific capabilities replace blanket elevation? Shrinking the privilege surface reduces the pool of viable execution targets.
  8. Control workload deployment with admission policies — Use OPA Gatekeeper, Kyverno, or built-in ValidatingAdmissionPolicies to restrict which images may be deployed and which security contexts are allowed. Preventing untrusted workloads from landing on production nodes is the first line of defense.
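The DaemonSet-privilege audit described above can be started with a short script. This sketch inspects a pod spec (the spec object of a Pod or pod template, as returned by kubectl get ... -o json; all field names are standard Kubernetes API fields) and flags the settings that make a pod a useful CopyFail execution target:

```python
def privilege_findings(pod_spec: dict) -> list:
    """Return a list of high-risk security settings found in a pod spec."""
    findings = []
    # Pod-level host namespace sharing.
    for flag in ("hostPID", "hostNetwork", "hostIPC"):
        if pod_spec.get(flag):
            findings.append(flag)
    # Container-level security contexts.
    for c in pod_spec.get("containers", []):
        sc = c.get("securityContext") or {}
        if sc.get("privileged"):
            findings.append(f"{c['name']}: privileged")
        caps = (sc.get("capabilities") or {}).get("add") or []
        if "SYS_ADMIN" in caps:
            findings.append(f"{c['name']}: CAP_SYS_ADMIN")
    return findings
```

Iterate it over every DaemonSet in the cluster to build the list of workloads whose binaries must not share image layers with tenant pods.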

Detection Guidance

CopyFail exploitation is subtle because it does not require any container escape syscalls (unshare, setns, mount). Detection must focus on the precursor and post-exploitation indicators:

| Detection point | What to look for | Tooling |
| --- | --- | --- |
| AF_ALG socket creation | socket(AF_ALG, ...) calls from non-crypto workloads | Falco, Tetragon, auditd |
| splice() on AF_ALG file descriptors | Unusual splice syscall patterns paired with AF_ALG | seccomp audit mode, eBPF tracing |
| Binary integrity drift | Hash mismatches on known binaries in lower overlay layers | File integrity monitoring (AIDE, osquery) |
| Unexpected process execution in DaemonSets | DaemonSet pods spawning shells, network tools, or unknown child processes | Runtime security (Falco rules, Sysdig) |
| Node-level lateral movement | Processes accessing kubelet credentials, container runtime sockets, or other pod namespaces | Tetragon process-tree policies |

A Falco rule targeting AF_ALG socket creation in non-privileged pods is the highest-signal, lowest-noise detector for the initial exploitation phase. Combine it with overlay-layer file integrity checks for defense in depth.
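A minimal sketch of the overlay-layer integrity check follows; it is a stand-in for purpose-built tools like AIDE or osquery, not a replacement. Record a hash baseline of a read-only layer directory, then periodically diff against it: a lower layer should never change, so any drift is a strong corruption signal.

```python
import hashlib
import os

def layer_hashes(layer_dir: str) -> dict:
    """SHA-256 of every regular file under an image-layer directory,
    keyed by path relative to the layer root."""
    digests = {}
    for root, _, files in os.walk(layer_dir):
        for name in files:
            path = os.path.join(root, name)
            with open(path, "rb") as f:
                rel = os.path.relpath(path, layer_dir)
                digests[rel] = hashlib.sha256(f.read()).hexdigest()
    return digests

def drift(baseline: dict, current: dict) -> list:
    """Files whose content changed since the baseline was recorded."""
    return sorted(p for p, h in current.items()
                  if baseline.get(p) not in (None, h))
```

On a containerd node the layer directories to baseline would live under the runtime's snapshotter root; the exact paths depend on your runtime configuration.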

The Bigger Lesson: Workload Identity and Access Control Matter

CopyFail is a kernel bug, but its Kubernetes blast radius is an architectural problem. The fact that any pod can corrupt a binary that a privileged DaemonSet will execute reveals a gap that network policy alone cannot close. The real question is: who was allowed to deploy that pod in the first place?

Clusters that treat workload deployment as a low-trust operation — where any CI pipeline or developer can push a pod spec to any namespace — have the widest exposure. Clusters that enforce deployment-time identity verification, namespace-scoped admission control, and just-in-time access to sensitive operations (kubectl exec, node SSH, privilege escalation) have a meaningfully smaller attack surface.

This is not hypothetical defense. CopyFail proves that the blast radius of a kernel vulnerability scales with the density of privilege on a node. Controlling who can deploy workloads, who can access nodes, and who can exec into pods is the access-layer defense that complements kernel patching.

Control who deploys, accesses, and operates your Kubernetes nodes

OnePAM provides just-in-time access to infrastructure — including Kubernetes nodes, kubectl exec, and SSH sessions — with identity verification, approval workflows, and session recording. Reduce standing privileges across your clusters and produce audit trails that prove least-privilege compliance.

Start Free Trial

Timeline

| Date | Event |
| --- | --- |
| Apr 29–30, 2026 | CVE-2026-31431 first disclosed; kernel patches published for stable branches |
| Early May 2026 | PoC validated on EKS, GKE, and ACK demonstrating cross-container exploitation |
| Ongoing | Managed K8s providers rolling out patched node images; operators urged to upgrade |

Bottom Line

CVE-2026-31431 is not just another privilege escalation — in Kubernetes, it is a cross-container lateral-movement primitive that requires zero privileges to exploit. The combination of node-wide page cache, overlayfs layer sharing, and ubiquitous privileged DaemonSets creates an attack surface that is present by default on virtually every production cluster.

Patch your node kernels. Isolate privileged workloads. Reduce image-layer overlap. Enforce Pod Security Standards. And critically, control who can deploy workloads and access nodes in the first place — because the next kernel bug that weaponizes shared infrastructure is not a matter of if, but when.

CopyFail proves that in Kubernetes, the page cache is a trust boundary that nobody drew on the architecture diagram. It is time to draw it, defend it, and audit who crosses it.

OnePAM Team
Security & Infrastructure Team