
Overview:
CVE-2025-23266 is a container‑escape vulnerability (CVSS 9.0) affecting the NVIDIA Container Toolkit and GPU Operator. Although exploitation requires several specific conditions, it can allow a malicious container image to escape its sandbox and execute code as root on the host.
NVIDIA has released patched versions of both components. Upgrading to Toolkit v1.17.8 and GPU Operator v25.3.1, or disabling an optional hook, mitigates the vulnerability. In addition, Upwind’s continuous runtime protection detects vulnerable hosts and blocks exploitation attempts at runtime, securing environments even when every condition required for exploitation is met.
Vulnerability Summary:
The NVIDIA Container Toolkit relies on OCI (Open Container Initiative) hooks to prepare a container for GPU access. One such hook, enable-cuda-compat, inherits environment variables from the container. An attacker can craft an image that sets LD_PRELOAD to point to a rogue library. When the privileged hook starts, the library loads outside the container, giving the attacker root on the node. In multi‑tenant GPU clusters, this risks cross‑tenant data exposure.
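As a defensive illustration of the mechanism described above, the sketch below flags images whose baked-in environment sets LD_PRELOAD. It is a minimal sketch, not a complete detection: the function name env_sets_ld_preload is ours, and the usage comment assumes Docker is available and uses a placeholder image name.

```shell
#!/bin/sh
# Reads environment entries (one NAME=value per line) on stdin and
# succeeds (exit 0) if any entry sets LD_PRELOAD.
env_sets_ld_preload() {
  grep -q '^LD_PRELOAD='
}

# Hypothetical usage against a local image (requires Docker; IMAGE is a placeholder):
#   docker image inspect --format '{{range .Config.Env}}{{println .}}{{end}}' IMAGE \
#     | env_sets_ld_preload && echo "IMAGE sets LD_PRELOAD: review before running on GPU nodes"
```

This only covers variables baked into the image. Environment variables can also be supplied when the container is started, so treat a clean result as one signal and not as a substitute for patching.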
Affected Versions
- NVIDIA Container Toolkit: versions ≤1.17.7 are affected. Versions ≤1.17.4 are vulnerable when running in CDI mode.
- NVIDIA GPU Operator: versions ≤25.3.0 are affected in all modes.
The issue is fully resolved in Toolkit v1.17.8 and GPU Operator v25.3.1. Air‑gapped or serverless GPU services that don’t expose the Toolkit binary are not affected.
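The version thresholds above can be turned into a quick check. This is a minimal sketch under stated assumptions: the function names are ours, it relies on `sort -V` (available in GNU coreutils), and it strips a leading "v" and any package-revision suffix before comparing. How you obtain the installed version string (package manager, Helm release, and so on) is left to your environment.

```shell
#!/bin/sh
# Succeeds if version $1 is less than or equal to version $2.
version_at_most() {
  v="${1#v}"            # drop a leading "v", e.g. v25.3.0 -> 25.3.0
  v="${v%%[-+~]*}"      # drop suffixes such as -1 or +build
  [ -n "$v" ] || return 2
  highest=$(printf '%s\n%s\n' "$v" "$2" | sort -V | tail -n 1)
  [ "$highest" = "$2" ]
}

# Thresholds taken from the affected-versions list above.
toolkit_is_affected()  { version_at_most "$1" 1.17.7; }
operator_is_affected() { version_at_most "$1" 25.3.0; }

# Hypothetical usage:
#   toolkit_is_affected "1.17.6" && echo "upgrade the Toolkit to v1.17.8"
```

Note that the check does not distinguish CDI mode from other modes; it simply treats every version at or below the threshold as needing an upgrade.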
Odds of Exploitation:
While the vulnerability was given a CVSS 9.0 rating, there are a number of environmental factors that make it less likely to be exploited, such as:
- No remote code execution by itself: This issue does not allow remote code execution on its own. To exploit it, an attacker must be able to run a container on a Kubernetes node equipped with an NVIDIA GPU and the NVIDIA Container Toolkit.
- Single‑hook exposure: Only the enable-cuda-compat hook is affected, and it is optional. Disabling this hook mitigates the issue.
- Straightforward patch path: The fixed binaries are a drop‑in upgrade; no driver or kernel changes are required.
- Host isolation still works: VM boundaries remain intact. In managed Kubernetes services that use VM‑level isolation (e.g., GKE Node Pools, EKS Managed Node Groups), a compromise stays within the VM.
Remediation
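1. Upgrade the NVIDIA Container Toolkit:
# Ubuntu/Debian
sudo apt-get update && sudo apt-get install --only-upgrade nvidia-container-toolkit
# RHEL and derivatives
sudo yum update nvidia-container-toolkit
# GPU Operator deployments
helm upgrade gpu-operator nvidia/gpu-operator \
--set toolkit.version=v1.17.8-ubuntu20.04 # adjust tag for your distro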
2. Disable the affected hook as a temporary measure:
- In runtime mode, set disable-cuda-compat-lib-hook = true in /etc/nvidia-container-toolkit/config.toml.
- In GPU Operator deployments, add disable-cuda-compat-lib-hook to the NVIDIA_CONTAINER_TOOLKIT_OPT_IN_FEATURES environment variable.
3. Restrict execution of untrusted images until all affected nodes are patched.
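To confirm that the temporary mitigation from step 2 is in place on a node, a check along these lines can help. It is a minimal sketch: the function name hook_disabled is ours, and it does not parse TOML. It only looks for an uncommented disable-cuda-compat-lib-hook = true line anywhere in the file you pass in.

```shell
#!/bin/sh
# Succeeds if the given config file contains an uncommented
# "disable-cuda-compat-lib-hook = true" line (whitespace-tolerant).
hook_disabled() {
  grep -Eq '^[[:space:]]*disable-cuda-compat-lib-hook[[:space:]]*=[[:space:]]*true[[:space:]]*(#.*)?$' "$1"
}

# Hypothetical usage on a node:
#   if hook_disabled /etc/nvidia-container-toolkit/config.toml; then
#     echo "hook disabled (temporary mitigation in place)"
#   else
#     echo "hook not disabled: upgrade or apply the mitigation"
#   fi
```

Because the check is section-agnostic, a matching line under an unrelated table would also pass, so review the file by hand if the result is surprising.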
How Upwind Protects Against CVE-2025-23266
Upwind ensures that AI infrastructure remains protected against container breakout vulnerabilities like CVE-2025-23266. Key protections include:
- Automated asset discovery: Upwind’s runtime scanner identifies all hosts and containers running vulnerable Toolkit (≤1.17.7) or GPU Operator (≤25.3.0) versions, enabling targeted remediation.
- eBPF‑powered escape detection: Upwind hooks observe events and raise critical alerts on suspicious activity such as LD_PRELOAD injections into nvidia-ctk, so teams can respond before privilege escalation completes.
- Risk‑based prioritization: Nodes that actually execute untrusted images or share GPUs between tenants are prioritized for response, ensuring security teams focus efforts where they matter most.
Final thoughts
While container escape vulnerabilities are serious, there are multiple specific conditions that must be met in order for this CVE to be exploited, making it significantly less likely to impact customers. The mitigation is also straightforward: upgrade one package or disable one configuration flag. Upwind customers are already protected by continuous runtime monitoring, real-time threat detection, and environment-aware risk scoring.
If you need help locating or addressing this vulnerability in your environment, contact us at [email protected] to see how Upwind transforms vulnerability management from days of analysis into minutes of action.