CVE-2026-53923

MEDIUMCVSS 5.3/10EPSS 0.28%

Last modified

CVE-2026-53923 is a medium-severity vulnerability rated 5.3/10 on the CVSS scale. vLLM is an inference and serving engine for large language models (LLMs). From 0.5.5 until 0.23.1rc0, integer truncation of tensor dimensions in vLLM's GGUF dequantize kernels (csrc/quantization/gguf/gguf_kernel.cu) causes partial tensor processing. EPSS estimates a 0.28% chance of exploitation in the next 30 days.

Description

vLLM is an inference and serving engine for large language models (LLMs). From 0.5.5 until 0.23.1rc0, integer truncation of tensor dimensions in vLLM's GGUF dequantize kernels (csrc/quantization/gguf/gguf_kernel.cu) causes partial tensor processing. The output tensor is allocated at full size via torch::empty (uninitialized memory), but the dequantize CUDA kernel processes only a truncated number of elements. The unfilled portion of the output tensor retains whatever was previously in GPU memory. In multi-tenant inference deployments, this residual GPU memory may contain tensor data from other users' inference requests, constituting information disclosure. This vulnerability is fixed in 0.23.1rc0.

Metrics

CVSS 3.1
7.5/10

CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N

CVSS 4.0
5.3/10

CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:P/VC:L/VI:L/VA:N/SC:N/SI:N/SA:N/E:X/CR:X/IR:X/AR:X/MAV:X/MAC:X/MAT:X/MPR:X/MUI:X/MVC:X/MVI:X/MVA:X/MSC:X/MSI:X/MSA:X/S:X/AU:X/R:X/V:X/RE:X/U:X

EPSS Probability
0.28%

19.8th percentile

Probability of exploitation in the next 30 days. Learn more

Weakness Enumeration

Affected Software

VendorProductVersions
VllmVllm>= 0.5.5, < 0.23.1

References

Timeline

Published
Last Modified
Status
Analyzed

Frequently Asked Questions

What is CVE-2026-53923?
vLLM is an inference and serving engine for large language models (LLMs). From 0.5.5 until 0.23.1rc0, integer truncation of tensor dimensions in vLLM's GGUF dequantize kernels (csrc/quantization/gguf/gguf_kernel.cu) causes partial tensor processing. The output tensor is allocated at full size via torch::empty (uninitialized memory), but the dequantize CUDA kernel processes only a truncated number of elements. The unfilled portion of the output tensor retains whatever was previously in GPU memory. In multi-tenant inference deployments, this residual GPU memory may contain tensor data from other users' inference requests, constituting information disclosure. This vulnerability is fixed in 0.23.1rc0.
How severe is CVE-2026-53923?
CVE-2026-53923 has a CVSS score of 5.3/10 (MEDIUM severity). The EPSS model estimates a 0.28% probability of exploitation in the next 30 days.
How do I fix CVE-2026-53923?
Check the vendor references and advisories linked above for patched versions and mitigation guidance. You can also run a Strix scan to test if your systems are affected.

Are you affected by CVE-2026-53923?

Run a free Strix scan to check your systems for this vulnerability.

Scan your code now

Source: NVD / NIST