-
I'm seeing these errors as well, despite having plenty of available memory. In my case I was able to narrow it down to a specific situation in which two threads both want to use Vulkan at the same time: a segmentation fault is thrown whenever whisper.cpp tries to run while llama.cpp is still decoding.
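One way to sidestep the concurrent-use scenario described above is to serialize every GPU-backed call behind a single lock. The sketch below is only an illustration of that idea: the worker function is a hypothetical stand-in for the llama.cpp and whisper.cpp calls, not actual bindings.

```python
import threading

gpu_lock = threading.Lock()  # guards every call into the Vulkan backend
active = 0                   # threads currently inside the "GPU" section
overlap_seen = False         # would become True on concurrent entry

def use_gpu(iterations):
    # Stand-in for a loop of llama.cpp decode / whisper.cpp transcribe calls.
    global active, overlap_seen
    for _ in range(iterations):
        with gpu_lock:        # only one thread may touch the backend at a time
            active += 1
            if active > 1:    # this is the unsafe case the comment describes
                overlap_seen = True
            # ... hypothetical inference call would go here ...
            active -= 1

llama_thread = threading.Thread(target=use_gpu, args=(1000,))
whisper_thread = threading.Thread(target=use_gpu, args=(1000,))
llama_thread.start(); whisper_thread.start()
llama_thread.join(); whisper_thread.join()
print("overlap:", overlap_seen)
```

With the lock in place the two workers never enter the GPU section at the same time, which is the condition the segfault appears to depend on.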
-
Are there any limitations to the size and type of the models supported by Vulkan on Windows?
Context:
I am unable to load a Llama3-8b-Q4_K_M.gguf model with Vulkan on an NVIDIA 4070M (8 GB), and also on an AMD 780M with 24 GB available. The model loads fine on CUDA and HIP, but fails on Vulkan with:
load_tensors: layer 29 assigned to device Vulkan0
load_tensors: layer 30 assigned to device Vulkan0
load_tensors: layer 31 assigned to device Vulkan0
load_tensors: layer 32 assigned to device Vulkan0
load_tensors: tensor 'token_embd.weight' (q4_K) (and 0 others) cannot be used with preferred buffer type CPU_AARCH64, using CPU instead
ggml_vulkan: Device memory allocation of size 392101888 failed.
ggml_vulkan: vk::Device::allocateMemory: ErrorOutOfDeviceMemory
llama_model_load: error loading model: unable to allocate Vulkan0 buffer
llama_model_load_from_file_impl: failed to load model
vulkaninfo reports this for the NVIDIA 4070:
memoryHeaps: count = 2
memoryHeaps[0]:
size = 8334082048 (0x1f0c00000) (7.76 GiB)
budget = 7528775680 (0x1c0c00000) (7.01 GiB)
usage = 0 (0x00000000) (0.00 B)
flags: count = 1
MEMORY_HEAP_DEVICE_LOCAL_BIT
memoryHeaps[1]:
size = 25334833152 (0x5e612e000) (23.59 GiB)
budget = 24529528832 (0x5b612e800) (22.84 GiB)
usage = 278528 (0x00044000) (272.00 KiB)
flags:
None
and this for the AMD 780M:
memoryHeaps: count = 3
memoryHeaps[0]:
size = 268435456 (0x10000000) (256.00 MiB)
budget = 255013680 (0x0f333330) (243.20 MiB)
usage = 0 (0x00000000) (0.00 B)
flags: count = 2
MEMORY_HEAP_DEVICE_LOCAL_BIT
MEMORY_HEAP_MULTI_INSTANCE_BIT
memoryHeaps[1]:
size = 25066209280 (0x5d6100000) (23.34 GiB)
budget = 23812898816 (0x58b5c0000) (22.18 GiB)
usage = 0 (0x00000000) (0.00 B)
flags:
None
memoryHeaps[2]:
size = 268435456 (0x10000000) (256.00 MiB)
budget = 255013680 (0x0f333330) (243.20 MiB)
usage = 0 (0x00000000) (0.00 B)
flags: count = 2
MEMORY_HEAP_DEVICE_LOCAL_BIT
MEMORY_HEAP_MULTI_INSTANCE_BIT
Any insights are appreciated.
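As a sanity check on the numbers in the logs above, a rough back-of-envelope comparison of the failing allocation against the device-local heap budget can be done. The heap budget and the failed allocation size are taken from the output above; the model weight size is an assumption (a Llama3-8B Q4_K_M file is roughly 4.9 GB on disk).

```python
GiB = 1024 ** 3

device_local_budget = 7528775680       # heap 0 budget on the 4070, from vulkaninfo
model_weights       = int(4.9 * GiB)   # ASSUMED weight size for Llama3-8B Q4_K_M
failed_allocation   = 392101888        # from the ggml_vulkan error message

remaining = device_local_budget - model_weights
print(f"budget left after weights: {remaining / GiB:.2f} GiB")
print(f"failed allocation:         {failed_allocation / GiB:.2f} GiB")
print("fits nominally:", failed_allocation < remaining)
```

Under that assumption the failing ~374 MiB allocation would nominally fit in the roughly 2 GiB of budget left after the weights, which suggests the failure is not a simple shortage of total device memory (fragmentation, per-allocation limits, or other buffers already resident could all be in play).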