-
I'm seeing these errors as well, despite having plenty of available memory. In my case I was able to narrow it down to a specific situation in which two threads both want to use Vulkan at the same time: a segmentation fault is thrown whenever whisper.cpp tries to run while llama.cpp is still decoding.
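One way to sidestep the concurrent-use scenario described above is to serialize every GPU-backed call behind a single lock. The sketch below is only an illustration of that idea: the worker function is a hypothetical stand-in for the llama.cpp and whisper.cpp calls, not actual bindings.

```python
import threading

gpu_lock = threading.Lock()  # guards every call into the Vulkan backend
active = 0                   # threads currently inside the "GPU" section
overlap_seen = False         # would become True on concurrent entry

def use_gpu(iterations):
    # Stand-in for a loop of llama.cpp decode / whisper.cpp transcribe calls.
    global active, overlap_seen
    for _ in range(iterations):
        with gpu_lock:        # only one thread may touch the backend at a time
            active += 1
            if active > 1:    # this is the unsafe case the comment describes
                overlap_seen = True
            # ... hypothetical inference call would go here ...
            active -= 1

llama_thread = threading.Thread(target=use_gpu, args=(1000,))
whisper_thread = threading.Thread(target=use_gpu, args=(1000,))
llama_thread.start(); whisper_thread.start()
llama_thread.join(); whisper_thread.join()
print("overlap:", overlap_seen)
```

With the lock in place the two workers never enter the GPU section at the same time, which is the condition the segfault appears to depend on.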
-
Are there any limitations to the size and type of the models supported by Vulkan on Windows?
Context:
I am unable to load a Llama3-8b-Q4_K_M.gguf model with Vulkan on an NVIDIA 4070M (8 GB), and also on an AMD 780M with 24 GB available. The model loads fine on CUDA and HIP, but fails on Vulkan with:
load_tensors: layer 29 assigned to device Vulkan0
load_tensors: layer 30 assigned to device Vulkan0
load_tensors: layer 31 assigned to device Vulkan0
load_tensors: layer 32 assigned to device Vulkan0
load_tensors: tensor 'token_embd.weight' (q4_K) (and 0 others) cannot be used with preferred buffer type CPU_AARCH64, using CPU instead
ggml_vulkan: Device memory allocation of size 392101888 failed.
ggml_vulkan: vk::Device::allocateMemory: ErrorOutOfDeviceMemory
llama_model_load: error loading model: unable to allocate Vulkan0 buffer
llama_model_load_from_file_impl: failed to load model
vulkaninfo reports this for the NVIDIA 4070:
memoryHeaps: count = 2
memoryHeaps[0]:
size = 8334082048 (0x1f0c00000) (7.76 GiB)
budget = 7528775680 (0x1c0c00000) (7.01 GiB)
usage = 0 (0x00000000) (0.00 B)
flags: count = 1
MEMORY_HEAP_DEVICE_LOCAL_BIT
memoryHeaps[1]:
size = 25334833152 (0x5e612e000) (23.59 GiB)
budget = 24529528832 (0x5b612e800) (22.84 GiB)
usage = 278528 (0x00044000) (272.00 KiB)
flags:
None
and this for the AMD 780M:
memoryHeaps: count = 3
memoryHeaps[0]:
size = 268435456 (0x10000000) (256.00 MiB)
budget = 255013680 (0x0f333330) (243.20 MiB)
usage = 0 (0x00000000) (0.00 B)
flags: count = 2
MEMORY_HEAP_DEVICE_LOCAL_BIT
MEMORY_HEAP_MULTI_INSTANCE_BIT
memoryHeaps[1]:
size = 25066209280 (0x5d6100000) (23.34 GiB)
budget = 23812898816 (0x58b5c0000) (22.18 GiB)
usage = 0 (0x00000000) (0.00 B)
flags:
None
memoryHeaps[2]:
size = 268435456 (0x10000000) (256.00 MiB)
budget = 255013680 (0x0f333330) (243.20 MiB)
usage = 0 (0x00000000) (0.00 B)
flags: count = 2
MEMORY_HEAP_DEVICE_LOCAL_BIT
MEMORY_HEAP_MULTI_INSTANCE_BIT
Any insights are appreciated.
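As a sanity check on the numbers in the logs above, a rough back-of-envelope comparison of the failing allocation against the device-local heap budget can be done. The heap budget and the failed allocation size are taken from the output above; the model weight size is an assumption (a Llama3-8B Q4_K_M file is roughly 4.9 GB on disk).

```python
GiB = 1024 ** 3

device_local_budget = 7528775680       # heap 0 budget on the 4070, from vulkaninfo
model_weights       = int(4.9 * GiB)   # ASSUMED weight size for Llama3-8B Q4_K_M
failed_allocation   = 392101888        # from the ggml_vulkan error message

remaining = device_local_budget - model_weights
print(f"budget left after weights: {remaining / GiB:.2f} GiB")
print(f"failed allocation:         {failed_allocation / GiB:.2f} GiB")
print("fits nominally:", failed_allocation < remaining)
```

Under that assumption the failing ~374 MiB allocation would nominally fit in the roughly 2 GiB of budget left after the weights, which suggests the failure is not a simple shortage of total device memory (fragmentation, per-allocation limits, or other buffers already resident could all be in play).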