Hello,
I am trying to run mistralai_Mistral-Small-3.1-24B-Instruct-2503-IQ4_XS.gguf on my Windows 10 PC with 2x 2070 Supers. If I set `-sm layer`, the model loads but crashes on the first prompt. If I set it to `row`, it works, but seems slower than it should be when using both GPUs. The first few words of output are very fast in layer mode before the system crashes. Can someone please help me understand what I am doing wrong, or why it will not work?
Thank you
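For reference, the two invocations being compared look roughly like this (a sketch of llama.cpp's `llama-cli`; the prompt, `-ngl` value, and exact argument order are assumptions, not copied from my actual command line):

```shell
# Split-mode "layer": whole transformer layers are assigned to each GPU.
# In my case this loads fine but crashes on the first prompt.
llama-cli -m mistralai_Mistral-Small-3.1-24B-Instruct-2503-IQ4_XS.gguf \
  -ngl 99 -sm layer -p "Hello"

# Split-mode "row": individual tensors are split row-wise across both GPUs.
# This works for me, but runs slower than expected.
llama-cli -m mistralai_Mistral-Small-3.1-24B-Instruct-2503-IQ4_XS.gguf \
  -ngl 99 -sm row -p "Hello"
```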
The load output is mostly the same in both modes, with a handful of differences.
For row:
.
.
load_tensors: offloading 40 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 41/41 layers to GPU
load_tensors: CUDA0_Split model buffer size = 5929.22 MiB
load_tensors: CUDA1_Split model buffer size = 5889.53 MiB
load_tensors: CUDA0 model buffer size = 0.82 MiB
load_tensors: CUDA1 model buffer size = 0.76 MiB
load_tensors: CPU_Mapped model buffer size = 340.00 MiB
.
.
llama_context: KV self size = 640.00 MiB, K (f16): 320.00 MiB, V (f16): 320.00 MiB
llama_context: CUDA0 compute buffer size = 300.00 MiB
llama_context: CUDA1 compute buffer size = 300.00 MiB
llama_context: CUDA_Host compute buffer size = 18.01 MiB
llama_context: graph nodes = 1366
llama_context: graph splits = 3
.
.
For layer:
.
.
load_tensors: offloading 40 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 41/41 layers to GPU
load_tensors: CUDA0 model buffer size = 5930.04 MiB
load_tensors: CUDA1 model buffer size = 5890.29 MiB
load_tensors: CPU_Mapped model buffer size = 340.00 MiB
.
.
llama_context: KV self size = 640.00 MiB, K (f16): 320.00 MiB, V (f16): 320.00 MiB
llama_context: pipeline parallelism enabled (n_copies=4)
llama_context: CUDA0 compute buffer size = 364.01 MiB
llama_context: CUDA1 compute buffer size = 364.02 MiB
llama_context: CUDA_Host compute buffer size = 42.02 MiB
llama_context: graph nodes = 1366
llama_context: graph splits = 3