Description:
I'm experiencing random assertion failures and segmentation faults when streaming responses from a fine-tuned Llama3.1 70B GGUF model. The error occurs in the GGML matrix multiplication validation.
Sometimes it raises this GGML assertion error, but most of the time it just prints Segmentation fault (core dumped) and my pipeline crashes.
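Since the process usually dies without any Python-level output, one way to see where the interpreter was at the moment of the crash is the standard-library faulthandler module. The snippet below is a minimal sketch, not something taken from the original report: the log file name is hypothetical, and it assumes the handler is installed before the model is loaded.

```python
import faulthandler

# Hypothetical log file name; keep the file object open for the whole
# lifetime of the process so the handler can still write to it on a crash.
crash_log = open("segfault_traceback.log", "w")

# On a fatal signal (SIGSEGV, SIGABRT, SIGBUS, SIGFPE, SIGILL) dump the
# Python traceback of every thread into the log file before the process dies.
faulthandler.enable(file=crash_log, all_threads=True)

# ... load the model and run the streaming loop after this point ...
```

This only reports the Python frames that were active when the native code crashed. It does not replace a native backtrace from the core dump.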
Environment:
llama_cpp_python version: 0.3.4
GPU: NVIDIA A40
Model: Custom fine-tuned Llama3.1 70B GGUF (originally fine-tuned with Unsloth at 4k context, running at 16k n_ctx)
@shamitv
It is possible, but so far I've tested it with 0.2.9 and 0.3.4.
I just want to know what could be the reasons behind this error.
I've been using llama-cpp-python in many projects for a long time, but this only occurs in one project, where I stream the output and call the model again and again in rapid succession (my use case is to get output from Llama 70B as quickly as possible).
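One possible cause is that a new request reaches the same model object while a previous stream is still decoding. This is an assumption and is not confirmed anywhere in this thread. A rough way to rule it out is to serialize the calls behind a lock, as in the sketch below. The class name SerializedStreamer and the start_stream callable are hypothetical helpers for illustration, not part of llama-cpp-python.

```python
import threading
from typing import Any, Callable, Iterable, Iterator


class SerializedStreamer:
    """Allow only one streaming call at a time on a shared model object."""

    def __init__(self, start_stream: Callable[..., Iterable[Any]]) -> None:
        # start_stream is any callable that returns an iterable of chunks.
        self._start_stream = start_stream
        self._lock = threading.Lock()

    def stream(self, **kwargs: Any) -> Iterator[Any]:
        # Hold the lock for the whole lifetime of the stream, so a second
        # request cannot touch the model while this one is still decoding.
        # The lock is released when the stream is exhausted or closed.
        with self._lock:
            for chunk in self._start_stream(**kwargs):
                yield chunk
```

With llama-cpp-python, start_stream could for example be `lambda **kw: llm.create_chat_completion(stream=True, **kw)`, where `llm` is an already loaded `Llama` instance. Each stream has to be fully consumed or explicitly closed before the next one is started from the same thread, otherwise the second call blocks on the lock.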
AleefBilal changed the title from "Random GGML assertion failure and segfault during streaming with fine-tuned 70B model" to "Segmentation fault (core dumped) appearing randomly" on May 9, 2025
Error Log:
Reproduction Steps:
Additional Context:
llama.cpp's convert script
Debugging Attempts:
n_ctx values (4096, 8192, 16384)
llama.cpp's main example
System Info:
Request:
Could you help investigate: