Describe the Issue
A clear and detailed description of what the issue is, and how to duplicate it (if applicable).
When using the Vulkan backend with an RX 6600 XT on Windows 11, KoboldCpp crashes with the access violation shown below. I haven't seen any issues like mine on this repo, so I'm hoping it's user error I can correct by tweaking something. I have VBS enabled, if that's an issue.
Additional Information:
Please provide as much relevant information about your setup as possible, such as the Operating System, CPU, GPU, KoboldCpp Version, and relevant logs (helpful to include the launch params from the terminal output, flags and crash logs)
Log:
~~~
C:\Users\Sage S\Downloads>koboldcpp.exe
Welcome to KoboldCpp - Version 1.82.4
For command line arguments, please refer to --help
Unable to detect VRAM, please set layers manually.
Auto Selected Vulkan Backend...
Initializing dynamic library: koboldcpp_vulkan.dll
Namespace(analyze='', benchmark=None, blasbatchsize=512, blasthreads=7, chatcompletionsadapter=None, config=None, contextsize=4096, debugmode=0, draftamount=8, draftgpulayers=999, draftgpusplit=None, draftmodel=None, failsafe=False, flashattention=False, forceversion=0, foreground=False, gpulayers=29, highpriority=False, hordeconfig=None, hordegenlen=0, hordekey='', hordemaxctx=0, hordemodelname='', hordeworkername='', host='', ignoremissing=False, launch=True, lora=None, mmproj=None, model='', model_param='C:/Users/Sage S/Downloads/DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf', moeexperts=-1, multiplayer=False, multiuser=1, noavx2=False, noblas=False, nocertify=False, nofastforward=False, nommap=False, nomodel=False, noshift=False, onready='', password=None, port=5001, port_param=5001, preloadstory=None, prompt='', promptlimit=100, quantkv=0, quiet=False, remotetunnel=False, ropeconfig=[0.0, 10000.0], sdclamped=0, sdclipg='', sdclipl='', sdconfig=None, sdlora='', sdloramult=1.0, sdmodel='', sdnotile=False, sdquant=False, sdt5xxl='', sdthreads=7, sdvae='', sdvaeauto=False, showgui=False, skiplauncher=False, smartcontext=False, ssl=None, tensor_split=None, threads=7, ttsgpu=False, ttsmodel='', ttsthreads=0, ttswavtokenizer='', unpack='', useclblast=None, usecpu=False, usecublas=None, usemlock=False, usemmap=False, usevulkan=[0], version=False, websearch=False, whispermodel='')
Loading Text Model: C:\Users\Sage S\Downloads\DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf
The reported GGUF Arch is: qwen2
Arch Category: 5
Identified as GGUF model: (ver 6)
Attempting to Load...
Using automatic RoPE scaling for GGUF. If the model has custom RoPE settings, they'll be used directly instead!
It means that the RoPE values written above will be replaced by the RoPE values indicated after loading.
System Info: AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | AMX_INT8 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
Warning, you are running Qwen2 without Flash Attention. If you observe incoherent output, try enabling it.
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 6600 XT (AMD proprietary driver) | uma: 0 | fp16: 1 | warp size: 64 | matrix cores: none
llama_model_load_from_file_impl: using device Vulkan0 (AMD Radeon RX 6600 XT) - 8176 MiB free
llama_model_loader: loaded meta data with 30 key-value pairs and 771 tensors from C:\Users\Sage S\Downloads\DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf (version GGUF V3 (latest))
print_info: file format = GGUF V3 (latest)
print_info: file type = unknown, may not work
print_info: file size = 18.48 GiB (4.85 BPW)
init_tokenizer: initializing tokenizer for type 2
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special tokens cache size = 22
load: token to piece cache size = 0.9310 MB
print_info: arch = qwen2
print_info: vocab_only = 0
print_info: n_ctx_train = 131072
print_info: n_embd = 5120
print_info: n_layer = 64
print_info: n_head = 40
print_info: n_head_kv = 8
print_info: n_rot = 128
print_info: n_swa = 0
print_info: n_embd_head_k = 128
print_info: n_embd_head_v = 128
print_info: n_gqa = 5
print_info: n_embd_k_gqa = 1024
print_info: n_embd_v_gqa = 1024
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-05
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: n_ff = 27648
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: causal attn = 1
print_info: pooling type = 0
print_info: rope type = 2
print_info: rope scaling = linear
print_info: freq_base_train = 1000000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 131072
print_info: rope_finetuned = unknown
print_info: ssm_d_conv = 0
print_info: ssm_d_inner = 0
print_info: ssm_d_state = 0
print_info: ssm_dt_rank = 0
print_info: ssm_dt_b_c_rms = 0
print_info: model type = 32B
print_info: model params = 32.76 B
print_info: general.name = DeepSeek R1 Distill Qwen 32B
print_info: vocab type = BPE
print_info: n_vocab = 152064
print_info: n_merges = 151387
print_info: BOS token = 151646 '<|begin▁of▁sentence|>'
print_info: EOS token = 151643 '<|end▁of▁sentence|>'
print_info: EOT token = 151643 '<|end▁of▁sentence|>'
print_info: PAD token = 151643 '<|end▁of▁sentence|>'
print_info: LF token = 148848 'ÄĬ'
print_info: FIM PRE token = 151659 '<|fim_prefix|>'
print_info: FIM SUF token = 151661 '<|fim_suffix|>'
print_info: FIM MID token = 151660 '<|fim_middle|>'
print_info: FIM PAD token = 151662 '<|fim_pad|>'
print_info: FIM REP token = 151663 '<|repo_name|>'
print_info: FIM SEP token = 151664 '<|file_sep|>'
print_info: EOG token = 151643 '<|end▁of▁sentence|>'
print_info: EOG token = 151662 '<|fim_pad|>'
print_info: EOG token = 151663 '<|repo_name|>'
print_info: EOG token = 151664 '<|file_sep|>'
print_info: max token length = 256
Traceback (most recent call last):
File "koboldcpp.py", line 5669, in <module>
main(parser.parse_args(),start_server=True)
File "koboldcpp.py", line 5212, in main
loadok = load_model(modelname)
File "koboldcpp.py", line 1083, in load_model
ret = handle.load_model(inputs)
OSError: exception: access violation reading 0x00000179EBE0FCE4
[10064] Failed to execute script 'koboldcpp' due to unhandled exception!
~~~
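Two details in the log point at a possible culprit worth ruling out first: VRAM auto-detection failed ("Unable to detect VRAM, please set layers manually"), and 29 of the model's 64 layers were offloaded. As a rough estimate, 29/64 of the 18.48 GiB of weights is about 8.4 GiB before the KV cache and compute buffers are added, which already exceeds the 8176 MiB Vulkan reports free, so the access violation may simply be VRAM exhaustion. A minimal retry sketch, assuming that diagnosis holds (the layer count of 20 is illustrative, not tuned; the flag names correspond to the Namespace dump above):

~~~
rem Hypothetical starting point, not a confirmed fix: offload fewer layers so
rem the weights fit in the ~8 GiB the driver reports free, and enable Flash
rem Attention, which the Qwen2 warning in the log itself recommends.
koboldcpp.exe --usevulkan 0 --gpulayers 20 --flashattention --contextsize 4096 --model "C:/Users/Sage S/Downloads/DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf"
~~~

If 20 layers loads cleanly, raising --gpulayers step by step until the crash returns would confirm (or rule out) a memory ceiling.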