
OSError Exception using vulkan backend on rx 6600 xt #1351

Open
SageSystems opened this issue Feb 5, 2025 · 1 comment

SageSystems commented Feb 5, 2025

Describe the Issue

When using the Vulkan backend with an RX 6600 XT on Windows 11, it crashes with this access violation. I haven't seen any issues like mine on this repo, so I'm hoping it's user error I can correct by tweaking something. I have VBS enabled, if that's an issue.

Additional Information:
Log:

~~~
C:\Users\Sage S\Downloads>koboldcpp.exe


Welcome to KoboldCpp - Version 1.82.4
For command line arguments, please refer to --help


Unable to detect VRAM, please set layers manually.
Auto Selected Vulkan Backend...

Initializing dynamic library: koboldcpp_vulkan.dll

Namespace(analyze='', benchmark=None, blasbatchsize=512, blasthreads=7, chatcompletionsadapter=None, config=None, contextsize=4096, debugmode=0, draftamount=8, draftgpulayers=999, draftgpusplit=None, draftmodel=None, failsafe=False, flashattention=False, forceversion=0, foreground=False, gpulayers=29, highpriority=False, hordeconfig=None, hordegenlen=0, hordekey='', hordemaxctx=0, hordemodelname='', hordeworkername='', host='', ignoremissing=False, launch=True, lora=None, mmproj=None, model='', model_param='C:/Users/Sage S/Downloads/DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf', moeexperts=-1, multiplayer=False, multiuser=1, noavx2=False, noblas=False, nocertify=False, nofastforward=False, nommap=False, nomodel=False, noshift=False, onready='', password=None, port=5001, port_param=5001, preloadstory=None, prompt='', promptlimit=100, quantkv=0, quiet=False, remotetunnel=False, ropeconfig=[0.0, 10000.0], sdclamped=0, sdclipg='', sdclipl='', sdconfig=None, sdlora='', sdloramult=1.0, sdmodel='', sdnotile=False, sdquant=False, sdt5xxl='', sdthreads=7, sdvae='', sdvaeauto=False, showgui=False, skiplauncher=False, smartcontext=False, ssl=None, tensor_split=None, threads=7, ttsgpu=False, ttsmodel='', ttsthreads=0, ttswavtokenizer='', unpack='', useclblast=None, usecpu=False, usecublas=None, usemlock=False, usemmap=False, usevulkan=[0], version=False, websearch=False, whispermodel='')

Loading Text Model: C:\Users\Sage S\Downloads\DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf

The reported GGUF Arch is: qwen2
Arch Category: 5


Identified as GGUF model: (ver 6)
Attempting to Load...

Using automatic RoPE scaling for GGUF. If the model has custom RoPE settings, they'll be used directly instead!
It means that the RoPE values written above will be replaced by the RoPE values indicated after loading.
System Info: AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | AMX_INT8 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
Warning, you are running Qwen2 without Flash Attention. If you observe incoherent output, try enabling it.
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 6600 XT (AMD proprietary driver) | uma: 0 | fp16: 1 | warp size: 64 | matrix cores: none
llama_model_load_from_file_impl: using device Vulkan0 (AMD Radeon RX 6600 XT) - 8176 MiB free
llama_model_loader: loaded meta data with 30 key-value pairs and 771 tensors from C:\Users\Sage S\Downloads\DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf (version GGUF V3 (latest))
print_info: file format = GGUF V3 (latest)
print_info: file type = unknown, may not work
print_info: file size = 18.48 GiB (4.85 BPW)
init_tokenizer: initializing tokenizer for type 2
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special tokens cache size = 22
load: token to piece cache size = 0.9310 MB
print_info: arch = qwen2
print_info: vocab_only = 0
print_info: n_ctx_train = 131072
print_info: n_embd = 5120
print_info: n_layer = 64
print_info: n_head = 40
print_info: n_head_kv = 8
print_info: n_rot = 128
print_info: n_swa = 0
print_info: n_embd_head_k = 128
print_info: n_embd_head_v = 128
print_info: n_gqa = 5
print_info: n_embd_k_gqa = 1024
print_info: n_embd_v_gqa = 1024
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-05
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: n_ff = 27648
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: causal attn = 1
print_info: pooling type = 0
print_info: rope type = 2
print_info: rope scaling = linear
print_info: freq_base_train = 1000000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 131072
print_info: rope_finetuned = unknown
print_info: ssm_d_conv = 0
print_info: ssm_d_inner = 0
print_info: ssm_d_state = 0
print_info: ssm_dt_rank = 0
print_info: ssm_dt_b_c_rms = 0
print_info: model type = 32B
print_info: model params = 32.76 B
print_info: general.name = DeepSeek R1 Distill Qwen 32B
print_info: vocab type = BPE
print_info: n_vocab = 152064
print_info: n_merges = 151387
print_info: BOS token = 151646 '<|begin▁of▁sentence|>'
print_info: EOS token = 151643 '<|end▁of▁sentence|>'
print_info: EOT token = 151643 '<|end▁of▁sentence|>'
print_info: PAD token = 151643 '<|end▁of▁sentence|>'
print_info: LF token = 148848 'ÄĬ'
print_info: FIM PRE token = 151659 '<|fim_prefix|>'
print_info: FIM SUF token = 151661 '<|fim_suffix|>'
print_info: FIM MID token = 151660 '<|fim_middle|>'
print_info: FIM PAD token = 151662 '<|fim_pad|>'
print_info: FIM REP token = 151663 '<|repo_name|>'
print_info: FIM SEP token = 151664 '<|file_sep|>'
print_info: EOG token = 151643 '<|end▁of▁sentence|>'
print_info: EOG token = 151662 '<|fim_pad|>'
print_info: EOG token = 151663 '<|repo_name|>'
print_info: EOG token = 151664 '<|file_sep|>'
print_info: max token length = 256
Traceback (most recent call last):
  File "koboldcpp.py", line 5669, in <module>
    main(parser.parse_args(),start_server=True)
  File "koboldcpp.py", line 5212, in main
    loadok = load_model(modelname)
  File "koboldcpp.py", line 1083, in load_model
    ret = handle.load_model(inputs)
OSError: exception: access violation reading 0x00000179EBE0FCE4
[10064] Failed to execute script 'koboldcpp' due to unhandled exception!
~~~
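For context on the traceback: `handle.load_model(inputs)` appears to be a call through ctypes into the native backend library loaded earlier in the log (koboldcpp_vulkan.dll). On Windows, ctypes translates an access violation raised inside native code into a Python OSError rather than crashing the interpreter, which is why a pointer bug in the DLL surfaces as the OSError above. A minimal, Windows-only sketch of that translation (the bad address here is deliberate and purely illustrative):

~~~
import ctypes

# On Windows, ctypes wraps foreign calls in a structured-exception
# handler, so a native access violation is raised to Python as an
# OSError instead of killing the process outright. The same mechanism
# produces the "OSError: exception: access violation reading ..." in
# the traceback above when the backend DLL dereferences a bad pointer.
try:
    ctypes.string_at(0x10)  # deliberately read from an invalid address
except OSError as exc:
    print(exc)  # e.g. "exception: access violation reading 0x0000000000000010"
~~~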
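Two hints in the log also suggest tweaks worth trying before assuming a driver or backend bug: "Unable to detect VRAM, please set layers manually" means the gpulayers=29 value is a fallback guess rather than a measured fit, and the Qwen2 warning recommends flash attention. A sketch of a manual launch, assuming the command-line flags map one-to-one to the Namespace fields printed above (the layer count of 16 is purely illustrative for an 8 GiB card loading an 18.48 GiB model):

~~~
C:\Users\Sage S\Downloads>koboldcpp.exe --usevulkan 0 --gpulayers 16 --flashattention "C:/Users/Sage S/Downloads/DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf"
~~~

If it still crashes, dropping to --gpulayers 0 would help isolate whether the access violation comes from the Vulkan offload path at all.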

@LostRuins (Owner)

Can you see if it still happens in the new version, v1.83?
