
OSError Exception using vulkan backend on rx 6600 xt #1351

Open
SageSystems opened this issue Feb 5, 2025 · 1 comment

SageSystems commented Feb 5, 2025

Describe the Issue

When using the Vulkan backend with an RX 6600 XT on Windows 11, it crashes with this access violation. I haven't seen any issues like mine on this repo, so I'm hoping it's user error I can correct by tweaking something. I have VBS enabled, if that's an issue.

Additional Information:
Log:

~~~
C:\Users\Sage S\Downloads>koboldcpp.exe


Welcome to KoboldCpp - Version 1.82.4
For command line arguments, please refer to --help


Unable to detect VRAM, please set layers manually.
Auto Selected Vulkan Backend...

Initializing dynamic library: koboldcpp_vulkan.dll

Namespace(analyze='', benchmark=None, blasbatchsize=512, blasthreads=7, chatcompletionsadapter=None, config=None, contextsize=4096, debugmode=0, draftamount=8, draftgpulayers=999, draftgpusplit=None, draftmodel=None, failsafe=False, flashattention=False, forceversion=0, foreground=False, gpulayers=29, highpriority=False, hordeconfig=None, hordegenlen=0, hordekey='', hordemaxctx=0, hordemodelname='', hordeworkername='', host='', ignoremissing=False, launch=True, lora=None, mmproj=None, model='', model_param='C:/Users/Sage S/Downloads/DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf', moeexperts=-1, multiplayer=False, multiuser=1, noavx2=False, noblas=False, nocertify=False, nofastforward=False, nommap=False, nomodel=False, noshift=False, onready='', password=None, port=5001, port_param=5001, preloadstory=None, prompt='', promptlimit=100, quantkv=0, quiet=False, remotetunnel=False, ropeconfig=[0.0, 10000.0], sdclamped=0, sdclipg='', sdclipl='', sdconfig=None, sdlora='', sdloramult=1.0, sdmodel='', sdnotile=False, sdquant=False, sdt5xxl='', sdthreads=7, sdvae='', sdvaeauto=False, showgui=False, skiplauncher=False, smartcontext=False, ssl=None, tensor_split=None, threads=7, ttsgpu=False, ttsmodel='', ttsthreads=0, ttswavtokenizer='', unpack='', useclblast=None, usecpu=False, usecublas=None, usemlock=False, usemmap=False, usevulkan=[0], version=False, websearch=False, whispermodel='')

Loading Text Model: C:\Users\Sage S\Downloads\DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf

The reported GGUF Arch is: qwen2
Arch Category: 5


Identified as GGUF model: (ver 6)
Attempting to Load...

Using automatic RoPE scaling for GGUF. If the model has custom RoPE settings, they'll be used directly instead!
It means that the RoPE values written above will be replaced by the RoPE values indicated after loading.
System Info: AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | AMX_INT8 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
Warning, you are running Qwen2 without Flash Attention. If you observe incoherent output, try enabling it.
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 6600 XT (AMD proprietary driver) | uma: 0 | fp16: 1 | warp size: 64 | matrix cores: none
llama_model_load_from_file_impl: using device Vulkan0 (AMD Radeon RX 6600 XT) - 8176 MiB free
llama_model_loader: loaded meta data with 30 key-value pairs and 771 tensors from C:\Users\Sage S\Downloads\DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf (version GGUF V3 (latest))
print_info: file format = GGUF V3 (latest)
print_info: file type = unknown, may not work
print_info: file size = 18.48 GiB (4.85 BPW)
init_tokenizer: initializing tokenizer for type 2
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special tokens cache size = 22
load: token to piece cache size = 0.9310 MB
print_info: arch = qwen2
print_info: vocab_only = 0
print_info: n_ctx_train = 131072
print_info: n_embd = 5120
print_info: n_layer = 64
print_info: n_head = 40
print_info: n_head_kv = 8
print_info: n_rot = 128
print_info: n_swa = 0
print_info: n_embd_head_k = 128
print_info: n_embd_head_v = 128
print_info: n_gqa = 5
print_info: n_embd_k_gqa = 1024
print_info: n_embd_v_gqa = 1024
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-05
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: n_ff = 27648
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: causal attn = 1
print_info: pooling type = 0
print_info: rope type = 2
print_info: rope scaling = linear
print_info: freq_base_train = 1000000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 131072
print_info: rope_finetuned = unknown
print_info: ssm_d_conv = 0
print_info: ssm_d_inner = 0
print_info: ssm_d_state = 0
print_info: ssm_dt_rank = 0
print_info: ssm_dt_b_c_rms = 0
print_info: model type = 32B
print_info: model params = 32.76 B
print_info: general.name = DeepSeek R1 Distill Qwen 32B
print_info: vocab type = BPE
print_info: n_vocab = 152064
print_info: n_merges = 151387
print_info: BOS token = 151646 '<|begin▁of▁sentence|>'
print_info: EOS token = 151643 '<|end▁of▁sentence|>'
print_info: EOT token = 151643 '<|end▁of▁sentence|>'
print_info: PAD token = 151643 '<|end▁of▁sentence|>'
print_info: LF token = 148848 'ÄĬ'
print_info: FIM PRE token = 151659 '<|fim_prefix|>'
print_info: FIM SUF token = 151661 '<|fim_suffix|>'
print_info: FIM MID token = 151660 '<|fim_middle|>'
print_info: FIM PAD token = 151662 '<|fim_pad|>'
print_info: FIM REP token = 151663 '<|repo_name|>'
print_info: FIM SEP token = 151664 '<|file_sep|>'
print_info: EOG token = 151643 '<|end▁of▁sentence|>'
print_info: EOG token = 151662 '<|fim_pad|>'
print_info: EOG token = 151663 '<|repo_name|>'
print_info: EOG token = 151664 '<|file_sep|>'
print_info: max token length = 256
Traceback (most recent call last):
  File "koboldcpp.py", line 5669, in <module>
    main(parser.parse_args(),start_server=True)
  File "koboldcpp.py", line 5212, in main
    loadok = load_model(modelname)
  File "koboldcpp.py", line 1083, in load_model
    ret = handle.load_model(inputs)
OSError: exception: access violation reading 0x00000179EBE0FCE4
[10064] Failed to execute script 'koboldcpp' due to unhandled exception!
~~~
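For context on the traceback: `handle.load_model(inputs)` appears to be a call through ctypes into the native backend library loaded earlier in the log (koboldcpp_vulkan.dll). On Windows, ctypes translates an access violation raised inside native code into a Python OSError rather than crashing the interpreter, which is why a pointer bug in the DLL surfaces as the OSError above. A minimal, Windows-only sketch of that translation (the bad address here is deliberate and purely illustrative):

~~~
import ctypes

# On Windows, ctypes wraps foreign calls in a structured-exception
# handler, so a native access violation is raised to Python as an
# OSError instead of killing the process outright. The same mechanism
# produces the "OSError: exception: access violation reading ..." in
# the traceback above when the backend DLL dereferences a bad pointer.
try:
    ctypes.string_at(0x10)  # deliberately read from an invalid address
except OSError as exc:
    print(exc)  # e.g. "exception: access violation reading 0x0000000000000010"
~~~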
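Two hints in the log also suggest tweaks worth trying before assuming a driver or backend bug: "Unable to detect VRAM, please set layers manually" means the gpulayers=29 value is a fallback guess rather than a measured fit, and the Qwen2 warning recommends flash attention. A sketch of a manual launch, assuming the command-line flags map one-to-one to the Namespace fields printed above (the layer count of 16 is purely illustrative for an 8 GiB card loading an 18.48 GiB model):

~~~
C:\Users\Sage S\Downloads>koboldcpp.exe --usevulkan 0 --gpulayers 16 --flashattention "C:/Users/Sage S/Downloads/DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf"
~~~

If it still crashes, dropping to --gpulayers 0 would help isolate whether the access violation comes from the Vulkan offload path at all.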

@LostRuins (Owner)

Can you see if it still happens in the new version, v1.83?
