HIP kernel errors #328

Open · userbox020 opened this issue Feb 7, 2024 · 3 comments
@userbox020

I'm using ROCm 5.6 with an environment installed via the ooba one-click installer, and I'm getting the following error when loading models:

Traceback (most recent call last):
  File "/home/mruserbox/Desktop/_OOBA/text-generation-webui/modules/ui_model_menu.py", line 213, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mruserbox/Desktop/_OOBA/text-generation-webui/modules/models.py", line 87, in load_model
    output = load_func_map[loader](model_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mruserbox/Desktop/_OOBA/text-generation-webui/modules/models.py", line 389, in ExLlamav2_HF_loader
    return Exllamav2HF.from_pretrained(model_name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mruserbox/Desktop/_OOBA/text-generation-webui/modules/exllamav2_hf.py", line 170, in from_pretrained
    return Exllamav2HF(config)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/mruserbox/Desktop/_OOBA/text-generation-webui/modules/exllamav2_hf.py", line 44, in __init__
    self.ex_model.load(split)
  File "/home/mruserbox/Desktop/_OOBA/text-generation-webui/installer_files/env/lib/python3.11/site-packages/exllamav2/model.py", line 248, in load
    for item in f: return item
  File "/home/mruserbox/Desktop/_OOBA/text-generation-webui/installer_files/env/lib/python3.11/site-packages/exllamav2/model.py", line 266, in load_gen
    module.load()
  File "/home/mruserbox/Desktop/_OOBA/text-generation-webui/installer_files/env/lib/python3.11/site-packages/exllamav2/attn.py", line 189, in load
    self.q_proj.load()
  File "/home/mruserbox/Desktop/_OOBA/text-generation-webui/installer_files/env/lib/python3.11/site-packages/exllamav2/linear.py", line 45, in load
    if w is None: w = self.load_weight()
                      ^^^^^^^^^^^^^^^^^^
  File "/home/mruserbox/Desktop/_OOBA/text-generation-webui/installer_files/env/lib/python3.11/site-packages/exllamav2/module.py", line 97, in load_weight
    qtensors["q_perm"] = torch.argsort(qtensors["q_invperm"]).to(torch.int)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: HIP error: the operation cannot be performed in the present state
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing HIP_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.

However, I can run llama.cpp models on the same GPU and in the same environment without any errors.
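
One way to narrow this down is to run the failing operation in isolation, outside of exllamav2, to check whether PyTorch's own HIP kernels work at all. A minimal sketch, assuming a ROCm build of PyTorch and GPU index 0 (both assumptions, adjust as needed):

import torch

# A ROCm build of PyTorch reports a HIP version here; a CUDA-only build prints None.
print(torch.__version__, torch.version.hip)
# ROCm devices are exposed through the torch.cuda API.
print(torch.cuda.is_available(), torch.cuda.get_device_name(0))

# Same operation as the failing line in exllamav2/module.py: argsort on a GPU tensor.
x = torch.randint(0, 4096, (4096,), device="cuda")
perm = torch.argsort(x).to(torch.int)
print(perm[:8])

If this snippet fails with the same HIP error, the problem lies in the PyTorch/ROCm installation rather than in exllamav2. Running it with HIP_LAUNCH_BLOCKING=1 set, as the error message suggests, also makes the reported stack trace more reliable.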

@SinanAkkoyun (Contributor)

I get a similar error when using the prebuilt ROCm wheel exllamav2-0.0.13.post1+rocm5.6-cp311-cp311-linux_x86_64.whl:

ROCR_VISIBLE_DEVICES=1 python examples/chat.py -m ../../../models/exl2/tinyllama-1B-4.0bpw -mode llama                                             (exl2) 
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = "",
        LC_ALL = (unset),
        LC_TIME = "en_DE.UTF-8",
        LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = "",
        LC_ALL = (unset),
        LC_TIME = "en_DE.UTF-8",
        LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
 -- Model: ../../../models/exl2/tinyllama-1B-4.0bpw
 -- Options: []
 -- Loading model...
Traceback (most recent call last):
  File "/home/sinan/ml/llm/inference/exl2/exllamav2/examples/chat.py", line 87, in <module>
    model, tokenizer = model_init.init(args, allow_auto_split = True)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sinan/.conda/envs/exl2/lib/python3.11/site-packages/exllamav2/model_init.py", line 101, in init
    model.load(split)
  File "/home/sinan/.conda/envs/exl2/lib/python3.11/site-packages/exllamav2/model.py", line 248, in load
    for item in f: return item
  File "/home/sinan/.conda/envs/exl2/lib/python3.11/site-packages/exllamav2/model.py", line 266, in load_gen
    module.load()
  File "/home/sinan/.conda/envs/exl2/lib/python3.11/site-packages/exllamav2/attn.py", line 189, in load
    self.q_proj.load()
  File "/home/sinan/.conda/envs/exl2/lib/python3.11/site-packages/exllamav2/linear.py", line 45, in load
    if w is None: w = self.load_weight()
                      ^^^^^^^^^^^^^^^^^^
  File "/home/sinan/.conda/envs/exl2/lib/python3.11/site-packages/exllamav2/module.py", line 97, in load_weight
    qtensors["q_perm"] = torch.argsort(qtensors["q_invperm"]).to(torch.int)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: HIP error: invalid device function
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing HIP_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.
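
On ROCm, "invalid device function" usually means the compiled kernels do not cover the GPU's gfx architecture, which can happen when a prebuilt wheel was built for a different set of targets than the installed card. A small check of what architecture PyTorch reports (a sketch; the gcnArchName attribute is only exposed by ROCm builds of PyTorch and may be missing on older ones):

import torch

props = torch.cuda.get_device_properties(0)
print(props.name)
# gcnArchName (e.g. "gfx1030") is specific to ROCm builds; fall back gracefully if absent.
print(getattr(props, "gcnArchName", "gcnArchName not exposed by this build"))

If the reported gfx target is not one the wheel was compiled for, building exllamav2 from source on the local machine is the usual workaround; some consumer GPUs are also commonly run with the HSA_OVERRIDE_GFX_VERSION environment variable set to a supported target, though whether that applies depends on the card.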

@userbox020 (Author)

@SinanAkkoyun I think the new Mesa 24.1 drivers solve the issue; I haven't checked yet.

@turboderp (Member)

Any updates?

turboderp reopened this on Jun 17, 2024