HIP kernel errors #328

Open · userbox020 opened this issue Feb 7, 2024 · 3 comments
@userbox020

I'm using ROCm 5.6 with an environment installed via the ooba one-click installer, and I'm getting the following error when loading models:

Traceback (most recent call last):
  File "/home/mruserbox/Desktop/_OOBA/text-generation-webui/modules/ui_model_menu.py", line 213, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mruserbox/Desktop/_OOBA/text-generation-webui/modules/models.py", line 87, in load_model
    output = load_func_map[loader](model_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mruserbox/Desktop/_OOBA/text-generation-webui/modules/models.py", line 389, in ExLlamav2_HF_loader
    return Exllamav2HF.from_pretrained(model_name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mruserbox/Desktop/_OOBA/text-generation-webui/modules/exllamav2_hf.py", line 170, in from_pretrained
    return Exllamav2HF(config)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/mruserbox/Desktop/_OOBA/text-generation-webui/modules/exllamav2_hf.py", line 44, in __init__
    self.ex_model.load(split)
  File "/home/mruserbox/Desktop/_OOBA/text-generation-webui/installer_files/env/lib/python3.11/site-packages/exllamav2/model.py", line 248, in load
    for item in f: return item
  File "/home/mruserbox/Desktop/_OOBA/text-generation-webui/installer_files/env/lib/python3.11/site-packages/exllamav2/model.py", line 266, in load_gen
    module.load()
  File "/home/mruserbox/Desktop/_OOBA/text-generation-webui/installer_files/env/lib/python3.11/site-packages/exllamav2/attn.py", line 189, in load
    self.q_proj.load()
  File "/home/mruserbox/Desktop/_OOBA/text-generation-webui/installer_files/env/lib/python3.11/site-packages/exllamav2/linear.py", line 45, in load
    if w is None: w = self.load_weight()
                      ^^^^^^^^^^^^^^^^^^
  File "/home/mruserbox/Desktop/_OOBA/text-generation-webui/installer_files/env/lib/python3.11/site-packages/exllamav2/module.py", line 97, in load_weight
    qtensors["q_perm"] = torch.argsort(qtensors["q_invperm"]).to(torch.int)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: HIP error: the operation cannot be performed in the present state
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing HIP_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.

However, I can run llama.cpp models on the same GPU and in the same environment without any errors.
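
One way to narrow this down is to run the failing operation in isolation, outside of exllamav2, to check whether PyTorch's own HIP kernels work at all. A minimal sketch, assuming a ROCm build of PyTorch and GPU index 0 (both assumptions, adjust as needed):

import torch

# A ROCm build of PyTorch reports a HIP version here; a CUDA-only build prints None.
print(torch.__version__, torch.version.hip)
# ROCm devices are exposed through the torch.cuda API.
print(torch.cuda.is_available(), torch.cuda.get_device_name(0))

# Same operation as the failing line in exllamav2/module.py: argsort on a GPU tensor.
x = torch.randint(0, 4096, (4096,), device="cuda")
perm = torch.argsort(x).to(torch.int)
print(perm[:8])

If this snippet fails with the same HIP error, the problem lies in the PyTorch/ROCm installation rather than in exllamav2. Running it with HIP_LAUNCH_BLOCKING=1 set, as the error message suggests, also makes the reported stack trace more reliable.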

@SinanAkkoyun (Contributor)

I get a similar error when using the prebuilt ROCm wheel exllamav2-0.0.13.post1+rocm5.6-cp311-cp311-linux_x86_64.whl:

ROCR_VISIBLE_DEVICES=1 python examples/chat.py -m ../../../models/exl2/tinyllama-1B-4.0bpw -mode llama                                             (exl2) 
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = "",
        LC_ALL = (unset),
        LC_TIME = "en_DE.UTF-8",
        LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = "",
        LC_ALL = (unset),
        LC_TIME = "en_DE.UTF-8",
        LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
 -- Model: ../../../models/exl2/tinyllama-1B-4.0bpw
 -- Options: []
 -- Loading model...
Traceback (most recent call last):
  File "/home/sinan/ml/llm/inference/exl2/exllamav2/examples/chat.py", line 87, in <module>
    model, tokenizer = model_init.init(args, allow_auto_split = True)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sinan/.conda/envs/exl2/lib/python3.11/site-packages/exllamav2/model_init.py", line 101, in init
    model.load(split)
  File "/home/sinan/.conda/envs/exl2/lib/python3.11/site-packages/exllamav2/model.py", line 248, in load
    for item in f: return item
  File "/home/sinan/.conda/envs/exl2/lib/python3.11/site-packages/exllamav2/model.py", line 266, in load_gen
    module.load()
  File "/home/sinan/.conda/envs/exl2/lib/python3.11/site-packages/exllamav2/attn.py", line 189, in load
    self.q_proj.load()
  File "/home/sinan/.conda/envs/exl2/lib/python3.11/site-packages/exllamav2/linear.py", line 45, in load
    if w is None: w = self.load_weight()
                      ^^^^^^^^^^^^^^^^^^
  File "/home/sinan/.conda/envs/exl2/lib/python3.11/site-packages/exllamav2/module.py", line 97, in load_weight
    qtensors["q_perm"] = torch.argsort(qtensors["q_invperm"]).to(torch.int)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: HIP error: invalid device function
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing HIP_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.
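
On ROCm, "invalid device function" usually means the compiled kernels do not cover the GPU's gfx architecture, which can happen when a prebuilt wheel was built for a different set of targets than the installed card. A small check of what architecture PyTorch reports (a sketch; the gcnArchName attribute is only exposed by ROCm builds of PyTorch and may be missing on older ones):

import torch

props = torch.cuda.get_device_properties(0)
print(props.name)
# gcnArchName (e.g. "gfx1030") is specific to ROCm builds; fall back gracefully if absent.
print(getattr(props, "gcnArchName", "gcnArchName not exposed by this build"))

If the reported gfx target is not one the wheel was compiled for, building exllamav2 from source on the local machine is the usual workaround; some consumer GPUs are also commonly run with the HSA_OVERRIDE_GFX_VERSION environment variable set to a supported target, though whether that applies depends on the card.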

@userbox020 (Author)

@SinanAkkoyun I think the new Mesa 24.1 drivers solve the issue; I haven't checked yet.

@turboderp (Member)

Any updates?

turboderp reopened this on Jun 17, 2024