
llama3 is not working. #36

Open
@rayjang

Description

I followed your instructions, quoted below, to apply SelfExtend to Llama-3:
"""
[04/19/2024]:💡 We added the support for LLama-3 with transformers==4.40. To use it with transformers==4.40, you may change the file name of Llama_4_40.py to Llama.py to replace the existing patch file.
"""

I got this error:
"""

Exception Traceback (most recent call last)
Cell In[12], line 4
2 group_size = 5
3 window_size = 1024
----> 4 SelfExtend.apply(model, group_size, window_size, enable_flash_attention=True)#, flash_attention_impl='flash_attn')
5 model.eval()

File /home/ubuntu/reports/SelfExtend.py:109, in apply(loaded_model, group_size, window_size, enable_flash_attention, scale_base, flash_attention_impl)
107 print("Using triton flash self_extend!!")
108 if (not modifed):
--> 109 raise Exception(f"Failed to modify the attention method of {arch_name}")
110 else:
111 raise Exception(f"Need to set the flash_attention_impl to 'flash_attn' or 'triton'.")

Exception: Failed to modify the attention method of LlamaForCausalLM
"""

How can I fix this?
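
In case it helps with debugging, here is a quick check of what I have loaded (this assumes the standard transformers LLaMA module layout):
"""
import transformers

# Which transformers version am I on, and which classes did I actually load?
print(transformers.__version__)                   # should be 4.40.x per the note above
print(type(model).__name__)                       # LlamaForCausalLM, as named in the exception
print(type(model.model.layers[0].self_attn))      # the attention module SelfExtend needs to patch
"""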
