Dear author,
I'm trying to run LongLM on a single A10 with 24 GB of memory. I tried 'meta-llama/Llama-2-7b-chat-hf' and it failed with a CUDA out-of-memory error (attached).
I realized that your example.py runs on 4 RTX 3090s with 24 GB of memory each, so I wonder whether a single A10 is worth a shot, or not even close.
I also want to ask whether compressed models, for example Unsloth's quantized checkpoints, can be used with LongLM (see the sketch below).
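For context, this is a minimal sketch of how I would try to fit the 7B model on the A10 via 4-bit quantization with the standard transformers/bitsandbytes API. The final patching call is only a placeholder, since I don't know how LongLM's self-extend mechanism interacts with quantized weights; that is exactly what I'm asking about:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "meta-llama/Llama-2-7b-chat-hf"

# 4-bit NF4 quantization so the 7B model fits within the A10's 24 GB
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)

# Placeholder only: unsure whether LongLM's self-extend patch can be
# applied on top of 4-bit quantized weights -- this is my question.
# SelfExtend.apply(model, group_size=..., window_size=...)
```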