Dear author,
I'm trying to run LongLM on a single A10 with 24 GB of memory. I tried 'meta-llama/Llama-2-7b-chat-hf' and it failed with a CUDA out-of-memory error (attached).
I realized that your example.py runs on 4 RTX 3090s with 24 GB of memory each, so I wonder whether a single A10 is worth a shot, or not even close.
I also want to ask whether compressed models, for example Unsloth's quantized checkpoints, can be used with LongLM (see the sketch below).
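For context, this is a minimal sketch of how I would try to fit the 7B model on the A10 via 4-bit quantization with the standard transformers/bitsandbytes API. The final patching call is only a placeholder, since I don't know how LongLM's self-extend mechanism interacts with quantized weights; that is exactly what I'm asking about:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "meta-llama/Llama-2-7b-chat-hf"

# 4-bit NF4 quantization so the 7B model fits within the A10's 24 GB
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)

# Placeholder only: unsure whether LongLM's self-extend patch can be
# applied on top of 4-bit quantized weights -- this is my question.
# SelfExtend.apply(model, group_size=..., window_size=...)
```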