Replies: 6 comments 9 replies
-
Not any time soon. The code is written to run extremely fast on CPUs, specifically x86_64 and ARM64 systems, not GPUs. The M1/M2 GPUs probably do not have enough memory to load the models anyway, but check out FB's original LLaMA inference code for their reference engine, and DeepSpeed for techniques to distribute models across both CPU and GPU memory. This can get very involved.
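To make "distribute models across both CPU and GPU memory" concrete, here is a minimal sketch using Hugging Face Accelerate's `device_map` feature, a simpler alternative to DeepSpeed's offloading (the checkpoint name and memory budgets are assumptions for illustration):

```python
# Sketch: split LLaMA weights across GPU memory and CPU RAM.
# Uses transformers + accelerate; not the DeepSpeed API itself,
# but the same layer-placement/offload idea the comment refers to.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "huggyllama/llama-7b"  # hypothetical checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",                        # fill the GPU first, spill to CPU
    max_memory={0: "8GiB", "cpu": "24GiB"},   # assumed per-device budgets
    torch_dtype="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Layers that spill to CPU RAM are streamed to the GPU on demand, which is why this approach trades speed for capacity.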
-
I haven't had an Apple PC since 1987. See: Run LLaMA inference on Apple Silicon GPUs.
-
Thanks everyone for the answers. As I realized from Philip Turner's answer, there is no reason to run the model on the M1/M2 GPU; in fact, the CPU inference speed in my case is satisfactory. Moreover, although I had read a lot about garbage outputs, I managed to get pretty good results, at least comparable to GPT-3, on the 7B model. I also realized that there will be no way to fine-tune the model, as it requires too many resources. My goal is actually to try this model for paraphrasing tasks. I've been trying various prompts, but unlike summarization, the model seemingly can't handle this task out of the box. On the other hand, it has enough potential to do so, and, given that it can be run on a CPU, it could be used in an online tool. I suppose it would significantly outperform the T5 model.
-
It is a great apples-to-apples comparison. People are finding int4 on CPU to be similar in quality, with a significant reduction in memory requirements. I wonder if the tokens per Wh will improve when the memory I/O is reduced by roughly a factor of 4?
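For a sense of scale, a minimal sketch of the weight-size arithmetic behind that factor of 4 (scale/zero-point overhead of the quantization format is ignored here):

```python
# Back-of-the-envelope weight footprint of a 7B-parameter model.
# Each generated token streams the full weight set through memory,
# so bytes-per-parameter is a rough proxy for memory I/O per token.
params = 7e9
for name, bits in [("fp32", 32), ("fp16", 16), ("int8", 8), ("int4", 4)]:
    gib = params * bits / 8 / 2**30
    print(f"{name}: ~{gib:.1f} GiB")
# fp16 -> int4: ~13.0 GiB -> ~3.3 GiB, i.e. roughly 4x less data moved.
```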
-
Could Apple's Neural Engine Transformers implementation be of any help for GPU acceleration?
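For anyone curious what that would look like, here is a minimal sketch of pointing a traced PyTorch transformer layer at the Neural Engine with coremltools (a stock `nn.TransformerEncoderLayer` stands in for the ANE-optimized layers from Apple's ml-ane-transformers; shapes and names are assumptions):

```python
# Sketch: convert a traced transformer layer to Core ML and let the
# runtime schedule eligible ops on the Apple Neural Engine.
import torch
import coremltools as ct

layer = torch.nn.TransformerEncoderLayer(
    d_model=512, nhead=8, batch_first=True
).eval()
example = torch.rand(1, 128, 512)   # (batch, seq_len, d_model), assumed shape
traced = torch.jit.trace(layer, example)

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="hidden", shape=example.shape)],
    convert_to="mlprogram",
    compute_units=ct.ComputeUnit.CPU_AND_NE,  # prefer the ANE, fall back to CPU
)
mlmodel.save("encoder_ane.mlpackage")
```

Note this targets the ANE rather than the GPU; `ComputeUnit.ALL` would let Core ML use the GPU as well.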
-
Running on the ANE or GPU, regardless of speed, is possibly more energy efficient, which is a big enough reason on its own when considering mobile devices.
-
Thank you for making it possible to run the model on my MacBook Air M1. I've been testing various parameters and I'm happy even with the 7B model. However, do you plan to utilize the GPU of the M1/M2 chip? Thank you in advance.