llama3_1_405b_instruct_q40

4x Mac Mini M4 Pro 64 GB RAM via Thunderbolt 5

./dllama inference --steps 64 --prompt "Hello world" --model models/llama3_1_405b_instruct_q40/dllama_model_llama3_1_405b_instruct_q40.m --tokenizer models/llama3_1_405b_instruct_q40/dllama_tokenizer_llama3_1_405b_instruct_q40.t --buffer-float-type q80 --nthreads 10 --max-seq-len 8192 --workers 192.168.0.136:9999 192.168.0.135:9999 192.168.0.134:9999
... (output truncated; per-token columns: G = generation time, I = inference time, T = transfer time, S/R = data sent/received in kB)
🔶 G 688 ms I 665 ms T 21 ms S 13044 kB R 13227 kB including
🔶 G 696 ms I 674 ms T 21 ms S 13044 kB R 13227 kB Django
🔶 G 687 ms I 663 ms T 23 ms S 13044 kB R 13227 kB ,
🔶 G 692 ms I 662 ms T 29 ms S 13044 kB R 13227 kB Flask
🔶 G 688 ms I 665 ms T 23 ms S 13044 kB R 13227 kB ,
🔶 G 689 ms I 666 ms T 22 ms S 13044 kB R 13227 kB and
🔶 G 691 ms I 663 ms T 27 ms S 13044 kB R 13227 kB pandas
🔶 G 689 ms I 662 ms T 26 ms S 13044 kB R 13227 kB .ĊĊ
🔶 G 700 ms I 673 ms T 26 ms S 13044 kB R 13227 kB I
🔶 G 691 ms I 665 ms T 24 ms S 13044 kB R 13227 kB 'm
🔶 G 691 ms I 665 ms T 26 ms S 13044 kB R 13227 kB excited
🔶 G 696 ms I 663 ms T 32 ms S 13044 kB R 13227 kB to
🔶 G 693 ms I 666 ms T 26 ms S 13044 kB R 13227 kB be
🔶 G 692 ms I 665 ms T 26 ms S 13044 kB R 13227 kB here
🔶 G 697 ms I 667 ms T 27 ms S 13044 kB R 13227 kB and
🔶 G 694 ms I 666 ms T 28 ms S 13044 kB R 13227 kB contribute
🔶 G 701 ms I 670 ms T 29 ms S 13044 kB R 13227 kB to
🔶 G 694 ms I 665 ms T 28 ms S 13044 kB R 13227 kB the
🔶 G 696 ms I 660 ms T 35 ms S 13044 kB R 13227 kB community
Generated tokens: 64
Avg tokens / second: 1.42
Avg generation time: 701.98 ms
Avg inference time: 665.00 ms
Avg transfer time: 35.88 ms
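As a quick sanity check on these averages (a sketch, not part of dllama): the reported throughput is just the reciprocal of the average per-token generation time, and inference plus transfer accounts for nearly all of it:

```python
# Cross-check the averages reported for the 405B run above.
avg_generation_ms = 701.98
avg_inference_ms = 665.00
avg_transfer_ms = 35.88

# Throughput is 1 token per avg generation time.
tokens_per_second = 1000.0 / avg_generation_ms
print(f"{tokens_per_second:.2f} tok/s")  # → 1.42 tok/s, matching the report

# Whatever generation time is not inference or transfer (~1 ms here)
# is other per-token overhead.
overhead_ms = avg_generation_ms - (avg_inference_ms + avg_transfer_ms)
print(f"overhead: {overhead_ms:.2f} ms")
```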
llama3_3_70b_instruct_q40

| Setup | Network | Performance |
|---|---|---|
| 2 x Mac Mini M4 Pro 64 GB RAM | Thunderbolt 5 | 4.72 tok/s |
| 4 x Mac Mini M4 Pro 64 GB RAM | Thunderbolt 5 | 7.51 tok/s |
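One way to read these two numbers: doubling from 2 to 4 nodes gives about a 1.59x speedup rather than the ideal 2x, i.e. roughly 80% scaling efficiency. A sketch of that arithmetic:

```python
# Scaling efficiency for llama3_3_70b_instruct_q40 when doubling the node count.
tok_s_2_nodes = 4.72
tok_s_4_nodes = 7.51

speedup = tok_s_4_nodes / tok_s_2_nodes  # observed speedup from 2 -> 4 nodes
efficiency = speedup / 2.0               # relative to the ideal 2x
print(f"speedup: {speedup:.2f}x, efficiency: {efficiency:.0%}")
# → speedup: 1.59x, efficiency: 80%
```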
2x Mac Mini M4 Pro 64 GB RAM via Thunderbolt 5
./dllama inference --steps 64 --prompt "Hello world" --model models/llama3_3_70b_instruct_q40/dllama_model_llama3_3_70b_instruct_q40.m --tokenizer models/llama3_3_70b_instruct_q40/dllama_tokenizer_llama3_3_70b_instruct_q40.t --buffer-float-type q80 --nthreads 10 --max-seq-len 8192 --workers 192.168.0.136:9999
...
🔶 G 217 ms I 205 ms T 12 ms S 1392 kB R 1610 kB takes
🔶 G 218 ms I 211 ms T 6 ms S 1392 kB R 1610 kB no
🔶 G 218 ms I 207 ms T 11 ms S 1392 kB R 1610 kB arguments
🔶 G 218 ms I 204 ms T 14 ms S 1392 kB R 1610 kB and
🔶 G 218 ms I 208 ms T 10 ms S 1392 kB R 1610 kB prints
🔶 G 219 ms I 209 ms T 10 ms S 1392 kB R 1610 kB the
🔶 G 219 ms I 205 ms T 13 ms S 1392 kB R 1610 kB string
🔶 G 219 ms I 201 ms T 17 ms S 1392 kB R 1610 kB "
Generated tokens: 64
Avg tokens / second: 4.72
Avg generation time: 211.77 ms
Avg inference time: 195.05 ms
Avg transfer time: 16.41 ms
4x Mac Mini M4 Pro 64 GB RAM via Thunderbolt 5
./dllama inference --steps 64 --prompt "Hello world" --model models/llama3_3_70b_instruct_q40/dllama_model_llama3_3_70b_instruct_q40.m --tokenizer models/llama3_3_70b_instruct_q40/dllama_tokenizer_llama3_3_70b_instruct_q40.t --buffer-float-type q80 --nthreads 10 --max-seq-len 8192 --workers 192.168.0.136:9999 192.168.0.135:9999 192.168.0.134:9999
...
🔶 G 133 ms I 95 ms T 37 ms S 4176 kB R 4455 kB Non
🔶 G 133 ms I 94 ms T 38 ms S 4176 kB R 4455 kB -blocking
🔶 G 133 ms I 97 ms T 36 ms S 4176 kB R 4455 kB service
🔶 G 133 ms I 92 ms T 40 ms S 4176 kB R 4455 kB method
🔶 G 133 ms I 94 ms T 39 ms S 4176 kB R 4455 kB calls
🔶 G 132 ms I 98 ms T 33 ms S 4176 kB R 4455 kB are
🔶 G 134 ms I 96 ms T 38 ms S 4176 kB R 4455 kB also
🔶 G 133 ms I 97 ms T 36 ms S 4176 kB R 4455 kB supported
🔶 G 131 ms I 93 ms T 38 ms S 4176 kB R 4455 kB ,
Generated tokens: 64
Avg tokens / second: 7.51
Avg generation time: 133.17 ms
Avg inference time: 95.33 ms
Avg transfer time: 37.47 ms
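Assuming the S/R columns are kilobytes sent/received per generated token at the root node and T is the per-token transfer time (my reading of the log, not documented above), the 4-node run moves on the order of 230 MB/s during the transfer window:

```python
# Rough synchronization-bandwidth estimate for the 4-node 70B run.
# Assumptions: S/R are kB moved per token, T is per-token transfer time,
# and 1 kB = 1000 bytes — a ballpark figure only.
sent_kb = 4176
received_kb = 4455
transfer_ms = 37.47

mb_per_s = (sent_kb + received_kb) / transfer_ms  # kB/ms is numerically MB/s
print(f"~{mb_per_s:.0f} MB/s")  # → ~230 MB/s
```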
This performance test was made possible thanks to MacWeb.com ❤️, which offers on-demand access to Macs in the cloud.
Distributed Llama version: 0.11.1 (CPU only)

Results summary:

| Model | Setup | Performance |
|---|---|---|
| llama3_1_405b_instruct_q40 | 4x Mac Mini M4 Pro 64 GB RAM via Thunderbolt 5 | 1.42 tok/s |
| llama3_3_70b_instruct_q40 | 2x Mac Mini M4 Pro 64 GB RAM via Thunderbolt 5 | 4.72 tok/s |
| llama3_3_70b_instruct_q40 | 4x Mac Mini M4 Pro 64 GB RAM via Thunderbolt 5 | 7.51 tok/s |