Not able to see scaling performance with NUC (12th Gen) with deepseek_r1_distill_llama_8b_q40 #179
Comments
How did you start the workers?
On each worker, I ran `./dllama worker --port 9998 --nthreads 8`.
Hello @deepaks2, please upgrade DL to 0.12.8 and share the logs here.
@b4rtaz Thanks, I will share the details.
@b4rtaz Please find the logs below for 2xNUC (12th Gen) with AVX2 support and 4xNUC (12th Gen) with AVX2 support. All 4 NUCs are connected via a switch.
It seems that synchronization over Ethernet is very slow. Maybe you should try connecting the two devices directly without a router and compare the results. If I see correctly, the NUC 12th Gen should have 2.5G Ethernet. Thunderbolt 4 can also be used for networking, but it is not easy to configure (I haven't tried it myself).
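As a rough back-of-the-envelope sketch of why link speed matters here (the payload size below is a made-up illustration, not a measured dllama value):

```python
# Rough estimate of the time to move one sync payload over Ethernet.
# NOTE: the 1 MB payload is a hypothetical example, NOT measured from dllama.
def transfer_ms(payload_bytes: float, link_gbps: float) -> float:
    """Time in milliseconds to move payload_bytes over a link_gbps link."""
    bytes_per_sec = link_gbps * 1e9 / 8
    return payload_bytes / bytes_per_sec * 1000

payload = 1 * 1024 * 1024  # hypothetical 1 MB synced per token
print(f"1GbE:   {transfer_ms(payload, 1.0):.1f} ms")  # ~8.4 ms
print(f"2.5GbE: {transfer_ms(payload, 2.5):.1f} ms")  # ~3.4 ms
```

This ignores latency per round trip, which is exactly the part a router adds on top of raw bandwidth.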
Thanks @b4rtaz. I tried connecting two devices directly without a router and the results are only slightly better: from 5.98 tokens/sec (with router) to 6.27 tokens/sec (direct). I see only a 10 ms difference in sync time.
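The ~10 ms sync difference is consistent with that throughput delta; a quick check of the per-token times implied by the two measurements above:

```python
# Per-token latency implied by the two reported throughputs.
with_router = 5.98  # tokens/s, via the switch/router
direct = 6.27       # tokens/s, direct cable

ms_router = 1000 / with_router
ms_direct = 1000 / direct
print(f"router: {ms_router:.1f} ms/token")              # ~167.2
print(f"direct: {ms_direct:.1f} ms/token")              # ~159.5
print(f"delta:  {ms_router - ms_direct:.1f} ms/token")  # ~7.7
```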
I am trying to reproduce the results on the NUCs, but I see the number of tokens/sec drop when I add more nodes. Any help?
System: 4xNUC (12th Gen) with AVX2 support.
1xNUC (12th Gen):
```
./dllama inference --model models/deepseek_r1_distill_llama_8b_q40/dllama_model_deepseek_r1_distill_llama_8b_q40.m --tokenizer models/deepseek_r1_distill_llama_8b_q40/dllama_tokenizer_deepseek_r1_distill_llama_8b_q40.t --buffer-float-type q80 --nthreads 8 --max-seq-len 4096 --prompt "What is 5+9?" --steps 77
```
Evaluation
nBatches: 32
nTokens: 7
tokens/s: 14.96 (66.86 ms/tok)
Prediction
nTokens: 70
tokens/s: 5.51 (181.43 ms/tok)
2xNUC (12th Gen):
```
./dllama inference --model models/deepseek_r1_distill_llama_8b_q40/dllama_model_deepseek_r1_distill_llama_8b_q40.m --tokenizer models/deepseek_r1_distill_llama_8b_q40/dllama_tokenizer_deepseek_r1_distill_llama_8b_q40.t --buffer-float-type q80 --nthreads 8 --max-seq-len 4096 --prompt "What is 5+9?" --steps 77 --workers 10.10.10.2:9998
```
Evaluation
nBatches: 32
nTokens: 7
tokens/s: 9.25 (108.14 ms/tok)
Prediction
nTokens: 70
tokens/s: 5.96 (167.91 ms/tok)
4xNUC (12th Gen):
```
./dllama inference --model models/deepseek_r1_distill_llama_8b_q40/dllama_model_deepseek_r1_distill_llama_8b_q40.m --tokenizer models/deepseek_r1_distill_llama_8b_q40/dllama_tokenizer_deepseek_r1_distill_llama_8b_q40.t --buffer-float-type q80 --nthreads 8 --max-seq-len 4096 --prompt "What is 5+9?" --steps 77 --workers 10.10.10.2:9998 10.10.10.4:9998 10.10.10.5:9998
```
Evaluation
nBatches: 32
nTokens: 7
tokens/s: 6.74 (148.29 ms/tok)
Prediction
nTokens: 70
tokens/s: 5.02 (199.27 ms/tok)
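For reference, a quick script over the prediction numbers above shows that throughput actually regresses at 4 nodes rather than merely plateauing, which points at per-token synchronization cost growing with the node count:

```python
# Prediction throughput (tokens/s) taken from the runs above.
results = {1: 5.51, 2: 5.96, 4: 5.02}
base = results[1]
for nodes, tps in results.items():
    speedup = tps / base
    efficiency = speedup / nodes
    print(f"{nodes} node(s): {tps:.2f} tok/s, "
          f"speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

The 2-node speedup is only about 1.08x, and 4 nodes are slower than a single node, so the extra compute is being fully eaten by network sync.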
Any help here? Is this expected?