
Not able to see scaling performance with NUC (12th Gen) with deepseek_r1_distill_llama_8b_q40 #179

Open
deepaks2 opened this issue Mar 3, 2025 · 7 comments

deepaks2 commented Mar 3, 2025

I am trying to reproduce the results on NUCs, but the number of tokens/sec drops when I add more nodes. Any help?

System:
4x NUC (12th Gen) with AVX2 support.

1x NUC (12th Gen) with AVX2 support. -->
./dllama inference --model models/deepseek_r1_distill_llama_8b_q40/dllama_model_deepseek_r1_distill_llama_8b_q40.m --tokenizer models/deepseek_r1_distill_llama_8b_q40/dllama_tokenizer_deepseek_r1_distill_llama_8b_q40.t --buffer-float-type q80 --nthreads 8 --max-seq-len 4096 --prompt "What is 5+9?" --steps 77
Evaluation
nBatches: 32
nTokens: 7
tokens/s: 14.96 (66.86 ms/tok)
Prediction
nTokens: 70
tokens/s: 5.51 (181.43 ms/tok)

2x NUC (12th Gen) with AVX2 support. -->
./dllama inference --model models/deepseek_r1_distill_llama_8b_q40/dllama_model_deepseek_r1_distill_llama_8b_q40.m --tokenizer models/deepseek_r1_distill_llama_8b_q40/dllama_tokenizer_deepseek_r1_distill_llama_8b_q40.t --buffer-float-type q80 --nthreads 8 --max-seq-len 4096 --prompt "What is 5+9?" --steps 77 --workers 10.10.10.2:9998

Evaluation
nBatches: 32
nTokens: 7
tokens/s: 9.25 (108.14 ms/tok)
Prediction
nTokens: 70
tokens/s: 5.96 (167.91 ms/tok)

4x NUC (12th Gen) with AVX2 support. -->
./dllama inference --model models/deepseek_r1_distill_llama_8b_q40/dllama_model_deepseek_r1_distill_llama_8b_q40.m --tokenizer models/deepseek_r1_distill_llama_8b_q40/dllama_tokenizer_deepseek_r1_distill_llama_8b_q40.t --buffer-float-type q80 --nthreads 8 --max-seq-len 4096 --prompt "What is 5+9?" --steps 77 --workers 10.10.10.2:9998 10.10.10.4:9998 10.10.10.5:9998

Evaluation
nBatches: 32
nTokens: 7
tokens/s: 6.74 (148.29 ms/tok)
Prediction
nTokens: 70
tokens/s: 5.02 (199.27 ms/tok)
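To put the prediction numbers above in perspective, a quick sketch (my own calculation, using only the tokens/s figures reported above) of speedup and per-node efficiency relative to the single-node run:

```shell
# Speedup and per-node efficiency from the Prediction tokens/s above.
base=5.51  # 1-node tokens/s
for pair in "1 5.51" "2 5.96" "4 5.02"; do
  set -- $pair
  awk -v n="$1" -v tps="$2" -v base="$base" 'BEGIN {
    printf "%d node(s): %.2f tok/s, speedup %.2fx, efficiency %.0f%%\n",
           n, tps, tps/base, 100 * tps / (base * n)
  }'
done
```

This shows 2 nodes give only a 1.08x speedup (54% efficiency) and 4 nodes are actually slower than 1 (0.91x, 23% efficiency).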

Any help here? Is this expected?

deepaks2 changed the title from "Not able see Scaling performance with Intel NuC (12th Gen) with deepseek_r1_distill_llama_8b_q40" to "Not able see Scaling performance with NuC (12th Gen) with deepseek_r1_distill_llama_8b_q40" on Mar 3, 2025
D-i-t-gh commented Mar 3, 2025

How did you start the workers?

deepaks2 commented Mar 4, 2025

On each worker, I ran "./dllama worker --port 9998 --nthreads 8".
On the root node: ./dllama inference --model models/deepseek_r1_distill_llama_8b_q40/dllama_model_deepseek_r1_distill_llama_8b_q40.m --tokenizer models/deepseek_r1_distill_llama_8b_q40/dllama_tokenizer_deepseek_r1_distill_llama_8b_q40.t --buffer-float-type q80 --nthreads 8 --max-seq-len 4096 --prompt "What is 5+9?" --steps 77 --workers 10.10.10.2:9998 10.10.10.4:9998 10.10.10.5:9998

b4rtaz (owner) commented Mar 4, 2025

Hello @deepaks2,

Please upgrade Distributed Llama to 0.12.8 and post the logs from inference mode here. This version shows the time needed for inference and for synchronization.

deepaks2 commented Mar 5, 2025

@b4rtaz Thanks, I will share the details.

deepaks2 commented Mar 5, 2025

@b4rtaz Please find the logs

2x NUC (12th Gen) with AVX2 support. -->

[screenshot: inference logs for 2 nodes]

4x NUC (12th Gen) with AVX2 support. -->

[screenshot: inference logs for 4 nodes]

All four NUCs are connected via a switch.

b4rtaz (owner) commented Mar 5, 2025

It seems that synchronization over Ethernet is very slow. Maybe you should try connecting the two devices directly without a router and compare the results. If I see correctly, the NUC 12th Gen should have 2.5G Ethernet. Thunderbolt 4 can also be used for networking, but it is not easy to configure (I haven't tried it myself).
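One way to verify whether the link itself is the bottleneck (my suggestion, not from the thread; assumes iperf3 is installed on both machines, and uses the worker address 10.10.10.2 from the commands above):

```shell
# Hypothetical check of raw link throughput between root node and a worker.
# On the worker (10.10.10.2), start a server first:
#   iperf3 -s
# Then, on the root node, run a 10-second throughput test:
iperf3 -c 10.10.10.2 -t 10
# A healthy 2.5GbE link should report roughly 2.3-2.4 Gbits/sec;
# around 940 Mbits/sec suggests the link only negotiated 1GbE
# (e.g. a 1GbE switch port or cable in the path).
```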

deepaks2 commented Mar 6, 2025

Thanks @b4rtaz. I tried connecting two devices directly without a router, and the results are slightly better: an improvement of about 1 token/sec.

[screenshot: inference logs for direct connection]

The results are only slightly better: 5.98 tokens/sec (via router) vs 6.27 tokens/sec (direct), and only about a 10 ms difference in sync time.
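That works out to under a 5% gain (a quick check with awk, using the two figures above):

```shell
# Percentage improvement from 5.98 tok/s (via router) to 6.27 tok/s (direct).
awk 'BEGIN { printf "%.1f%% faster\n", 100 * (6.27 - 5.98) / 5.98 }'
# → 4.8% faster
```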
