Use all available CPUs for batch processing #1345

ddh0 · 2024-04-15T01:01:33Z

This PR changes the default value of the n_threads_batch parameter of Llama instances so that it uses all available CPU cores. Currently, the default is to use half of all cores, which is optimal for text generation but suboptimal for batch processing. This change should provide some speed improvements, most notably on CPU / OpenBLAS / Metal.
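For illustration, the defaulting rule described above could be sketched roughly as follows. This is a minimal standalone sketch, not the code from the PR diff: the helper name `resolve_thread_defaults` is hypothetical, and the exact expressions (including the floor of one thread for generation) are assumptions based on the description.

```python
import multiprocessing
from typing import Optional, Tuple


def resolve_thread_defaults(
    n_threads: Optional[int] = None,
    n_threads_batch: Optional[int] = None,
) -> Tuple[int, int]:
    """Hypothetical helper: return (generation threads, batch threads).

    Assumed rule, following the PR description:
    - generation defaults to half of the cores (at least 1)
    - batch processing defaults to all cores
    Explicit values passed by the caller always take precedence.
    """
    cpu_count = multiprocessing.cpu_count()
    generation_threads = n_threads or max(cpu_count // 2, 1)
    batch_threads = n_threads_batch or cpu_count
    return generation_threads, batch_threads


if __name__ == "__main__":
    gen, batch = resolve_thread_defaults()
    print(f"generation threads: {gen}, batch threads: {batch}")
```

Anyone who prefers the previous behaviour should still be able to pass n_threads_batch explicitly when constructing a Llama instance, since only the default changes.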

abetlen · 2024-04-17T14:04:16Z

Hey @ddh0 thank you for this, I've gone ahead and added the change to the server settings as well.

ddh0 and others added 3 commits April 14, 2024 19:54

Use all cores for n_threads_batch

4a900cb

Actually use all CPUs

eba058a

Update setting for server as well

1ab8941

abetlen merged commit 0188482 into abetlen:main Apr 17, 2024
16 checks passed

abetlen pushed a commit that referenced this pull request Apr 17, 2024

feat: Use all available CPUs for batch processing (#1345)

c96b2da

xhedit pushed a commit to xhedit/llama-cpp-conv that referenced this pull request Apr 30, 2024

feat: Use all available CPUs for batch processing (abetlen#1345)

a36056c
