Use all available CPUs for batch processing #1345


Merged
merged 3 commits into from
Apr 17, 2024
Conversation

ddh0
Contributor

@ddh0 ddh0 commented Apr 15, 2024

This PR changes the default n_threads_batch parameter of Llama instances to use all available CPU cores. Currently, the default behaviour is to use half of all cores, which is optimal for text generation but suboptimal for batch processing. This change should provide a speed improvement, most notably on the CPU, OpenBLAS, and Metal backends.
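To make the before/after concrete, here is a minimal sketch of the old and new defaults described above. This is an illustration, not the library's exact code; it assumes `multiprocessing.cpu_count()` is used to detect the number of available cores, and the helper name `default_thread_counts` is hypothetical.

```python
import multiprocessing

def default_thread_counts():
    """Illustrative sketch (hypothetical helper) of the defaults
    discussed in this PR, assuming cores are detected with
    multiprocessing.cpu_count()."""
    n_cpus = multiprocessing.cpu_count()

    # Text generation: half of all cores remains the default.
    n_threads = max(n_cpus // 2, 1)

    # Batch processing: previously the same half-core default was used;
    # after this PR, all available cores are used.
    old_n_threads_batch = max(n_cpus // 2, 1)
    new_n_threads_batch = n_cpus

    return n_threads, old_n_threads_batch, new_n_threads_batch
```

On a machine with more than two cores, the new batch default is strictly larger than the old one, which is where the batch-processing speedup comes from.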

@abetlen
Owner

abetlen commented Apr 17, 2024

Hey @ddh0 thank you for this, I've gone ahead and added the change to the server settings as well.

@abetlen abetlen merged commit 0188482 into abetlen:main Apr 17, 2024
16 checks passed
xhedit pushed a commit to xhedit/llama-cpp-conv that referenced this pull request Apr 30, 2024