MAX_BATCH_TOKENS parameter #367
Unanswered
maioranisimone asked this question in Q&A

Hello everyone, I would like to ask where I can find the maximum value to assign to the max_batch_tokens parameter. I read that text-embeddings-inference cannot determine this value automatically, so where is this value declared in the models hosted on Hugging Face?
Thanks in advance.
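For context, max_batch_tokens is a server-side scheduling limit (the --max-batch-tokens router flag, whose default is 16384, i.e. the 16K mentioned in the comment below) rather than a value stored in the model repository; what a repository does declare is the per-sequence limit, max_position_embeddings in config.json. A minimal sketch of reading that field with huggingface_hub (the repo id is just an example):

```python
import json

from huggingface_hub import hf_hub_download


def max_sequence_length(repo_id: str) -> int:
    """Return the per-sequence token limit declared in a model's config.json."""
    config_path = hf_hub_download(repo_id=repo_id, filename="config.json")
    with open(config_path) as f:
        return json.load(f)["max_position_embeddings"]


# all-MiniLM-L6-v2 is BERT-based, so this prints 512.
print(max_sequence_length("sentence-transformers/all-MiniLM-L6-v2"))
```

max_batch_tokens must be at least this large for a single full-length sequence to fit in a batch; beyond that it is a throughput/latency knob, which is what the comment below is probing.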
Replies: 1 comment

I would also like to know more about this parameter. I just tested all-MiniLM-L6-v2 on an L4 GPU, and setting this parameter to 2048 actually increased throughput significantly compared to the default setting of 16K. Why does throughput decrease both above and below this value for this model?
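A minimal sketch of the kind of measurement behind an observation like this, assuming a TEI server is already running locally on port 8080 (started with, e.g., --max-batch-tokens 2048) and using its /embed endpoint; the request count, batch size, concurrency, and text are arbitrary choices for illustration:

```python
import time
from concurrent.futures import ThreadPoolExecutor

import requests

TEI_URL = "http://localhost:8080/embed"  # assumes a local TEI server
N_REQUESTS = 64
BATCH = ["a representative sentence to embed"] * 32  # 32 texts per request


def embed(_: int) -> None:
    # POST a batch of texts to the TEI /embed endpoint.
    r = requests.post(TEI_URL, json={"inputs": BATCH}, timeout=60)
    r.raise_for_status()


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    list(pool.map(embed, range(N_REQUESTS)))
elapsed = time.perf_counter() - start

total_texts = N_REQUESTS * len(BATCH)
print(f"{total_texts / elapsed:.1f} texts/s")
```

Running the same script against servers started with different --max-batch-tokens values (e.g., 2048 vs. the 16384 default) makes the throughput sweet spot directly comparable.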
0 replies