MAX_BATCH_TOKENS parameter #367
Unanswered
maioranisimone asked this question in Q&A

Hello everyone, I would like to ask where I can find the maximum value to assign to the max_batch_tokens parameter. I read that text-embeddings-inference cannot determine this value automatically, so where is this value declared in the models hosted on Hugging Face?
Thanks in advance.
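For context, max_batch_tokens is a server-side scheduling limit (the --max-batch-tokens router flag, whose default is 16384, i.e. the 16K mentioned in the comment below) rather than a value stored in the model repository; what a repository does declare is the per-sequence limit, max_position_embeddings in config.json. A minimal sketch of reading that field with huggingface_hub (the repo id is just an example):

```python
import json

from huggingface_hub import hf_hub_download


def max_sequence_length(repo_id: str) -> int:
    """Return the per-sequence token limit declared in a model's config.json."""
    config_path = hf_hub_download(repo_id=repo_id, filename="config.json")
    with open(config_path) as f:
        return json.load(f)["max_position_embeddings"]


# all-MiniLM-L6-v2 is BERT-based, so this prints 512.
print(max_sequence_length("sentence-transformers/all-MiniLM-L6-v2"))
```

max_batch_tokens must be at least this large for a single full-length sequence to fit in a batch; beyond that it is a throughput/latency knob, which is what the comment below is probing.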
Replies: 1 comment

I would also like to know more about this parameter. I just tested all-MiniLM-L6-v2 on an L4 GPU, and setting this parameter to 2048 actually increased throughput significantly compared to the default setting of 16K. Why does throughput decrease both above and below this value for this model?
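A minimal sketch of the kind of measurement behind an observation like this, assuming a TEI server is already running locally on port 8080 (started with, e.g., --max-batch-tokens 2048) and using its /embed endpoint; the request count, batch size, concurrency, and text are arbitrary choices for illustration:

```python
import time
from concurrent.futures import ThreadPoolExecutor

import requests

TEI_URL = "http://localhost:8080/embed"  # assumes a local TEI server
N_REQUESTS = 64
BATCH = ["a representative sentence to embed"] * 32  # 32 texts per request


def embed(_: int) -> None:
    # POST a batch of texts to the TEI /embed endpoint.
    r = requests.post(TEI_URL, json={"inputs": BATCH}, timeout=60)
    r.raise_for_status()


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    list(pool.map(embed, range(N_REQUESTS)))
elapsed = time.perf_counter() - start

total_texts = N_REQUESTS * len(BATCH)
print(f"{total_texts / elapsed:.1f} texts/s")
```

Running the same script against servers started with different --max-batch-tokens values (e.g., 2048 vs. the 16384 default) makes the throughput sweet spot directly comparable.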
0 replies