Parallel sampling in processing the batch of tokens? #11882
Unanswered
whitezhang asked this question in Q&A
Replies: 1 comment 2 replies
-
The requests are already processed in parallel - there is nothing extra necessary to enable this.
-
When I use the following command to start the server:
I found that the latency of each query is quite high. I checked the code and found that it processes each slot serially. Is it possible to make this parallel? I can work on this if it is feasible. Or is there anything else I haven't considered?
Here's a simplified conceptual code snippet of the part I want to change to be parallel: