What is the meaning of total_time exposed as a performance metrics #476
Unanswered
rahul-sentieo
asked this question in
Q&A
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
We are using the rerank functionality for reranking query, documents pair. I was checking the logs and seems like the queue time + tokenisation time + inference time is not adding up to the total time exposed by the metrics. Infact the total high is very high. Here is an example:

inference time ~ 286ms
tokenization time ~ 139ms
queue time ~ 338ms
Total time should be 286ms + 139ms + 338ms =763ms
but according to the log, total time ~ 3.03 sec
Beta Was this translation helpful? Give feedback.
All reactions