The current `vllm:lora_requests_info` Gauge is somewhat similar to an
Info metric (like `cache_config_info`), except that its value is the
current wall-clock time and it is updated every iteration.
The label names used are:
- running_lora_adapters: a list of adapters with running requests,
formatted as a comma-separated string.
- waiting_lora_adapters: similar, except listing adapters with
requests waiting to be scheduled.
- max_lora: the static "max number of LoRAs in a single batch"
configuration.
It looks like this:
```
vllm:lora_requests_info{max_lora="1",running_lora_adapters="",waiting_lora_adapters=""} 1.7395575657589855e+09
vllm:lora_requests_info{max_lora="1",running_lora_adapters="test-lora",waiting_lora_adapters=""} 1.7395575723949368e+09
vllm:lora_requests_info{max_lora="1",running_lora_adapters="test-lora",waiting_lora_adapters="test-lora"} 1.7395575717647147e+09
```
I can't really make much sense of this. Encoding a running/waiting
status for multiple adapters in a comma-separated string seems quite
misguided - we should use labels to distinguish between per-adapter
counts instead:
```
vllm:num_lora_requests_running{lora_name="test-lora",model_name="meta-llama/Llama-3.1-8B-Instruct"} 8.0
vllm:num_lora_requests_waiting{lora_name="test-lora",model_name="meta-llama/Llama-3.1-8B-Instruct"} 7.0
```
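As a rough illustration of the per-adapter form, here is a minimal
stdlib-only sketch that turns lists of adapter names (one entry per
request) into samples like the ones above. The helper name
`render_lora_gauges` and its inputs are hypothetical, not part of vLLM,
and label-value escaping is deliberately left out.

```python
from collections import Counter


def render_lora_gauges(running, waiting, model_name):
    """Render per-adapter request counts as Prometheus text-format samples.

    `running` and `waiting` are lists of adapter names, one entry per
    request (a hypothetical input shape chosen for this sketch).
    Label values are not escaped here.
    """
    lines = []
    for metric, adapters in (
        ("vllm:num_lora_requests_running", running),
        ("vllm:num_lora_requests_waiting", waiting),
    ):
        # One sample per adapter, distinguished by the lora_name label.
        counts = Counter(adapters)
        for lora_name in sorted(counts):
            lines.append(
                f'{metric}{{lora_name="{lora_name}",model_name="{model_name}"}} '
                f"{float(counts[lora_name])}"
            )
    return lines


sample = render_lora_gauges(
    ["test-lora"] * 8,
    ["test-lora"] * 7,
    "meta-llama/Llama-3.1-8B-Instruct",
)
print("\n".join(sample))
```

With one sample per adapter, a query can aggregate or filter on
`lora_name` directly rather than parsing a comma-separated string.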
The current metric was added in #9477 and there is at least one known
user. If we revisit this design and deprecate the old metric, we should
reduce the need for a significant deprecation period by making the
change in v0 as well and asking that project to move to the new metric.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>