Description
In vLLM, we are finishing an effort (known as "V1") to re-architect the core LLM engine - see vllm-project/vllm#10582
One of the last remaining feature gaps versus V0 is metrics - see vllm-project/vllm#10582
As we carry over the V0 metrics into V1, we are re-evaluating what makes sense and considering deprecating or re-working some metrics - see vllm-project/vllm#12745
The vllm:lora_requests_info metric added by vllm-project/vllm#9477 looks particularly unorthodox - its value is a timestamp, and the status of adapters is encoded as comma-separated strings in labels:
```
vllm:lora_requests_info{max_lora="1",running_lora_adapters="",waiting_lora_adapters=""} 1.7395575657589855e+09
vllm:lora_requests_info{max_lora="1",running_lora_adapters="test-lora",waiting_lora_adapters=""} 1.7395575723949368e+09
vllm:lora_requests_info{max_lora="1",running_lora_adapters="test-lora",waiting_lora_adapters="test-lora"} 1.7395575717647147e+09
```
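For illustration, here is a minimal sketch (using prometheus_client directly, not vLLM's actual metrics code) of roughly how this representation is produced: adapter names are comma-joined into label values, and the gauge's value is set to the current wall-clock timestamp.

```python
import time

from prometheus_client import CollectorRegistry, Gauge

registry = CollectorRegistry()

# Hypothetical stand-in for the metric added in #9477: the adapter
# status lives in label values, not in the metric's value.
lora_requests_info = Gauge(
    "vllm:lora_requests_info",
    "Running stats on LoRA requests.",
    labelnames=["max_lora", "running_lora_adapters", "waiting_lora_adapters"],
    registry=registry,
)

running = ["test-lora"]
waiting = []

# The gauge's value carries no real information - it is just a timestamp.
lora_requests_info.labels(
    max_lora="1",
    running_lora_adapters=",".join(running),
    waiting_lora_adapters=",".join(waiting),
).set(time.time())
```

Note that every distinct combination of running/waiting adapter lists creates a brand-new time series, which is one reason this encoding sits awkwardly with Prometheus conventions.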
At first blush, it seems like this would achieve the same thing in a more standard representation:
```
vllm:num_lora_requests_running{lora_name="test-lora",model_name="meta-llama/Llama-3.1-8B-Instruct"} 8.0
vllm:num_lora_requests_waiting{lora_name="test-lora",model_name="meta-llama/Llama-3.1-8B-Instruct"} 7.0
```
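A sketch of that more standard representation (again assuming prometheus_client, with hypothetical metric names mirroring the example output): one gauge per state, with the adapter name as a label and the per-adapter request count as the value.

```python
from prometheus_client import CollectorRegistry, Gauge

registry = CollectorRegistry()

labelnames = ["lora_name", "model_name"]

# Hypothetical per-state gauges; the value is a real count, so standard
# PromQL aggregation (sum, topk, etc.) works across adapters.
num_lora_requests_running = Gauge(
    "vllm:num_lora_requests_running",
    "Number of requests currently running on a given LoRA adapter.",
    labelnames=labelnames,
    registry=registry,
)
num_lora_requests_waiting = Gauge(
    "vllm:num_lora_requests_waiting",
    "Number of requests waiting on a given LoRA adapter.",
    labelnames=labelnames,
    registry=registry,
)

# On each stats update the scheduler would set per-adapter counts,
# rather than encoding adapter names into a single label value.
labels = {"lora_name": "test-lora", "model_name": "meta-llama/Llama-3.1-8B-Instruct"}
num_lora_requests_running.labels(**labels).set(8)
num_lora_requests_waiting.labels(**labels).set(7)
```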
This sort of thing is separately proposed in vllm-project/vllm#11091. There would also be a lora_config Info metric exposing max_lora.
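Such an Info metric could look like the following minimal sketch (assuming prometheus_client; the metric name is hypothetical). An Info metric exposes static configuration as labels on a constant-valued sample, which is the conventional home for a value like max_lora.

```python
from prometheus_client import CollectorRegistry, Info

registry = CollectorRegistry()

# Hypothetical config Info metric; prometheus_client exports it with an
# "_info" suffix, i.e. as vllm:lora_config_info.
lora_config = Info(
    "vllm:lora_config",
    "LoRA configuration of the running vLLM instance.",
    registry=registry,
)

# Static configuration goes into labels; the sample value is always 1.
lora_config.info({"max_lora": "1"})
```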
Feel free to discuss either here or in vllm-project/vllm#13303. Thanks!