
Consider re-working the vLLM Gauge exposing the currently active LoRAs #354

Closed
@markmc

Description


In vLLM, we are finishing an effort (known as "V1") to re-architect the core LLM engine - see vllm-project/vllm#10582

One of the last remaining feature gaps versus V0 is metrics - see vllm-project/vllm#10582

As we carry over the V0 metrics into V1, we are re-evaluating what makes sense and considering deprecating or re-working some metrics - see vllm-project/vllm#12745

The vllm:lora_requests_info gauge added by vllm-project/vllm#9477 looks particularly unorthodox - its value is a timestamp, and the status of the adapters is encoded as comma-separated strings in labels:

vllm:lora_requests_info{max_lora="1",running_lora_adapters="",waiting_lora_adapters=""} 1.7395575657589855e+09
vllm:lora_requests_info{max_lora="1",running_lora_adapters="test-lora",waiting_lora_adapters=""} 1.7395575723949368e+09
vllm:lora_requests_info{max_lora="1",running_lora_adapters="test-lora",waiting_lora_adapters="test-lora"} 1.7395575717647147e+09
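For illustration only, here's a minimal sketch (not vLLM's actual implementation) of how a gauge with this shape could be emitted via prometheus_client - each distinct combination of the comma-separated adapter strings becomes its own time series, and the value is just "now":

```python
import time
from prometheus_client import Gauge

# Hypothetical re-creation of the metric shape shown above.
lora_requests_info = Gauge(
    "vllm:lora_requests_info",
    "Running/waiting LoRA adapters encoded in labels; value is a timestamp.",
    labelnames=["max_lora", "running_lora_adapters", "waiting_lora_adapters"],
)

def report_lora_state(max_lora: int, running: list[str], waiting: list[str]) -> None:
    # Every new running/waiting combination creates a new label set (time series).
    lora_requests_info.labels(
        max_lora=str(max_lora),
        running_lora_adapters=",".join(running),
        waiting_lora_adapters=",".join(waiting),
    ).set(time.time())

report_lora_state(1, running=["test-lora"], waiting=["test-lora"])
```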

At first blush, it seems the same information could be exposed in a more standard representation:

vllm:num_lora_requests_running{lora_name="test-lora",model_name="meta-llama/Llama-3.1-8B-Instruct"} 8.0
vllm:num_lora_requests_waiting{lora_name="test-lora",model_name="meta-llama/Llama-3.1-8B-Instruct"} 7.0

This sort of thing is separately proposed in vllm-project/vllm#11091. There would also be a lora_config Info metric which exposes max_lora. A sketch of that shape follows below.
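Again purely as a sketch, assuming the metric and label names above (vllm:num_lora_requests_running, vllm:num_lora_requests_waiting, lora_name, model_name) plus a lora_config Info metric - not a concrete proposal for the implementation:

```python
from prometheus_client import Gauge, Info

# Hypothetical per-adapter gauges matching the sample output above.
num_lora_requests_running = Gauge(
    "vllm:num_lora_requests_running",
    "Number of running requests per LoRA adapter.",
    labelnames=["lora_name", "model_name"],
)
num_lora_requests_waiting = Gauge(
    "vllm:num_lora_requests_waiting",
    "Number of waiting requests per LoRA adapter.",
    labelnames=["lora_name", "model_name"],
)

# prometheus_client appends an "_info" suffix, so this is exposed as
# vllm:lora_config_info with max_lora as a label.
lora_config = Info("vllm:lora_config", "Static LoRA configuration.")
lora_config.info({"max_lora": "1"})

def report_lora_counts(model: str, running: dict[str, int], waiting: dict[str, int]) -> None:
    for name, count in running.items():
        num_lora_requests_running.labels(lora_name=name, model_name=model).set(count)
    for name, count in waiting.items():
        num_lora_requests_waiting.labels(lora_name=name, model_name=model).set(count)

report_lora_counts(
    "meta-llama/Llama-3.1-8B-Instruct",
    running={"test-lora": 8},
    waiting={"test-lora": 7},
)
```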

Feel free to discuss either here or in vllm-project/vllm#13303. Thanks!
