
[V1][Metrics] Implement vllm:lora_requests_info metric #13504


Merged: 7 commits into vllm-project:main on Feb 25, 2025

Conversation

@markmc (Member) commented Feb 18, 2025

```
vllm:lora_requests_info{max_lora="1",running_lora_adapters="",waiting_lora_adapters=""} 1.7395575657589855e+09
vllm:lora_requests_info{max_lora="1",running_lora_adapters="test-lora",waiting_lora_adapters=""} 1.7395575723949368e+09
vllm:lora_requests_info{max_lora="1",running_lora_adapters="test-lora",waiting_lora_adapters="test-lora"} 1.7395575717647147e+09
```

As discussed in #13303, this metric perhaps isn't the ideal solution for the use case, but given there is an existing user, we should retain compatibility in V1 and deprecate it if we replace it with a different metric.

See also kubernetes-sigs/gateway-api-inference-extension#354
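
For context, here is a minimal sketch of how an info-style gauge like this can be emitted with prometheus_client. This is illustrative only, not vLLM's actual logger code: the adapter state is carried in the label values, and the sample value is a Unix timestamp so that scrapers can pick the most recent label set.

```python
import time

from prometheus_client import Gauge

# Illustrative sketch; vLLM registers its real metric in its own
# Prometheus logger. Label names match the samples shown above.
lora_requests_info = Gauge(
    "vllm:lora_requests_info",
    "Running and waiting LoRA adapters, encoded in label values",
    labelnames=["max_lora", "running_lora_adapters",
                "waiting_lora_adapters"],
)

def record_lora_state(max_lora: int, running: set, waiting: set) -> None:
    # The gauge value is a timestamp; consumers treat the label set
    # with the largest value as the current state.
    lora_requests_info.labels(
        max_lora=str(max_lora),
        running_lora_adapters=",".join(sorted(running)),
        waiting_lora_adapters=",".join(sorted(waiting)),
    ).set(time.time())
```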


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, which runs a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@@ -241,11 +255,22 @@ def _update_stats_from_finished(self, req_state: RequestState,
if iteration_stats is None:
return

lora_stats: Optional[LoRAStats] = None
if req_state.lora_name:
Collaborator

This can just be `lora_stats = self.lora_stats.get(req_state.lora_name, None)`.

@@ -151,3 +168,11 @@ def update_from_finished_request(self, finish_reason: "FinishReason",
inference_time=inference_time,
decode_time=decode_time)
self.finished_requests.append(finished_req)

if lora_stats is not None:
Collaborator

One state transition which may not be accounted for here is a request being aborted.

Aborts occur when the client disconnects from the server. The AsyncLLM sends an abort message to the OutputProcessor and to the EngineCore. This may cause the request to be finished, but I am not 100% sure that it does. This could cause the request to never be removed from `running_requests`. We should just double-check this.

Additionally, if a client aborts while the request is in `lora_stats.waiting_requests`, it will never be removed from the waiting set, since the request will never be scheduled.
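
To illustrate the gap being described, a hypothetical sketch of the per-adapter state (the names mirror the sets mentioned above, not the PR's exact code):

```python
from dataclasses import dataclass, field

@dataclass
class LoRAStats:
    # Request IDs per adapter; a request should leave one of these
    # sets on every terminal transition, including aborts.
    waiting_requests: set = field(default_factory=set)
    running_requests: set = field(default_factory=set)

# If the only transitions are add -> schedule -> finish, an abort
# leaks state:
#   add:       waiting_requests.add(req_id)
#   schedule:  waiting_requests.discard(req_id); running_requests.add(req_id)
#   finish:    running_requests.discard(req_id)
#   abort:     (unhandled) req_id stays in waiting or running forever
```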

Member Author

Great point, take a look now. Basically, I've tried to couple this closely with the RequestState lifecycle.

@robertgshaw2-redhat (Collaborator) left a comment

Generally looks good. Left a small nit that will help clean up the API a bit: EngineCoreRequest already contains lora_request, so we do not need to pass it around as a separate arg.

The one functionality issue I see is that we do not handle request aborting properly. We can connect offline about this, but I left a detailed comment below.
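
A sketch of the API cleanup being suggested here (the signatures are hypothetical, for illustration only):

```python
from typing import Optional

# Before: the adapter is threaded through as a separate argument.
def add_request_before(request: "EngineCoreRequest",
                       lora_request: Optional["LoRARequest"]) -> None:
    ...

# After: the adapter is read off the request itself, since
# EngineCoreRequest already carries lora_request.
def add_request_after(request: "EngineCoreRequest") -> None:
    lora_name = (request.lora_request.lora_name
                 if request.lora_request is not None else None)
    ...
```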

```
vllm:lora_requests_info{max_lora="1",running_lora_adapters="",waiting_lora_adapters=""} 1.7395575657589855e+09
vllm:lora_requests_info{max_lora="1",running_lora_adapters="test-lora",waiting_lora_adapters=""} 1.7395575723949368e+09
vllm:lora_requests_info{max_lora="1",running_lora_adapters="test-lora",waiting_lora_adapters="test-lora"} 1.7395575717647147e+09
```

As discussed in vllm-project#13303, this metric perhaps isn't the most ideal
solution for the use case but, given there is an existing user,
we should retain compatibility in V1 and deprecate it if we
replace it with a different metric.

See also kubernetes-sigs/gateway-api-inference-extension#354

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
The LoRA name is already available via the EngineCoreRequest arg.

PR feedback from Rob.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Both are global state. PR feedback from Rob.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Tie LoRAStats lifecycle more closely to RequestStats.

1. Add request to waiting_requests when we create a
   RequestStats in add_request()
2. Remove request from LoRAStats whenever we remove
   a RequestStats (both finish and abort)

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Encapsulate all of this into a LoRARequestStates class in the
v1.metrics.stats module, since its sole purpose is to gather
stats.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
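
Taken together, these commits describe a lifecycle along the following lines. This is a rough sketch: the method names follow the commit messages, but the bodies are illustrative rather than the exact implementation.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class LoRAStats:
    waiting_requests: set = field(default_factory=set)
    running_requests: set = field(default_factory=set)

class LoRARequestStates:
    def __init__(self) -> None:
        self.lora_name_to_stats: dict = {}

    def _stats(self, lora_name: Optional[str]) -> Optional[LoRAStats]:
        if lora_name is None:
            return None
        return self.lora_name_to_stats.setdefault(lora_name, LoRAStats())

    def add_request(self, req_id: str, lora_name: Optional[str]) -> None:
        # Mirrors RequestState creation: start in the waiting set.
        if (stats := self._stats(lora_name)) is not None:
            stats.waiting_requests.add(req_id)

    def request_running(self, req_id: str, lora_name: Optional[str]) -> None:
        if (stats := self._stats(lora_name)) is not None:
            stats.waiting_requests.discard(req_id)
            stats.running_requests.add(req_id)

    def finish_request(self, req_id: str, lora_name: Optional[str]) -> None:
        # Called whenever a RequestState is removed, on finish *and*
        # abort, so an aborted request cannot linger in either set.
        if (stats := self._stats(lora_name)) is not None:
            stats.waiting_requests.discard(req_id)
            stats.running_requests.discard(req_id)
```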
@markmc markmc force-pushed the metrics-v1-lora-compat branch from 15b9326 to b78a6f1 Compare February 24, 2025 15:02
@markmc (Member Author) commented Feb 24, 2025

> Generally looks good. Left a small nit that will help clean up the API a bit: EngineCoreRequest already contains lora_request, so we do not need to pass it around as a separate arg.
>
> The one functionality issue I see is that we do not handle request aborting properly. We can connect offline about this, but I left a detailed comment below.

Thanks @robertgshaw2-redhat. Great feedback. This definitely warranted a bunch of cleanup. See the 5 fixup commits I pushed 👍

@robertgshaw2-redhat (Collaborator) commented Feb 24, 2025

LGTM, I like the abstraction and lifecycle. Just one nit on the typing. Ping when ready for automerge.

Do you know if we have any test coverage for this metric?

PR feedback from Rob.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
@markmc (Member Author) commented Feb 24, 2025

> LGTM, I like the abstraction and lifecycle. Just one nit on the typing. Ping when ready for automerge.

Thanks! Fixed

> Do you know if we have any test coverage for this metric?

No we don't, and I did consider addressing that (instead of manually testing), but I was loath to invest more time in it because I'm only adding it for compatibility reasons ... I'm pretty dubious that it's the right long-term solution (so I'd rather, e.g., spend that time capturing all of the input on the topic in the design doc).

@robertgshaw2-redhat robertgshaw2-redhat enabled auto-merge (squash) February 24, 2025 16:23
@github-actions github-actions bot added the ready label Feb 24, 2025
@simon-mo simon-mo merged commit bc32bc7 into vllm-project:main Feb 25, 2025
55 of 56 checks passed
@@ -246,6 +260,7 @@ def _update_stats_from_finished(self, req_state: RequestState,
iteration_stats.update_from_finished_request(finish_reason,
request_output,
req_state.stats)
self.lora_states.finish_request(req_state)
Contributor

If `iteration_stats` is None, the execution of `self.lora_states.finish_request(req_state)` will be skipped. Could this lead to a `lora_state` leak?

Collaborator

You are correct. This would cause a leak in the case that:

- stat logging is disabled
- lora is enabled

Thanks for your detailed review!
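
One illustrative way to close that hole (a sketch, not necessarily the follow-up fix that landed) is to run the LoRA cleanup before the early return, so it executes even when stat logging is disabled:

```python
# Parameter order is assumed for illustration.
def _update_stats_from_finished(self, req_state, request_output,
                                finish_reason, iteration_stats):
    # Lifecycle cleanup must not depend on stat logging being enabled.
    self.lora_states.finish_request(req_state)
    if iteration_stats is None:
        return
    iteration_stats.update_from_finished_request(finish_reason,
                                                 request_output,
                                                 req_state.stats)
```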
