[V0][Metrics] Deprecate some KV/prefix cache metrics
vllm:num_requests_swapped, vllm:cpu_cache_usage_perc and
vllm:cpu_prefix_cache_hit_rate will no longer be relevant in
V1 since we no longer implement KV cache offloading. So
these metrics should be considered deprecated.
And as agreed in vllm-project#12592, we have added prefix_cache_queries and
prefix_cache_hits counters to replace the prefix_cache_hit_rate
gauge, because counters allow the interval over which the hit
rate is calculated to be chosen in a Prometheus query, e.g.:
```
rate(prefix_cache_hits[5m]) / rate(prefix_cache_queries[5m])
```
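A minimal Python sketch of why counters make the window selectable: given two scrapes of the cumulative queries/hits counters, the hit rate over the interval between them is the ratio of their increases, which is what the `rate()`/`rate()` query above computes for the chosen window. `CounterSample` and `interval_hit_rate` are hypothetical names for illustration only, not part of vLLM or `prometheus_client`, and unlike Prometheus' `rate()` this sketch neither extrapolates to the window edges nor compensates for counter resets.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSample:
    """One hypothetical scrape of the two cumulative counters."""

    timestamp: float  # seconds since some reference point
    queries: int  # cumulative value of prefix_cache_queries
    hits: int  # cumulative value of prefix_cache_hits


def interval_hit_rate(start: CounterSample, end: CounterSample) -> float:
    """Hit rate between two scrapes, analogous to rate(hits) / rate(queries).

    Both per-second rates divide by the same window length, so it cancels
    and the ratio of rates equals the ratio of counter increases.
    """
    if end.timestamp <= start.timestamp:
        raise ValueError("end sample must be later than start sample")
    d_queries = end.queries - start.queries
    d_hits = end.hits - start.hits
    if d_queries < 0 or d_hits < 0:
        # Prometheus' rate() compensates for counter resets; this sketch does not.
        raise ValueError("counter reset detected")
    if d_queries == 0:
        return math.nan  # no queries in the window; PromQL's 0/0 is also NaN
    return d_hits / d_queries


# The same counters give different hit rates depending on the window chosen.
s0 = CounterSample(timestamp=0.0, queries=1000, hits=200)
s1 = CounterSample(timestamp=300.0, queries=1400, hits=300)
s2 = CounterSample(timestamp=600.0, queries=2000, hits=750)

print(interval_hit_rate(s0, s2))  # 10-minute window -> 0.55
print(interval_hit_rate(s1, s2))  # 5-minute window -> 0.75
```

A single gauge would have to bake one averaging window into the exporter; keeping raw counters leaves that choice to whoever writes the query.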
In theory, we could ease the transition by implementing the
old hit rate metric in V1 and the new queries/hits metrics
in V0, but it's probably not worthwhile unless we learn the
hit rate metric is heavily used by V0 users.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>