-
-
Notifications
You must be signed in to change notification settings - Fork 7.5k
[v1] Move block management logic from KVCacheManager to SpecializedManager #17474
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[v1] Move block management logic from KVCacheManager to SpecializedManager #17474
Conversation
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
This pull request has merge conflicts that must be resolved before it can be |
…alized_manager Signed-off-by: Chen Zhang <zhangch99@outlook.com>
QQ: Can we use a different term instead of "manager" for the single-type managers? It's a bit confusing since they're lower-level than the KV cache manager, but still called managers. |
What about "SingleTypeKVCacheController"? (I don't want to call it "allocator" as it is much more complex than simple allocation / deallocation.) |
|
This pull request has merge conflicts that must be resolved before it can be |
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
…alized_manager Signed-off-by: Chen Zhang <zhangch99@outlook.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@heheda12345 Thanks for the PR. the code is very clean and well organized. Sorry for the delay in my review. Left some minor comments on style. Please check them out.
vllm/v1/core/specialized_manager.py
Outdated
num_new_blocks = min(num_new_blocks, | ||
self.max_num_blocks_per_req - len(req_blocks)) | ||
assert num_new_blocks > 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Can num_new_blocks
be 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! I've fixed it. This check was copied from previous kv_cache_manager. num_new_blocks
can be 0 when eagle is enabled (e.g., run examples/offline_inference/eagle.py
with max_model_len=48
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
9a1bc1d
Can you help to double-check the correctness of this commit? I think it is cleaner to perform num_tokens_need_slot = min(****, self.max_model_len)
before calling allocate_new_blocks
than to perform min
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@heheda12345 Ah yes lgtm! Nice refactoring!
…nager (vllm-project#17474) Signed-off-by: Chen Zhang <zhangch99@outlook.com> Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
…nager (vllm-project#17474) Signed-off-by: Chen Zhang <zhangch99@outlook.com> Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>
…nager (vllm-project#17474) Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Should be merged after #17398
To prepare for hybrid allocator, this PR moves logic that need to run for each specialized manager from KVCacheManager to SpecializedManager. As the
SpecializedManager
not only contains customized logic for different attention type, I renamed it toSingleTypeKVCacheManager
.Prefer to rename
specialized_manager.py
in a seperate PR.Didn't move hashing logic (e.g. req_to_block_hashes) to SpecializedManager as the HybridAllocator will do hashing in
KVCacheManager
level so that different managers with the same block_size can use the same block_hash.Splitted from #16101