Skip to content

[v1] Move block management logic from KVCacheManager to SpecializedManager #17474

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
May 9, 2025

Conversation

heheda12345
Copy link
Collaborator

@heheda12345 heheda12345 commented Apr 30, 2025

Should be merged after #17398

To prepare for hybrid allocator, this PR moves logic that need to run for each specialized manager from KVCacheManager to SpecializedManager. As the SpecializedManager not only contains customized logic for different attention type, I renamed it to SingleTypeKVCacheManager.

Prefer to rename specialized_manager.py in a seperate PR.

Didn't move hashing logic (e.g. req_to_block_hashes) to SpecializedManager as the HybridAllocator will do hashing in KVCacheManager level so that different managers with the same block_size can use the same block_hash.

Splitted from #16101

Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Copy link

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the v1 label Apr 30, 2025
Copy link

mergify bot commented Apr 30, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @heheda12345.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Apr 30, 2025
…alized_manager

Signed-off-by: Chen Zhang <zhangch99@outlook.com>
@mergify mergify bot removed the needs-rebase label May 1, 2025
@WoosukKwon
Copy link
Collaborator

QQ: Can we use a different term instead of "manager" for the single-type managers? It's a bit confusing since they're lower-level than the KV cache manager, but still called managers.

@heheda12345
Copy link
Collaborator Author

What about "SingleTypeKVCacheController"? (I don't want to call it "allocator" as it is much more complex than simple allocation / deallocation.)

@WoosukKwon
Copy link
Collaborator

SingleTypeKVCacheController
Doesn't sound like a better name 😅 ok let's keep "manager" and brainstorm whether there's a better option that "SingleTypeManager"

@WoosukKwon WoosukKwon added the ready ONLY add when PR is ready to merge/full CI is needed label May 6, 2025
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Copy link

mergify bot commented May 6, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @heheda12345.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label May 6, 2025
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
…alized_manager

Signed-off-by: Chen Zhang <zhangch99@outlook.com>
@mergify mergify bot removed the needs-rebase label May 6, 2025
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Copy link
Collaborator

@WoosukKwon WoosukKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@heheda12345 Thanks for the PR. the code is very clean and well organized. Sorry for the delay in my review. Left some minor comments on style. Please check them out.

Comment on lines 139 to 141
num_new_blocks = min(num_new_blocks,
self.max_num_blocks_per_req - len(req_blocks))
assert num_new_blocks > 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can num_new_blocks be 0?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I've fixed it. This check was copied from previous kv_cache_manager. num_new_blocks can be 0 when eagle is enabled (e.g., run examples/offline_inference/eagle.py with max_model_len=48)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

9a1bc1d
Can you help to double-check the correctness of this commit? I think it is cleaner to perform num_tokens_need_slot = min(****, self.max_model_len) before calling allocate_new_blocks than to perform min here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@heheda12345 Ah yes lgtm! Nice refactoring!

Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
@WoosukKwon WoosukKwon enabled auto-merge (squash) May 9, 2025 15:24
@WoosukKwon WoosukKwon merged commit 200da9a into vllm-project:main May 9, 2025
50 checks passed
princepride pushed a commit to princepride/vllm that referenced this pull request May 10, 2025
…nager (vllm-project#17474)

Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
…nager (vllm-project#17474)

Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>
mawong-amd pushed a commit to ROCm/vllm that referenced this pull request May 14, 2025
…nager (vllm-project#17474)

Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready ONLY add when PR is ready to merge/full CI is needed v1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants