Refactor pplx init logic to make it modular (prepare for deepep) #18200
Open: youkaichao wants to merge 32 commits into vllm-project:main from youkaichao:refactor_pplx
cbe41ef revert enable_expert_parallel everywhere (youkaichao)
5c6ef5e tmp (youkaichao)
29aebf6 tmp (youkaichao)
058d8e5 fix typing (youkaichao)
b5dce6f fix typing (youkaichao)
ed9299a document options (youkaichao)
88cb9d5 fix inductor (youkaichao)
729239f fix shutdown error (youkaichao)
1bf90d6 fix shutdown error (youkaichao)
4dc2455 comment (youkaichao)
b680ce9 fix max_num_tokens (youkaichao)
f5c6b57 add comments (youkaichao)
ad70c44 fix max_num_tokens (youkaichao)
b20e977 allow per-layer all2all (youkaichao)
1e60d54 merge into init_prepare_finalize (youkaichao)
15d673b disable inductor (youkaichao)
63f029b fix (youkaichao)
d75ac1c rename to manager (youkaichao)
60499df fix for non-pplx (youkaichao)
cd6858e fix for non-pplx (youkaichao)
3f6a862 move prepare_communication_buffer_for_model to base (youkaichao)
d419736 fix no ep case (youkaichao)
5e9d2c9 annotate moe and quant_config (youkaichao)
522ea26 fix typing (youkaichao)
f3fc838 fix init (youkaichao)
c3cd65c fix cross-node init (youkaichao)
259a724 fix typing (youkaichao)
52945ba fix typing (youkaichao)
9c73776 fix typing (youkaichao)
5b4095b meaningful comments (youkaichao)
0944f27 Merge branch 'main' into refactor_pplx (youkaichao)
cfa027b fix error (youkaichao)
I think it would be cleaner to call `_construct_prepare_finalize` and stash the result in `moe_layer.quant_method.fused_experts.prepare_finalize` rather than poking the a2a into the already constructed object.

Maybe make the whole process into a method, e.g. move the `_construct_prepare_finalize` call + `set_prepare_finalize` into a single `init_prepare_finalize` method.
I also am generally opposed to shoving things into objects like this, but I do wonder sometimes if I'm colored by my C++ background and should embrace Python's "flexibility" more
@bnellnm what would that look like? Maybe we try it in a followup?
Merged them into `init_prepare_finalize` now.
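For readers following the thread, here is a rough sketch of the shape being discussed: build the prepare/finalize object with its all2all handle already in place, then stash it through one entry point. Only the method names `_construct_prepare_finalize`, `set_prepare_finalize`, and `init_prepare_finalize` and the attribute path `quant_method.fused_experts.prepare_finalize` come from the comments above. The classes, signatures, and fields below are hypothetical stand-ins and are not taken from the vLLM code in this PR.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class PrepareFinalize:
    """Hypothetical stand-in: holds the all2all handle it was built with."""
    all2all: Any


class FusedExperts:
    """Hypothetical holder for the prepare/finalize object."""

    def __init__(self) -> None:
        self.prepare_finalize: Optional[PrepareFinalize] = None

    def set_prepare_finalize(self, pf: PrepareFinalize) -> None:
        self.prepare_finalize = pf


class QuantMethod:
    """Hypothetical quant method owning a FusedExperts instance."""

    def __init__(self) -> None:
        self.fused_experts = FusedExperts()

    def _construct_prepare_finalize(self, all2all: Any) -> PrepareFinalize:
        # Build the object fully configured, instead of patching the
        # all2all handle into an already constructed object later.
        return PrepareFinalize(all2all=all2all)

    def init_prepare_finalize(self, all2all: Any) -> None:
        # Single entry point: construct, then stash the result.
        pf = self._construct_prepare_finalize(all2all)
        self.fused_experts.set_prepare_finalize(pf)


# Usage: a placeholder string stands in for a real all2all handle.
quant_method = QuantMethod()
quant_method.init_prepare_finalize("a2a-handle")
```

With this shape, callers never touch the internals of the constructed object; they only call `init_prepare_finalize`, which is the concern raised in the first comment.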