
[BugFix] Fix ascend config check #1092


Merged: 1 commit merged into vllm-project:main from fix_ascend_config on Jun 6, 2025

Conversation

@wangxiyuan (Collaborator) commented on Jun 5, 2025:

Fix the ascend config check logic:

  1. Refactor check_ascend_config to make the rules clear (see the sketch after this list):
    1. torchair graph mode should not work with enforce_eager=True
    2. aclgraph should not work together with torchair graph mode
  2. Add a refresh option to regenerate the config for the RLHF case.
  3. Fix a typo in the model runner.
  4. Change the expert_tensor_parallel_size default to 0 to keep the same behavior as before.
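A minimal sketch of the two rules in item 1, using illustrative boolean parameters rather than the actual check_ascend_config signature in vllm-ascend:

    def check_ascend_config(torchair_graph_enabled: bool,
                            enforce_eager: bool,
                            aclgraph_enabled: bool) -> None:
        """Illustrative version of the two graph-mode rules above."""
        # Rule 1: torchair graph mode compiles the model into a graph,
        # so it conflicts with eager-only execution.
        if torchair_graph_enabled and enforce_eager:
            raise RuntimeError(
                "Torchair graph mode does not work with enforce_eager=True.")
        # Rule 2: aclgraph and torchair graph are two different graph
        # backends; only one of them may be active at a time.
        if aclgraph_enabled and torchair_graph_enabled:
            raise RuntimeError(
                "aclgraph does not work together with torchair graph mode.")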

@wangxiyuan force-pushed the fix_ascend_config branch from 372e466 to b9a4e95 (June 5, 2025 15:35)
The github-actions bot added the documentation label (Improvements or additions to documentation) on Jun 5, 2025
@Yikun (Collaborator) left a comment:

LGTM if CI passed

@wangxiyuan force-pushed the fix_ascend_config branch 3 times, most recently from 324099c to d1439b7 (June 5, 2025 15:51)
@@ -126,7 +129,7 @@ def check_ascend_config(vllm_config, enforce_eager):
                 "Torchair graph mode only works with deepseek model.")

     # for V1 Engine, aclgraph doesn't work with deepseek model and only qwen model is well tested.
-    if envs.VLLM_USE_V1 and vllm_config.model_config is not None and not enforce_eager:
+    if envs.VLLM_USE_V1 and vllm_config.model_config is not None and not enforce_eager and not ascend_config.torchair_graph_config.enabled:
@NeverRaR (Contributor) commented on Jun 5, 2025:

This check should be moved to platform.py:check_and_update_config.
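For context, a hedged sketch of what that move could look like; check_and_update_config is the standard vLLM platform hook, but the body here is illustrative and not the merged vllm-ascend code:

    # platform.py -- illustrative sketch of the suggested move
    from vllm.platforms import Platform

    from vllm_ascend.ascend_config import check_ascend_config


    class NPUPlatform(Platform):

        @classmethod
        def check_and_update_config(cls, vllm_config) -> None:
            # vLLM invokes this hook once while finalizing the engine
            # config, so the validation runs before any worker starts.
            enforce_eager = bool(vllm_config.model_config is not None
                                 and vllm_config.model_config.enforce_eager)
            check_ascend_config(vllm_config, enforce_eager)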

@@ -126,7 +129,7 @@ def check_ascend_config(vllm_config, enforce_eager):
"Torchair graph mode only works with deepseek model.")
A Contributor commented:

It should be:

    if ascend_config.torchair_graph_config.enabled:
        if envs.VLLM_MLA_DISABLE:
            logger.warning(
                "Torchair graph mode is still experimental and not supported for V1 without mla currently, "
                "it has been disabled automatically.")
            ascend_config.torchair_graph_config.enabled = False
        elif vllm_config.model_config:
            model_type = vllm_config.model_config.hf_config.model_type
            if "deepseek" not in model_type:
                raise NotImplementedError(
                    "Torchair graph mode only works with deepseek model.")

That is, two changes:
  1. ascend_config.ascend_scheduler_config.enabled = False -> ascend_config.torchair_graph_config.enabled = False
  2. if vllm_config.model_config: -> elif vllm_config.model_config:

@wangxiyuan force-pushed the fix_ascend_config branch from d1439b7 to 3eefdc9 (June 6, 2025 01:04)
@wangxiyuan (Collaborator, Author) commented:

@NeverRaR Thanks for your review. I've updated the check logic to make it clearer. Please take a look again.

@@ -29,6 +29,7 @@ The following table lists the additional configuration options available in vLLM
 | `torchair_graph_config` | dict | `{}` | The config options for torchair graph mode |
 | `ascend_scheduler_config` | dict | `{}` | The config options for ascend scheduler |
 | `expert_tensor_parallel_size` | str | `1` | Expert tensor parallel size the model to use. |
+| `refresh` | bool | `false` | Whether to refresh global ascend config content. This value is usually used by rlhf case. |
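For illustration, these options are passed through vLLM's additional_config engine argument; a hedged example with placeholder model name and option values:

    from vllm import LLM

    # Placeholder values for illustration; additional_config is vLLM's
    # mechanism for passing platform-specific options to vllm-ascend.
    llm = LLM(
        model="Qwen/Qwen2.5-7B-Instruct",
        additional_config={
            "torchair_graph_config": {"enabled": False},
            "ascend_scheduler_config": {"enabled": True},
            "refresh": True,  # regenerate the global ascend config (RLHF case)
        },
    )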
A Collaborator commented:

QQ: Does vLLM have a similar config?

@wangxiyuan (Collaborator, Author) commented on Jun 6, 2025:

No, this value is only used for RLHF. For verl or some other framework, the flow is:

  1. verl loads and updates the vLLM config
  2. verl starts the LLM with an additional config in external_executor mode

In the first step the ascend config has already been initialized, so in the second step the additional config would be skipped.

To solve the problem, we let verl pass refresh, so that we can regenerate the config (a sketch follows).
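A minimal sketch of that refresh behavior, assuming a module-level singleton and an AscendConfig constructor; the names follow vllm-ascend's ascend_config.py, but the body is illustrative:

    _ASCEND_CONFIG = None  # module-level singleton

    def init_ascend_config(vllm_config):
        """Create the global ascend config, regenerating it when refresh is set."""
        global _ASCEND_CONFIG
        additional_config = getattr(vllm_config, "additional_config", None) or {}
        refresh = additional_config.get("refresh", False)
        # Without refresh, a second init call (e.g. when verl starts the LLM
        # in external_executor mode after already loading the config) would
        # keep the stale singleton and silently drop the new additional_config.
        if _ASCEND_CONFIG is None or refresh:
            _ASCEND_CONFIG = AscendConfig(vllm_config)  # assumed constructor
        return _ASCEND_CONFIG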

@wangxiyuan force-pushed the fix_ascend_config branch 5 times, most recently from 830e333 to 8cf5adf (June 6, 2025 08:47)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
@wangxiyuan force-pushed the fix_ascend_config branch from 8cf5adf to 755edcf (June 6, 2025 09:08)
@wangxiyuan merged commit dab19d5 into vllm-project:main on Jun 6, 2025 (23 checks passed)
@wangxiyuan deleted the fix_ascend_config branch on June 9, 2025
venus-taibai pushed a commit to venus-taibai/vllm-ascend that referenced this pull request Jun 18, 2025
…llm-project#1092)

Merge branch wengang/cherry-pick-1029-1092 of git@code.alipay.com:Theta/vllm-ascend.git into dev-v0.9.0604
https://code.alipay.com/Theta/vllm-ascend/pull_requests/108

Reviewed-by: 子宏 <tanzhiqiang.tzq@antgroup.com>


* [Misc] Refactor additional_config (vllm-project#1029)
* [BugFix] Fix ascend config check (vllm-project#1092)
* [Misc] Update benchmark scripts
Labels: documentation (Improvements or additions to documentation), module:core, module:tests