
Fix Whisper crash caused by invalid max_num_batched_tokens config #17853


Merged
4 commits merged into vllm-project:main on May 9, 2025

Conversation

@inkcherry (Contributor) commented May 8, 2025

Fixes #17797

github-actions (bot) commented May 8, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@DarkLight1337 (Member)
This looks reasonable to me, but pinging @NickLucche @ywang96 just to be sure.

inkcherry added 2 commits May 8, 2025 11:04
Signed-off-by: inkcherry <mingzhi.liu@intel.com>
Signed-off-by: inkcherry <mingzhi.liu@intel.com>
@NickLucche (Contributor) left a comment


Thanks for looking into this! Left one comment.
I am not sure this is what is happening with the bug though, as max-model-len (448) * max-num-seqs (2) is still below the default max_num_batched_tokens (5120).

One other important thing: Whisper's max-model-len refers to the decoder transcription length.
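(Illustrative aside, not part of the PR: one way to see that decoder limit is to read the Hugging Face config for a standard Whisper checkpoint; the checkpoint name below is just an example.)

from transformers import WhisperConfig

# max_target_positions is the decoder (transcription) length limit that
# max-model-len maps to for Whisper; it is 448 for the standard
# openai/whisper checkpoints.
cfg = WhisperConfig.from_pretrained("openai/whisper-small")
print(cfg.max_target_positions)  # 448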

Comment on lines +2053 to +2058
# Ensure max_num_batched_tokens does not exceed model limit.
# Some models (e.g., Whisper) have embeddings tied to max length.
self.max_num_batched_tokens = min(
    self.max_num_seqs * self.max_model_len,
    self.max_num_batched_tokens)

Contributor


I feel like we should only warn the user rather than silently set max_num_batched_tokens.
Also, checking the limit below right after it was upper-bounded here seems wasteful, doesn't it?

@inkcherry (Contributor, Author) commented May 9, 2025

Thanks for the review!
The crash occurs during the memory profiling stage: the model performs a profiling run using max_num_batched_tokens / max_num_seqs tokens per sequence, but that length may exceed the position embedding limit; see https://github.com/vllm-project/vllm/blob/376786fac1fc50e8d788a39a91fa28d1709ad48b/vllm/model_executor/models/whisper.py#L416C7-L416C59. Therefore, we should ensure that max_num_batched_tokens does not exceed max_num_seqs * max_model_len; a simplified sketch of the resulting behavior follows after this list.

  • For default settings, we take the minimum value to ensure safety. (Note: hitting this clamping condition typically requires both max_num_seqs and max_model_len to be small, so it does not affect the vast majority of use cases.)

  • For user-defined settings, I have replaced the error check with a warning instead.
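The following is only an illustrative sketch of the clamping-and-warning behavior described above, written for this discussion; the function name, default value, and logging setup are hypothetical and do not mirror vLLM's actual SchedulerConfig code.

import logging

logger = logging.getLogger(__name__)

def resolve_max_num_batched_tokens(max_num_batched_tokens,
                                   max_num_seqs,
                                   max_model_len,
                                   default=5120):
    """Pick a batched-token budget that memory profiling can actually run.

    Profiling executes roughly max_num_batched_tokens / max_num_seqs tokens
    per sequence; if that exceeds max_model_len, models whose position
    embeddings are tied to the max length (e.g. Whisper) hit an assertion.
    """
    limit = max_num_seqs * max_model_len
    if max_num_batched_tokens is None:
        # Default case: silently clamp to the safe minimum.
        return min(default, limit)
    if max_num_batched_tokens > limit:
        # User-specified case: warn instead of overriding or raising.
        logger.warning(
            "max_num_batched_tokens (%d) exceeds max_num_seqs * "
            "max_model_len (%d); profiling may exceed the model's "
            "position embedding limit.", max_num_batched_tokens, limit)
    return max_num_batched_tokens

# With the values mentioned earlier in this thread (max_num_seqs=2 and
# Whisper's max_model_len=448), the default budget is clamped from 5120
# down to 896, while an explicit user value only produces a warning.
print(resolve_max_num_batched_tokens(None, 2, 448))  # -> 896
print(resolve_max_num_batched_tokens(8192, 2, 448))  # -> 8192, plus a warning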

inkcherry added 2 commits May 9, 2025 01:49
Signed-off-by: inkcherry <mingzhi.liu@intel.com>
Signed-off-by: inkcherry <mingzhi.liu@intel.com>
@NickLucche (Contributor) left a comment

LGTM. This should in fact not affect other models.
I was worried this could have an impact on future enc-dec support for v1, but that is not today's problem. Thanks!

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) May 9, 2025 07:03
@github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) May 9, 2025
@DarkLight1337 DarkLight1337 merged commit 5b2dcbf into vllm-project:main May 9, 2025
69 checks passed
princepride pushed a commit to princepride/vllm that referenced this pull request May 10, 2025
Fix Whisper crash caused by invalid max_num_batched_tokens config (vllm-project#17853)

Signed-off-by: inkcherry <mingzhi.liu@intel.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
Fix Whisper crash caused by invalid max_num_batched_tokens config (vllm-project#17853)

Signed-off-by: inkcherry <mingzhi.liu@intel.com>
Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>
mawong-amd pushed a commit to ROCm/vllm that referenced this pull request May 14, 2025
Fix Whisper crash caused by invalid max_num_batched_tokens config (vllm-project#17853)

Signed-off-by: inkcherry <mingzhi.liu@intel.com>
Labels
ready (ONLY add when PR is ready to merge/full CI is needed)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: Assertion error when using Whisper with --max-num-seqs
3 participants