Fix Whisper crash caused by invalid max_num_batched_tokens config #17853
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited set of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
This looks reasonable to me, but pinging @NickLucche @ywang96 just to be sure.
Signed-off-by: inkcherry <mingzhi.liu@intel.com>
Thanks for looking into this! Left one comment.
I am not sure this is what is happening with the bug though, as max-model-len (448) * max-num-seqs (2) is still below the default max_num_batched_tokens (5120).
One other important thing: Whisper's max-model-len refers to the decoder transcription length.
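For reference, here is the arithmetic behind that observation, using the values quoted in this thread (a quick illustrative check, not code from the PR):

# Whisper values quoted above (illustrative only).
max_model_len = 448                     # decoder transcription length
max_num_seqs = 2
default_max_num_batched_tokens = 5120

# max_model_len * max_num_seqs = 896, which is indeed below the default budget of 5120.
assert max_model_len * max_num_seqs == 896
assert max_model_len * max_num_seqs < default_max_num_batched_tokens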
# Ensure max_num_batched_tokens does not exceed model limit.
# Some models (e.g., Whisper) have embeddings tied to max length.
self.max_num_batched_tokens = min(
    self.max_num_seqs * self.max_model_len,
    self.max_num_batched_tokens)
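As a quick usage note (my own illustration, not part of the diff): with the Whisper values quoted earlier in this thread, the clamp lowers the default budget like this:

# min(max_num_seqs * max_model_len, max_num_batched_tokens) with the values from this thread.
min(2 * 448, 5120)   # -> 896, so max_num_batched_tokens drops from 5120 to 896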
I feel like we should only warn the user rather than silently set max_num_batched_tokens.
Also, checking the limit below right after it was upper-bounded here seems wasteful, doesn't it?
Thanks for the review!
The crash occurs during the memory profiling stage: the model performs an execution using max_num_batched_tokens / max_num_seqs tokens per sequence, but this length may exceed the embedding position limit, see https://github.com/vllm-project/vllm/blob/376786fac1fc50e8d788a39a91fa28d1709ad48b/vllm/model_executor/models/whisper.py#L416C7-L416C59. Therefore, we should ensure that max_num_batched_tokens <= max_num_seqs * max_model_len.
- For default settings, we take the minimum value to ensure safety. (Note: triggering this clipping typically requires both max_num_seqs and max_model_len to be small, so it does not affect the vast majority of use cases.)
- For user-defined settings, I've replaced the error check with a warning instead.
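For readers following the thread, here is a minimal sketch of the behaviour described above, assuming a hypothetical helper name (resolve_max_num_batched_tokens is mine, not vLLM's API); the actual change lives in the scheduler config:

import logging
from typing import Optional

logger = logging.getLogger(__name__)

def resolve_max_num_batched_tokens(user_value: Optional[int],
                                   max_num_seqs: int,
                                   max_model_len: int,
                                   default: int = 5120) -> int:
    """Hypothetical helper mirroring the behaviour described above.

    Memory profiling executes roughly max_num_batched_tokens / max_num_seqs
    tokens per sequence, so the budget should not exceed
    max_num_seqs * max_model_len for models (e.g. Whisper) whose position
    embeddings are tied to max_model_len.
    """
    limit = max_num_seqs * max_model_len
    if user_value is None:
        # Default settings: silently take the safe minimum.
        return min(default, limit)
    if user_value > limit:
        # User-defined settings: warn instead of raising an error.
        logger.warning(
            "max_num_batched_tokens (%d) exceeds max_num_seqs * "
            "max_model_len (%d); memory profiling may exceed the model's "
            "position embedding limit.", user_value, limit)
    return user_value

With the defaults discussed above, resolve_max_num_batched_tokens(None, 2, 448) would return 896.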
LGTM. This should in fact not affect other models.
I was worried this could have an impact on future enc-dec support for v1, but that is not today's problem. Thanks!
…ig (vllm-project#17853) Signed-off-by: inkcherry <mingzhi.liu@intel.com> Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
…ig (vllm-project#17853) Signed-off-by: inkcherry <mingzhi.liu@intel.com> Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>
…ig (vllm-project#17853) Signed-off-by: inkcherry <mingzhi.liu@intel.com>
fix #17797