[V1] Structured Outputs + Thinking compatibility #16577

aarnphm · 2025-04-14T07:33:48Z

This PR brings thinking support to structured outputs in V1. Currently, if you want to use thinking parser in conjunction with structured outputs, you have to use the V0 engine.

This is also compatible with speculative decoding

This PR also refactor the tokenizer onto the structured_output_manager in order to construct the reasoner.

I have also added tests to cover this case.

Tests with the following:

# thinking + structured outputs
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --tensor-parallel-size 2 --max-model-len 32768 --guided-decoding-backend xgrammar --reasoning-parser deepseek_r1

# thinking + ngram + structured outputs
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --tensor-parallel-size 2 --max-model-len 32768 --guided-decoding-backend xgrammar --reasoning-parser deepseek_r1 --speculative-config '{"method": "ngram", "num_speculative_tokens": 5, "prompt_lookup_max": 5, "prompt_lookup_min": 1}

Closes #14727

github-actions · 2025-04-14T07:33:58Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

aarnphm · 2025-04-14T07:34:37Z

cc @gaocegege might be interested

vllm/v1/core/sched/scheduler.py

vllm/v1/structured_output/__init__.py

gaocegege

Thanks for the PR. Please also update the docs https://docs.vllm.ai/en/latest/features/reasoning_outputs.html#structured-output

- VLLM_USE_V1=0 vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
    --enable-reasoning --reasoning-parser deepseek_r1
+ vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
    --enable-reasoning --reasoning-parser deepseek_r1

tests/v1/entrypoints/llm/test_struct_output_generate.py

mergify · 2025-04-17T16:07:40Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @aarnphm.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Signed-off-by: Aaron Pham <contact@aarnphm.xyz>

…t-thinking-struct-outputs

Signed-off-by: Aaron Pham <contact@aarnphm.xyz>

aarnphm · 2025-05-13T13:42:11Z

seems like the failure on entrypoint is not related 😿

Signed-off-by: Nathan Hoos <thwackyy.y@gmail.com>

…t-thinking-struct-outputs

Signed-off-by: Aaron Pham <contact@aarnphm.xyz>

Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: Russell Bryant <rbryant@redhat.com>

Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: Russell Bryant <rbryant@redhat.com> Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>

Signed-off-by: Nathan Hoos <thwackyy.y@gmail.com>

aarnphm requested review from mgoin, russellb, WoosukKwon, robertgshaw2-redhat, njhill, ywang96, comaniac and alexm-redhat as code owners April 14, 2025 07:33

aarnphm removed request for comaniac, njhill, alexm-redhat and ywang96 April 14, 2025 07:34

mergify bot added the v1 label Apr 14, 2025

aarnphm added the structured-output label Apr 14, 2025

aarnphm mentioned this pull request Apr 14, 2025

[Feature]: reasoning outputs in structured outputs in v1 #14727

Closed

1 task

aarnphm commented Apr 14, 2025

View reviewed changes

vllm/v1/core/sched/scheduler.py Outdated Show resolved Hide resolved

aarnphm commented Apr 14, 2025

View reviewed changes

vllm/v1/structured_output/__init__.py Outdated Show resolved Hide resolved

aarnphm force-pushed the feat/support-thinking-struct-outputs branch 2 times, most recently from cbc3320 to f60e62d Compare April 14, 2025 09:11

gaocegege reviewed Apr 14, 2025

View reviewed changes

mergify bot added the documentation Improvements or additions to documentation label Apr 14, 2025

aarnphm commented Apr 14, 2025

View reviewed changes

tests/v1/entrypoints/llm/test_struct_output_generate.py Outdated Show resolved Hide resolved

aarnphm mentioned this pull request Apr 15, 2025

[Bug]: Deepseek reasoning and guided_json no longer works #16182

Closed

1 task

mergify bot added needs-rebase and removed needs-rebase labels Apr 17, 2025

aarnphm force-pushed the feat/support-thinking-struct-outputs branch 2 times, most recently from 14490de to 2cf21f0 Compare April 17, 2025 19:48

unaidedelf8777 mentioned this pull request May 12, 2025

[V0][V1][Core] Add outlines integration for V1, and update V0 integration. #15975

Open

fix: initialize default decoding_config

ffd3fa1

Signed-off-by: Aaron Pham <contact@aarnphm.xyz>

auto-merge was automatically disabled May 13, 2025 00:52
Head branch was pushed to by a user without write access

merge: branch 'main' of github.com:vllm-project/vllm into feat/suppor…

b64f5f5

…t-thinking-struct-outputs

simon-mo enabled auto-merge (squash) May 13, 2025 04:14

aarnphm added 2 commits May 13, 2025 07:20

merge: branch 'main' of github.com:vllm-project/vllm into feat/suppor…

ddc9c47

…t-thinking-struct-outputs

chore(test): use deepseek_r1 parser for qwen3

edd235b

Signed-off-by: Aaron Pham <contact@aarnphm.xyz>

auto-merge was automatically disabled May 13, 2025 07:22
Head branch was pushed to by a user without write access

aarnphm added 9 commits May 13, 2025 07:37

chore: separate out reasoning tests

3cbbd8c

Signed-off-by: Aaron Pham <contact@aarnphm.xyz>

fix: reasoning tests to parse it

a559b72

Signed-off-by: Aaron Pham <contact@aarnphm.xyz>

chore: replicate duplicate thinking budget

1f3c369

Signed-off-by: Aaron Pham <contact@aarnphm.xyz>

revert: remove duplications

d5574be

Signed-off-by: Aaron Pham <contact@aarnphm.xyz>

chore: reorder test logs

59f2aa7

Signed-off-by: Aaron Pham <contact@aarnphm.xyz>

chore: keep main change to reduce diff

ded3890

Signed-off-by: Aaron Pham <contact@aarnphm.xyz>

fix: use deepseek_r1 parser for tests

0fb92a5

Signed-off-by: Aaron Pham <contact@aarnphm.xyz>

chore: use a slightly larger models for smarter cot

7ace2cb

Signed-off-by: Aaron Pham <contact@aarnphm.xyz>

fix: support for qwen3 prompts

1816b3b

Signed-off-by: Aaron Pham <contact@aarnphm.xyz>

hmellor enabled auto-merge (squash) May 13, 2025 08:36

unaidedelf8777 added a commit to unaidedelf8777/vllm that referenced this pull request May 13, 2025

fix interface to be compliant with vllm-project#16577

c3411dd

Signed-off-by: Nathan Hoos <thwackyy.y@gmail.com>

aarnphm added 2 commits May 13, 2025 21:42

merge: branch 'main' of github.com:vllm-project/vllm into feat/suppor…

91058ba

…t-thinking-struct-outputs

chore: make it more clear

d96fa45

Signed-off-by: Aaron Pham <contact@aarnphm.xyz>

auto-merge was automatically disabled May 14, 2025 01:44
Head branch was pushed to by a user without write access

simon-mo merged commit 2fc9075 into vllm-project:main May 14, 2025
56 of 58 checks passed

github-project-automation bot moved this to Done in Structured Output May 14, 2025

aarnphm deleted the feat/support-thinking-struct-outputs branch May 14, 2025 23:41

aarnphm added a commit to aarnphm/vllm that referenced this pull request May 15, 2025

[V1] Structured Outputs + Thinking compatibility (vllm-project#16577)

9760af7

Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: Russell Bryant <rbryant@redhat.com>

unaidedelf8777 added a commit to unaidedelf8777/vllm that referenced this pull request May 15, 2025

fix interface to be compliant with vllm-project#16577

5388bdc

Signed-off-by: Nathan Hoos <thwackyy.y@gmail.com>

bi1101 mentioned this pull request May 16, 2025

[Feature]: Guided decoding after thinking is done #18255

Closed

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[V1] Structured Outputs + Thinking compatibility #16577

[V1] Structured Outputs + Thinking compatibility #16577

aarnphm commented Apr 14, 2025 •

edited by github-actions bot

Loading

github-actions bot commented Apr 14, 2025

aarnphm commented Apr 14, 2025

gaocegege left a comment

mergify bot commented Apr 17, 2025

aarnphm commented May 13, 2025

[V1] Structured Outputs + Thinking compatibility #16577

[V1] Structured Outputs + Thinking compatibility #16577

Conversation

aarnphm commented Apr 14, 2025 • edited by github-actions bot Loading

github-actions bot commented Apr 14, 2025

aarnphm commented Apr 14, 2025

gaocegege left a comment

Choose a reason for hiding this comment

mergify bot commented Apr 17, 2025

aarnphm commented May 13, 2025

aarnphm commented Apr 14, 2025 •

edited by github-actions bot

Loading