@@ -10,11 +10,11 @@ Reasoning models return an additional `reasoning_content` field in their outputs
vLLM currently supports the following reasoning models:

- | Model Series | Parser Name | Structured Output Support | Tool Calling |
- | --------------| -------------| ------------------| ------------- |
- | [DeepSeek R1 series](https://huggingface.co/collections/deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d) | `deepseek_r1` | `guided_json`, `guided_regex` | ❌ |
- | [QwQ-32B](https://huggingface.co/Qwen/QwQ-32B) | `deepseek_r1` | `guided_json`, `guided_regex` | ✅ |
- | [IBM Granite 3.2 language models](https://huggingface.co/collections/ibm-granite/granite-32-language-models-67b3bc8c13508f6d064cff9a) | `granite` | ❌ | ❌ |
+ | Model Series                                                                                                                          | Parser Name   | Structured Output Support     | Tool Calling |
+ | ------------------------------------------------------------------------------------------------------------------------------------- | ------------- | ----------------------------- | ------------ |
+ | [DeepSeek R1 series](https://huggingface.co/collections/deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d)                              | `deepseek_r1` | `guided_json`, `guided_regex` | ❌           |
+ | [QwQ-32B](https://huggingface.co/Qwen/QwQ-32B)                                                                                         | `deepseek_r1` | `guided_json`, `guided_regex` | ✅           |
+ | [IBM Granite 3.2 language models](https://huggingface.co/collections/ibm-granite/granite-32-language-models-67b3bc8c13508f6d064cff9a)  | `granite`     | ❌                            | ❌           |

- IBM Granite 3.2 reasoning is disabled by default; to enable it, you must also pass `thinking=True` in your `chat_template_kwargs`.
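The Granite note above says reasoning must be switched on per request via `chat_template_kwargs`. A minimal sketch of the request body this implies, with an illustrative model name and prompt; only the `chat_template_kwargs` key is the point:

```python
# Sketch of a chat-completions request body that enables IBM Granite 3.2
# reasoning. The model id and prompt are illustrative; the essential part is
# passing thinking=True inside chat_template_kwargs, as the note above states.
payload = {
    "model": "ibm-granite/granite-3.2-8b-instruct",  # illustrative model id
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
    "chat_template_kwargs": {"thinking": True},  # enables Granite reasoning
}
print(payload["chat_template_kwargs"])  # {'thinking': True}
```

With the `openai` Python client, non-standard keys such as `chat_template_kwargs` are typically forwarded through `extra_body`.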
@@ -64,22 +64,22 @@ Streaming chat completions are also supported for reasoning models. The `reasoni
```json
{
-   "id": "chatcmpl-123",
-   "object": "chat.completion.chunk",
-   "created": 1694268190,
-   "model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
-   "system_fingerprint": "fp_44709d6fcb",
-   "choices": [
-     {
-       "index": 0,
-       "delta": {
-         "role": "assistant",
-         "reasoning_content": "is",
-       },
-       "logprobs": null,
-       "finish_reason": null
-     }
-   ]
+   "id": "chatcmpl-123",
+   "object": "chat.completion.chunk",
+   "created": 1694268190,
+   "model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
+   "system_fingerprint": "fp_44709d6fcb",
+   "choices": [
+     {
+       "index": 0,
+       "delta": {
+         "role": "assistant",
+         "reasoning_content": "is"
+       },
+       "logprobs": null,
+       "finish_reason": null
+     }
+   ]
}
```
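As a consumer-side sketch of the streaming shape above: the helper below accumulates `reasoning_content` and `content` deltas from chunks shaped like the example. The chunks are plain dicts and the helper name is invented for illustration; with the `openai` client you would read `chunk.choices[0].delta` attributes instead.

```python
# Collect reasoning tokens and answer tokens separately from a stream of
# chat.completion.chunk objects (represented here as plain dicts).
def split_stream(chunks):
    reasoning, content = [], []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        # reasoning_content may be absent or None, so check before using it
        if delta.get("reasoning_content"):
            reasoning.append(delta["reasoning_content"])
        if delta.get("content"):
            content.append(delta["content"])
    return "".join(reasoning), "".join(content)

# Invented sample chunks mimicking the shape of the example above.
chunks = [
    {"choices": [{"delta": {"role": "assistant", "reasoning_content": "is"}}]},
    {"choices": [{"delta": {"content": "Paris"}}]},
]
print(split_stream(chunks))  # → ('is', 'Paris')
```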
@@ -139,12 +139,10 @@ Remember to check whether the `reasoning_content` exists in the response before
The reasoning content is also available in the structured output. A structured output engine like `xgrammar` will use the reasoning content to generate structured output. It is only supported in the v0 engine now.

```bash
- VLLM_USE_V1=0 vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
+ vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
      --enable-reasoning --reasoning-parser deepseek_r1
```

- Please note that the `VLLM_USE_V1` environment variable must be set to `0` to use the v0 engine.
-
```python
from openai import OpenAI
from pydantic import BaseModel