
Commit e02ce49

[Feature] OpenAI-Compatible Tools API + Streaming for Hermes & Mistral models (#5649)

K-Mistele and constellate authored
Co-authored-by: constellate <constellate@1-ai-appserver-staging.codereach.com>
Co-authored-by: Kyle Mistele <kyle@constellate.ai>

1 parent 561d6f8 commit e02ce49

26 files changed: +2588 −83 lines changed

.buildkite/test-pipeline.yaml
Lines changed: 10 additions & 0 deletions

```diff
@@ -92,6 +92,7 @@ steps:
   - pytest -v -s entrypoints/openai
   - pytest -v -s entrypoints/test_chat_utils.py
 
+
 - label: Distributed Tests (4 GPUs) # 10min
   working_dir: "/vllm-workspace/tests"
   num_gpus: 4
@@ -271,6 +272,15 @@ steps:
   - export VLLM_WORKER_MULTIPROC_METHOD=spawn
   - bash ./run-tests.sh -c configs/models-small.txt -t 1
 
+- label: OpenAI-Compatible Tool Use # 20 min
+  fast_check: false
+  mirror_hardwares: [ amd ]
+  source_file_dependencies:
+    - vllm/
+    - tests/tool_use
+  commands:
+    - pytest -v -s tool_use
+
 ##### 1 GPU test #####
 ##### multi gpus test #####
```

docs/source/serving/openai_compatible_server.md
Lines changed: 54 additions & 4 deletions

````diff
@@ -110,6 +110,14 @@ directory [here](https://github.com/vllm-project/vllm/tree/main/examples/)
 :func: create_parser_for_docs
 :prog: vllm serve
 ```
+## Tool Calling in the Chat Completion API
+### Named Function Calling
+vLLM supports only named function calling in the chat completion API by default. It does so using Outlines, so this is
+enabled by default, and will work with any supported model. You are guaranteed a validly-parsable function call, though
+not necessarily a high-quality one.
+
+To use a named function, you need to define the functions in the `tools` parameter of the chat completion request, and
+specify the `name` of one of the tools in the `tool_choice` parameter of the chat completion request.
 
 ### Config file
 
````
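To make the named-function-calling flow described in the hunk above concrete, here is a minimal client sketch. It is illustrative only, not part of this commit: it assumes a vLLM OpenAI-compatible server already running on `localhost:8000`, and the `get_current_weather` tool schema is a hypothetical example.

```python
# Minimal sketch of named function calling against a vLLM OpenAI-compatible
# server. Assumes a server at localhost:8000; the tool schema below is a
# hypothetical example, not taken from this commit's docs.
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")
model = client.models.list().data[0].id

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "The city, e.g. 'San Francisco'"
                }
            },
            "required": ["city"],
        },
    },
}]

# Force the named function via tool_choice; vLLM's guided decoding guarantees
# a parsable call matching the JSON schema (not necessarily a good one).
response = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "What's the weather in Dallas?"}],
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "get_current_weather"}},
)
print(response.choices[0].message.tool_calls[0].function.arguments)
```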
````diff
@@ -140,10 +148,52 @@ The order of priorities is `command line > config file values > defaults`.
 ## Tool calling in the chat completion API
 vLLM supports only named function calling in the chat completion API. The `tool_choice` options `auto` and `required` are **not yet supported** but on the roadmap.
 
-To use a named function you need to define the function in the `tools` parameter and call it in the `tool_choice` parameter.
-
-It is the callers responsibility to prompt the model with the tool information, vLLM will not automatically manipulate the prompt. **This may change in the future.**
+It is the caller's responsibility to prompt the model with the tool information; vLLM will not automatically manipulate the prompt.
 
 vLLM will use guided decoding to ensure the response matches the tool parameter object defined by the JSON schema in the `tools` parameter.
 
-Please refer to the OpenAI API reference documentation for more information.
+
+### Automatic Function Calling
+To enable this feature, you should set the following flags:
+* `--enable-auto-tool-choice` -- **mandatory** for auto tool choice. Tells vLLM that you want to enable the model to generate its own tool calls when it
+deems appropriate.
+* `--tool-call-parser` -- select the tool parser to use: currently either `hermes` or `mistral`. Additional tool parsers
+will continue to be added in the future.
+* `--chat-template` -- **optional** for auto tool choice. The path to the chat template which handles `tool`-role messages and `assistant`-role messages
+that contain previously generated tool calls. Hermes and Mistral models have tool-compatible chat templates in their
+`tokenizer_config.json` files, but you can specify a custom template. This argument can be set to `tool_use` if your model has a tool use-specific chat
+template configured in the `tokenizer_config.json`. In this case, it will be used per the `transformers` specification. More on this [here](https://huggingface.co/docs/transformers/en/chat_templating#why-do-some-models-have-multiple-templates)
+from HuggingFace; and you can find an example of this in a `tokenizer_config.json` [here](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B/blob/main/tokenizer_config.json)
+
+If your favorite tool-calling model is not supported, please feel free to contribute a parser & tool use chat template!
+
+#### Hermes Models
+All Nous Research Hermes-series models newer than Hermes 2 Pro should be supported:
+* `NousResearch/Hermes-2-Pro-*`
+* `NousResearch/Hermes-2-Theta-*`
+* `NousResearch/Hermes-3-*`
+
+_Note that the Hermes 2 **Theta** models are known to have degraded tool call quality & capabilities due to the merge
+step in their creation._
+
+Flags: `--tool-call-parser hermes`
+
+#### Mistral Models
+Supported models:
+* `mistralai/Mistral-7B-Instruct-v0.3` (confirmed)
+* Additional Mistral function-calling models are compatible as well.
+
+Known issues:
+1. Mistral 7B struggles to generate parallel tool calls correctly.
+2. Mistral's `tokenizer_config.json` chat template requires tool call IDs that are exactly 9 digits, which is
+much shorter than what vLLM generates. Since an exception is thrown when this condition
+is not met, the following additional chat templates are provided:
+
+* `examples/tool_chat_template_mistral.jinja` - this is the "official" Mistral chat template, but tweaked so that
+it works with vLLM's tool call IDs (provided `tool_call_id` fields are truncated to the last 9 digits)
+* `examples/tool_chat_template_mistral_parallel.jinja` - this is a "better" version that adds a tool-use system prompt
+when tools are provided, which results in much better reliability when working with parallel tool calling.
+
+Recommended flags: `--tool-call-parser mistral --chat-template examples/tool_chat_template_mistral_parallel.jinja`
````
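To make known issue 2 above concrete: the provided Mistral templates work because vLLM's long tool call IDs are cut down to their last 9 characters before the template validates them. A minimal sketch of that truncation, assuming a vLLM-style `chatcmpl-tool-...` ID (the helper name is ours, not part of the commit):

```python
# Hypothetical helper illustrating the truncation described above: Mistral's
# default chat template rejects tool call IDs that are not exactly 9
# characters long, while vLLM generates much longer ones.
def truncate_tool_call_id(tool_call_id: str) -> str:
    # Keep only the last 9 characters, as the provided Mistral templates do.
    return tool_call_id[-9:]


print(truncate_tool_call_id("chatcmpl-tool-9f3a72c41b8d4e5fa6"))  # b8d4e5fa6
```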
Lines changed: 162 additions & 0 deletions (new file)

```python
"""
Set up this example by starting a vLLM OpenAI-compatible server with tool call
options enabled. For example:

IMPORTANT: for mistral, you must use one of the provided mistral tool call
templates, or your own - the model default doesn't work for tool calls with
vLLM. See the vLLM docs on OpenAI server & tool calling for more details.

vllm serve --model mistralai/Mistral-7B-Instruct-v0.3 \
            --chat-template examples/tool_chat_template_mistral.jinja \
            --enable-auto-tool-choice --tool-call-parser mistral

OR
vllm serve --model NousResearch/Hermes-2-Pro-Llama-3-8B \
            --chat-template examples/tool_chat_template_hermes.jinja \
            --enable-auto-tool-choice --tool-call-parser hermes
"""
import json

from openai import OpenAI

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    # defaults to os.environ.get("OPENAI_API_KEY")
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description":
                    "The city to find the weather for, e.g. 'San Francisco'"
                },
                "state": {
                    "type": "string",
                    "description":
                    "the two-letter abbreviation for the state that the city is"
                    " in, e.g. 'CA' which would mean 'California'"
                },
                "unit": {
                    "type": "string",
                    "description": "The unit to fetch the temperature in",
                    "enum": ["celsius", "fahrenheit"]
                }
            },
            "required": ["city", "state", "unit"]
        }
    }
}]

messages = [{
    "role": "user",
    "content": "Hi! How are you doing today?"
}, {
    "role": "assistant",
    "content": "I'm doing well! How can I help you?"
}, {
    "role": "user",
    "content":
    "Can you tell me what the temperature will be in Dallas, in fahrenheit?"
}]

# Non-streaming request: the model decides on its own to call the tool.
chat_completion = client.chat.completions.create(messages=messages,
                                                 model=model,
                                                 tools=tools)

print("Chat completion results:")
print(chat_completion)
print("\n\n")

# The same request, streamed: tool call deltas arrive chunk by chunk.
tool_calls_stream = client.chat.completions.create(messages=messages,
                                                   model=model,
                                                   tools=tools,
                                                   stream=True)

chunks = []
for chunk in tool_calls_stream:
    chunks.append(chunk)
    if chunk.choices[0].delta.tool_calls:
        print(chunk.choices[0].delta.tool_calls[0])
    else:
        print(chunk.choices[0].delta)

# Reassemble the streamed tool call arguments from the accumulated chunks.
arguments = []
tool_call_idx = -1
for chunk in chunks:

    if chunk.choices[0].delta.tool_calls:
        tool_call = chunk.choices[0].delta.tool_calls[0]

        if tool_call.index != tool_call_idx:
            if tool_call_idx >= 0:
                print(
                    f"streamed tool call arguments: {arguments[tool_call_idx]}"
                )
            tool_call_idx = chunk.choices[0].delta.tool_calls[0].index
            arguments.append("")
        if tool_call.id:
            print(f"streamed tool call id: {tool_call.id} ")

        if tool_call.function:
            if tool_call.function.name:
                print(f"streamed tool call name: {tool_call.function.name}")

            if tool_call.function.arguments:
                arguments[tool_call_idx] += tool_call.function.arguments

if len(arguments):
    print(f"streamed tool call arguments: {arguments[-1]}")

print("\n\n")

messages.append({
    "role": "assistant",
    "tool_calls": chat_completion.choices[0].message.tool_calls
})


# Now, simulate a tool call
def get_current_weather(city: str, state: str, unit: str):
    return ("The weather in Dallas, Texas is 85 degrees fahrenheit. It is "
            "partly cloudy, with highs in the 90's.")


available_tools = {"get_current_weather": get_current_weather}

# Execute each tool call locally and feed the results back as tool messages.
completion_tool_calls = chat_completion.choices[0].message.tool_calls
for call in completion_tool_calls:
    tool_to_call = available_tools[call.function.name]
    args = json.loads(call.function.arguments)
    result = tool_to_call(**args)
    print(result)
    messages.append({
        "role": "tool",
        "content": result,
        "tool_call_id": call.id,
        "name": call.function.name
    })

chat_completion_2 = client.chat.completions.create(messages=messages,
                                                   model=model,
                                                   tools=tools,
                                                   stream=False)
print("\n\n")
print(chat_completion_2)
```
Lines changed: 129 additions & 0 deletions (new file)

```jinja
{%- macro json_to_python_type(json_spec) %}
    {%- set basic_type_map = {
    "string": "str",
    "number": "float",
    "integer": "int",
    "boolean": "bool"
} %}

    {%- if basic_type_map[json_spec.type] is defined %}
        {{- basic_type_map[json_spec.type] }}
    {%- elif json_spec.type == "array" %}
        {{- "list[" + json_to_python_type(json_spec["items"]) + "]" }}
    {%- elif json_spec.type == "object" %}
        {%- if json_spec.additionalProperties is defined %}
            {{- "dict[str, " + json_to_python_type(json_spec.additionalProperties) + ']' }}
        {%- else %}
            {{- "dict" }}
        {%- endif %}
    {%- elif json_spec.type is iterable %}
        {{- "Union[" }}
        {%- for t in json_spec.type %}
            {{- json_to_python_type({"type": t}) }}
            {%- if not loop.last %}
                {{- "," }}
            {%- endif %}
        {%- endfor %}
        {{- "]" }}
    {%- else %}
        {{- "Any" }}
    {%- endif %}
{%- endmacro %}


{{- bos_token }}
{{- "<|im_start|>system\nYou are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools: <tools> " }}
{%- if tools is iterable and tools | length > 0 %}
    {%- for tool in tools %}
        {%- if tool.function is defined %}
            {%- set tool = tool.function %}
        {%- endif %}
        {{- '{"type": "function", "function": ' }}
        {{- '{"name": "' + tool.name + '", ' }}
        {{- '"description": "' + tool.name + '(' }}
        {%- for param_name, param_fields in tool.parameters.properties|items %}
            {{- param_name + ": " + json_to_python_type(param_fields) }}
            {%- if not loop.last %}
                {{- ", " }}
            {%- endif %}
        {%- endfor %}
        {{- ")" }}
        {%- if tool.return is defined %}
            {{- " -> " + json_to_python_type(tool.return) }}
        {%- endif %}
        {{- " - " + tool.description + "\n\n" }}
        {%- for param_name, param_fields in tool.parameters.properties|items %}
            {%- if loop.first %}
                {{- "    Args:\n" }}
            {%- endif %}
            {{- "        " + param_name + "(" + json_to_python_type(param_fields) + "): " + param_fields.description|trim }}
        {%- endfor %}
        {%- if tool.return is defined and tool.return.description is defined %}
            {{- "\n    Returns:\n        " + tool.return.description }}
        {%- endif %}
        {{- '"' }}
        {{- ', "parameters": ' }}
        {%- if tool.parameters.properties | length == 0 %}
            {{- "{}" }}
        {%- else %}
            {{- tool.parameters|tojson }}
        {%- endif %}
        {{- "}" }}
        {%- if not loop.last %}
            {{- "\n" }}
        {%- endif %}
    {%- endfor %}
{%- endif %}
{{- " </tools>" }}
{{- 'Use the following pydantic model json schema for each tool call you will make: {"properties": {"name": {"title": "Name", "type": "string"}, "arguments": {"title": "Arguments", "type": "object"}}, "required": ["name", "arguments"], "title": "FunctionCall", "type": "object"}}
' }}
{{- "For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
" }}
{{- "<tool_call>
" }}
{{- '{"name": <function-name>, "arguments": <args-dict>}
' }}
{{- '</tool_call><|im_end|>' }}
{%- for message in messages %}
    {%- if message.role == "user" or message.role == "system" or (message.role == "assistant" and message.tool_calls is not defined) %}
        {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
    {%- elif message.role == "assistant" and message.tool_calls is defined %}
        {{- '<|im_start|>' + message.role }}
        {%- for tool_call in message.tool_calls %}
            {{- '\n<tool_call>\n' }}
            {%- if tool_call.function is defined %}
                {%- set tool_call = tool_call.function %}
            {%- endif %}
            {{- '{' }}
            {{- '"name": "' }}
            {{- tool_call.name }}
            {{- '"' }}
            {%- if tool_call.arguments is defined %}
                {{- ', ' }}
                {{- '"arguments": ' }}
                {{- tool_call.arguments|tojson }}
            {%- endif %}
            {{- '}' }}
            {{- '\n</tool_call>' }}
        {%- endfor %}
        {{- '<|im_end|>\n' }}
    {%- elif message.role == "tool" %}
        {%- if loop.previtem and loop.previtem.role != "tool" %}
            {{- '<|im_start|>tool\n' }}
        {%- endif %}
        {{- '<tool_response>\n' }}
        {{- message.content }}
        {%- if not loop.last %}
            {{- '\n</tool_response>\n' }}
        {%- else %}
            {{- '\n</tool_response>' }}
        {%- endif %}
        {%- if not loop.last and loop.nextitem.role != "tool" %}
            {{- '<|im_end|>' }}
        {%- elif loop.last %}
            {{- '<|im_end|>' }}
        {%- endif %}
    {%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n' }}
{%- endif %}
```
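To see what this Hermes-style tool-use template actually produces, a short rendering sketch with `transformers` can help. This is our illustration, not part of the commit: it assumes a recent `transformers` release whose `apply_chat_template` accepts a `tools=` keyword, and that the template above is saved locally as `tool_chat_template_hermes.jinja` (a hypothetical path).

```python
# Illustrative only: render the tool-use chat template above with
# transformers. Assumes transformers with tools= support in
# apply_chat_template and a local copy of the template file.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("NousResearch/Hermes-2-Pro-Llama-3-8B")

with open("tool_chat_template_hermes.jinja") as f:
    chat_template = f.read()

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "The city name"}
            },
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Dallas?"}]

# Returns the full prompt string, including the <tools> system block and the
# trailing <|im_start|>assistant generation prompt.
prompt = tokenizer.apply_chat_template(messages,
                                       tools=tools,
                                       chat_template=chat_template,
                                       add_generation_prompt=True,
                                       tokenize=False)
print(prompt)
```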
