
Commit 6f0ee8a

maxdebayser authored and LeiWang1999 committed
[Feature] Add support for Llama 3.1 and 3.2 tool use (vllm-project#8343)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: LeiWang1999 <leiwang1999@outlook.com>
1 parent 3c3a704 commit 6f0ee8a

10 files changed (+576 −27 lines)

docs/source/serving/openai_compatible_server.md

Lines changed: 24 additions & 2 deletions
```diff
@@ -157,10 +157,10 @@ vLLM will use guided decoding to ensure the response matches the tool parameter
 To enable this feature, you should set the following flags:
 * `--enable-auto-tool-choice` -- **mandatory** Auto tool choice. tells vLLM that you want to enable the model to generate its own tool calls when it
 deems appropriate.
-* `--tool-call-parser` -- select the tool parser to use - currently either `hermes` or `mistral`. Additional tool parsers
+* `--tool-call-parser` -- select the tool parser to use - currently either `hermes`, `mistral` or `llama3_json`. Additional tool parsers
 will continue to be added in the future.
 * `--chat-template` -- **optional** for auto tool choice. the path to the chat template which handles `tool`-role messages and `assistant`-role messages
-that contain previously generated tool calls. Hermes and Mistral models have tool-compatible chat templates in their
+that contain previously generated tool calls. Hermes, Mistral and Llama models have tool-compatible chat templates in their
 `tokenizer_config.json` files, but you can specify a custom template. This argument can be set to `tool_use` if your model has a tool use-specific chat
 template configured in the `tokenizer_config.json`. In this case, it will be used per the `transformers` specification. More on this [here](https://huggingface.co/docs/transformers/en/chat_templating#why-do-some-models-have-multiple-templates)
 from HuggingFace; and you can find an example of this in a `tokenizer_config.json` [here](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B/blob/main/tokenizer_config.json)
@@ -197,3 +197,25 @@ when tools are provided, that results in much better reliability when working wi
 
 
 Recommended flags: `--tool-call-parser mistral --chat-template examples/tool_chat_template_mistral_parallel.jinja`
+
+#### Llama Models
+Supported models:
+* `meta-llama/Meta-Llama-3.1-8B-Instruct`
+* `meta-llama/Meta-Llama-3.1-70B-Instruct`
+* `meta-llama/Meta-Llama-3.1-405B-Instruct`
+* `meta-llama/Meta-Llama-3.1-405B-Instruct-FP8`
+
+The supported tool calling format is [JSON-based tool calling](https://llama.meta.com/docs/model-cards-and-prompt-formats/llama3_1/#json-based-tool-calling).
+Other formats, such as the built-in Python tool calling or custom tool calling, are not supported.
+
+Known issues:
+1. Parallel tool calls are not supported.
+2. The model can generate parameters in the wrong format, such as an array
+serialized as a string instead of an actual array.
+
+The `tool_chat_template_llama3_json.jinja` file contains the "official" Llama chat template, but tweaked so that
+it works better with vLLM.
+
+Recommended flags: `--tool-call-parser llama3_json --chat-template examples/tool_chat_template_llama3_json.jinja`
+
+
```
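To see the new parser end to end, the snippet below sketches a tool-call request against a server started with the recommended Llama flags. It is an illustration, not part of this commit: the base URL, API key, model choice, and the `get_current_weather` schema are placeholders.

```python
# Sketch only: assumes a server launched roughly as
#   vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct \
#       --enable-auto-tool-choice --tool-call-parser llama3_json \
#       --chat-template examples/tool_chat_template_llama3_json.jinja
import openai

client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="empty")

# Placeholder tool definition; any JSON-schema function works here.
tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "What's the weather in Dallas?"}],
    tools=tools,
    tool_choice="auto",  # let the model decide whether to call the tool
)

# The llama3_json parser extracts the model's {"name": ..., "parameters": ...}
# output into structured tool_calls instead of returning it as plain text.
print(response.choices[0].message.tool_calls)
```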
Lines changed: 94 additions & 0 deletions
New file: a Jinja chat template for Llama JSON-based tool calling.

@@ -0,0 +1,94 @@
```jinja
{{- bos_token }}
{%- if custom_tools is defined %}
    {%- set tools = custom_tools %}
{%- endif %}
{%- if not tools_in_user_message is defined %}
    {#- Llama 3.1 doesn't pass all tests if the tools are in the system prompt #}
    {%- set tools_in_user_message = true %}
{%- endif %}
{%- if not date_string is defined %}
    {%- if strftime_now is defined %}
        {%- set date_string = strftime_now("%d %b %Y") %}
    {%- else %}
        {%- set date_string = "26 Jul 2024" %}
    {%- endif %}
{%- endif %}
{%- if not tools is defined %}
    {%- set tools = none %}
{%- endif %}

{#- This block extracts the system message, so we can slot it into the right place. #}
{%- if messages[0]['role'] == 'system' %}
    {%- set system_message = messages[0]['content']|trim %}
    {%- set messages = messages[1:] %}
{%- else %}
    {%- set system_message = "You are a helpful assistant with tool calling capabilities. Only reply with a tool call if the function exists in the library provided by the user. If it doesn't exist, just reply directly in natural language. When you receive a tool call response, use the output to format an answer to the original user question." %}
{%- endif %}

{#- System message #}
{{- "<|start_header_id|>system<|end_header_id|>\n\n" }}
{%- if tools is not none %}
    {{- "Environment: ipython\n" }}
{%- endif %}
{{- "Cutting Knowledge Date: December 2023\n" }}
{{- "Today Date: " + date_string + "\n\n" }}
{%- if tools is not none and not tools_in_user_message %}
    {{- "You have access to the following functions. To call a function, please respond with JSON for a function call." }}
    {{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.' }}
    {{- "Do not use variables.\n\n" }}
    {%- for t in tools %}
        {{- t | tojson(indent=4) }}
        {{- "\n\n" }}
    {%- endfor %}
{%- endif %}
{{- system_message }}
{{- "<|eot_id|>" }}

{#- Custom tools are passed in a user message with some extra guidance #}
{%- if tools_in_user_message and not tools is none %}
    {#- Extract the first user message so we can plug it in here #}
    {%- if messages | length != 0 %}
        {%- set first_user_message = messages[0]['content']|trim %}
        {%- set messages = messages[1:] %}
    {%- else %}
        {{- raise_exception("Cannot put tools in the first user message when there's no first user message!") }}
    {%- endif %}
    {{- '<|start_header_id|>user<|end_header_id|>\n\n' -}}
    {{- "Given the following functions, please respond with a JSON for a function call " }}
    {{- "with its proper arguments that best answers the given prompt.\n\n" }}
    {{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.' }}
    {{- "Do not use variables.\n\n" }}
    {%- for t in tools %}
        {{- t | tojson(indent=4) }}
        {{- "\n\n" }}
    {%- endfor %}
    {{- first_user_message + "<|eot_id|>"}}
{%- endif %}

{%- for message in messages %}
    {%- if not (message.role == 'ipython' or message.role == 'tool' or 'tool_calls' in message) %}
        {{- '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' }}
    {%- elif 'tool_calls' in message %}
        {%- if not message.tool_calls|length == 1 %}
            {{- raise_exception("This model only supports single tool-calls at once!") }}
        {%- endif %}
        {%- set tool_call = message.tool_calls[0].function %}
        {{- '<|start_header_id|>assistant<|end_header_id|>\n\n' -}}
        {{- '{"name": "' + tool_call.name + '", ' }}
        {{- '"parameters": ' }}
        {{- tool_call.arguments | tojson }}
        {{- "}" }}
        {{- "<|eot_id|>" }}
    {%- elif message.role == "tool" or message.role == "ipython" %}
        {{- "<|start_header_id|>ipython<|end_header_id|>\n\n" }}
        {%- if message.content is mapping %}
            {{- message.content | tojson }}
        {%- else %}
            {{- { "output": message.content } | tojson }}
        {%- endif %}
        {{- "<|eot_id|>" }}
    {%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
    {{- '<|start_header_id|>assistant<|end_header_id|>\n\n' }}
{%- endif %}
```
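To sanity-check what a model actually receives from this template, one can render it directly with `jinja2`. This is a sketch: the file name, tool schema, and messages are illustrative, and vLLM's own template loader adds more plumbing (e.g. it defines `strftime_now`; without it the template falls back to the hardcoded date).

```python
import jinja2

# Path is illustrative; point it at wherever the template above is saved.
with open("tool_chat_template_llama3_json.jinja") as f:
    template = jinja2.Environment().from_string(f.read())

tools = [{"type": "function",
          "function": {"name": "get_current_weather",
                       "description": "Get the current weather in a given city",
                       "parameters": {"type": "object",
                                      "properties": {"city": {"type": "string"}},
                                      "required": ["city"]}}}]

prompt = template.render(
    bos_token="<|begin_of_text|>",
    messages=[{"role": "user", "content": "What's the weather in Dallas?"}],
    tools=tools,
    add_generation_prompt=True,
)

# Prints the system header (including "Environment: ipython"), the JSON schema
# of each tool embedded in the first user turn, and the assistant header.
print(prompt)
```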
Lines changed: 93 additions & 0 deletions
New file: a second Jinja chat template, identical to the one above except that it drops the Llama 3.1 comment and defaults `tools_in_user_message` to `false`, so tool schemas are placed in the system message rather than the first user turn.

@@ -0,0 +1,93 @@
```jinja
{{- bos_token }}
{%- if custom_tools is defined %}
    {%- set tools = custom_tools %}
{%- endif %}
{%- if not tools_in_user_message is defined %}
    {%- set tools_in_user_message = false %}
{%- endif %}
{%- if not date_string is defined %}
    {%- if strftime_now is defined %}
        {%- set date_string = strftime_now("%d %b %Y") %}
    {%- else %}
        {%- set date_string = "26 Jul 2024" %}
    {%- endif %}
{%- endif %}
{%- if not tools is defined %}
    {%- set tools = none %}
{%- endif %}

{#- This block extracts the system message, so we can slot it into the right place. #}
{%- if messages[0]['role'] == 'system' %}
    {%- set system_message = messages[0]['content']|trim %}
    {%- set messages = messages[1:] %}
{%- else %}
    {%- set system_message = "You are a helpful assistant with tool calling capabilities. Only reply with a tool call if the function exists in the library provided by the user. If it doesn't exist, just reply directly in natural language. When you receive a tool call response, use the output to format an answer to the original user question." %}
{%- endif %}

{#- System message #}
{{- "<|start_header_id|>system<|end_header_id|>\n\n" }}
{%- if tools is not none %}
    {{- "Environment: ipython\n" }}
{%- endif %}
{{- "Cutting Knowledge Date: December 2023\n" }}
{{- "Today Date: " + date_string + "\n\n" }}
{%- if tools is not none and not tools_in_user_message %}
    {{- "You have access to the following functions. To call a function, please respond with JSON for a function call." }}
    {{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.' }}
    {{- "Do not use variables.\n\n" }}
    {%- for t in tools %}
        {{- t | tojson(indent=4) }}
        {{- "\n\n" }}
    {%- endfor %}
{%- endif %}
{{- system_message }}
{{- "<|eot_id|>" }}

{#- Custom tools are passed in a user message with some extra guidance #}
{%- if tools_in_user_message and not tools is none %}
    {#- Extract the first user message so we can plug it in here #}
    {%- if messages | length != 0 %}
        {%- set first_user_message = messages[0]['content']|trim %}
        {%- set messages = messages[1:] %}
    {%- else %}
        {{- raise_exception("Cannot put tools in the first user message when there's no first user message!") }}
    {%- endif %}
    {{- '<|start_header_id|>user<|end_header_id|>\n\n' -}}
    {{- "Given the following functions, please respond with a JSON for a function call " }}
    {{- "with its proper arguments that best answers the given prompt.\n\n" }}
    {{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.' }}
    {{- "Do not use variables.\n\n" }}
    {%- for t in tools %}
        {{- t | tojson(indent=4) }}
        {{- "\n\n" }}
    {%- endfor %}
    {{- first_user_message + "<|eot_id|>"}}
{%- endif %}

{%- for message in messages %}
    {%- if not (message.role == 'ipython' or message.role == 'tool' or 'tool_calls' in message) %}
        {{- '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' }}
    {%- elif 'tool_calls' in message %}
        {%- if not message.tool_calls|length == 1 %}
            {{- raise_exception("This model only supports single tool-calls at once!") }}
        {%- endif %}
        {%- set tool_call = message.tool_calls[0].function %}
        {{- '<|start_header_id|>assistant<|end_header_id|>\n\n' -}}
        {{- '{"name": "' + tool_call.name + '", ' }}
        {{- '"parameters": ' }}
        {{- tool_call.arguments | tojson }}
        {{- "}" }}
        {{- "<|eot_id|>" }}
    {%- elif message.role == "tool" or message.role == "ipython" %}
        {{- "<|start_header_id|>ipython<|end_header_id|>\n\n" }}
        {%- if message.content is mapping %}
            {{- message.content | tojson }}
        {%- else %}
            {{- { "output": message.content } | tojson }}
        {%- endif %}
        {{- "<|eot_id|>" }}
    {%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
    {{- '<|start_header_id|>assistant<|end_header_id|>\n\n' }}
{%- endif %}
```
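Reusing the `template` and `tools` objects from the earlier rendering sketch, the following (equally illustrative) snippet shows how either template serializes a complete tool-call round trip:

```python
conversation = [
    {"role": "user", "content": "What's the weather in Dallas?"},
    # Assistant turn carrying a previously generated tool call.
    {"role": "assistant", "tool_calls": [{
        "function": {"name": "get_current_weather",
                     "arguments": {"city": "Dallas"}}}]},
    # Tool result; a plain string gets wrapped as {"output": ...}.
    {"role": "tool", "content": "98 degrees and sunny"},
]

prompt = template.render(bos_token="<|begin_of_text|>", messages=conversation,
                         tools=tools, add_generation_prompt=True)

# The assistant turn renders as
#   {"name": "get_current_weather", "parameters": {"city": "Dallas"}}
# and the tool result comes back under an <|start_header_id|>ipython header.
print(prompt)
```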

tests/tool_use/test_chat_completions.py

Lines changed: 10 additions & 7 deletions
```diff
@@ -3,18 +3,20 @@
 import openai
 import pytest
 
-from .utils import MESSAGES_WITHOUT_TOOLS, WEATHER_TOOL
+from .utils import (MESSAGES_WITHOUT_TOOLS, WEATHER_TOOL, ServerConfig,
+                    ensure_system_prompt)
 
 
 # test: make sure chat completions without tools provided work even when tools
 # are enabled. This makes sure tool call chat templates work, AND that the tool
 # parser stream processing doesn't change the output of the model.
 @pytest.mark.asyncio
-async def test_chat_completion_without_tools(client: openai.AsyncOpenAI):
+async def test_chat_completion_without_tools(client: openai.AsyncOpenAI,
+                                             server_config: ServerConfig):
     models = await client.models.list()
     model_name: str = models.data[0].id
     chat_completion = await client.chat.completions.create(
-        messages=MESSAGES_WITHOUT_TOOLS,
+        messages=ensure_system_prompt(MESSAGES_WITHOUT_TOOLS, server_config),
         temperature=0,
         max_tokens=150,
         model=model_name,
@@ -34,7 +36,7 @@ async def test_chat_completion_without_tools(client: openai.AsyncOpenAI):
 
     # make the same request, streaming
     stream = await client.chat.completions.create(
-        messages=MESSAGES_WITHOUT_TOOLS,
+        messages=ensure_system_prompt(MESSAGES_WITHOUT_TOOLS, server_config),
         temperature=0,
         max_tokens=150,
         model=model_name,
@@ -77,11 +79,12 @@ async def test_chat_completion_without_tools(client: openai.AsyncOpenAI):
 # tools, to make sure we can still get normal chat completion responses
 # and that they won't be parsed as tools
 @pytest.mark.asyncio
-async def test_chat_completion_with_tools(client: openai.AsyncOpenAI):
+async def test_chat_completion_with_tools(client: openai.AsyncOpenAI,
+                                          server_config: ServerConfig):
     models = await client.models.list()
     model_name: str = models.data[0].id
     chat_completion = await client.chat.completions.create(
-        messages=MESSAGES_WITHOUT_TOOLS,
+        messages=ensure_system_prompt(MESSAGES_WITHOUT_TOOLS, server_config),
         temperature=0,
         max_tokens=150,
         model=model_name,
@@ -102,7 +105,7 @@ async def test_chat_completion_with_tools(client: openai.AsyncOpenAI):
 
     # make the same request, streaming
     stream = await client.chat.completions.create(
-        messages=MESSAGES_WITHOUT_TOOLS,
+        messages=ensure_system_prompt(MESSAGES_WITHOUT_TOOLS, server_config),
         temperature=0,
         max_tokens=150,
         model=model_name,
```
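`ensure_system_prompt` lives in `tests/tool_use/utils.py`, which this view doesn't show. Judging by its call sites, it plausibly behaves like the hedged sketch below; the `system_prompt` key is an assumption, not confirmed by this diff.

```python
from typing import Dict, List

# Hypothetical reconstruction -- the real helper is in tests/tool_use/utils.py.
def ensure_system_prompt(messages: List[Dict[str, str]],
                         server_config: Dict) -> List[Dict[str, str]]:
    """Prepend the config's system prompt (assumed "system_prompt" key)
    unless the conversation already starts with a system message."""
    prompt = server_config.get("system_prompt")
    if prompt and messages[0]["role"] != "system":
        return [{"role": "system", "content": prompt}] + messages
    return messages
```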

tests/tool_use/test_parallel_tool_calls.py

Lines changed: 15 additions & 3 deletions
```diff
@@ -6,15 +6,21 @@
 
 from .utils import (MESSAGES_ASKING_FOR_PARALLEL_TOOLS,
                     MESSAGES_WITH_PARALLEL_TOOL_RESPONSE, SEARCH_TOOL,
-                    WEATHER_TOOL)
+                    WEATHER_TOOL, ServerConfig)
 
 
 # test: getting the model to generate parallel tool calls (streaming/not)
 # when requested. NOTE that not all models may support this, so some exclusions
 # may be added in the future. e.g. llama 3.1 models are not designed to support
 # parallel tool calls.
 @pytest.mark.asyncio
-async def test_parallel_tool_calls(client: openai.AsyncOpenAI):
+async def test_parallel_tool_calls(client: openai.AsyncOpenAI,
+                                   server_config: ServerConfig):
+
+    if not server_config.get("supports_parallel", True):
+        pytest.skip("The {} model doesn't support parallel tool calls".format(
+            server_config["model"]))
+
     models = await client.models.list()
     model_name: str = models.data[0].id
     chat_completion = await client.chat.completions.create(
@@ -136,7 +142,13 @@ async def test_parallel_tool_calls(client: openai.AsyncOpenAI):
 # test: providing parallel tool calls back to the model to get a response
 # (streaming/not)
 @pytest.mark.asyncio
-async def test_parallel_tool_calls_with_results(client: openai.AsyncOpenAI):
+async def test_parallel_tool_calls_with_results(client: openai.AsyncOpenAI,
+                                                server_config: ServerConfig):
+
+    if not server_config.get("supports_parallel", True):
+        pytest.skip("The {} model doesn't support parallel tool calls".format(
+            server_config["model"]))
+
     models = await client.models.list()
     model_name: str = models.data[0].id
     chat_completion = await client.chat.completions.create(
```
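`ServerConfig` is likewise defined in `tests/tool_use/utils.py`. From its usage across these tests it is presumably a dict-like config along these lines; in this sketch every field except `model` and `supports_parallel` is an assumption.

```python
from typing import List, TypedDict

# Hypothetical reconstruction of the config type in tests/tool_use/utils.py.
class ServerConfig(TypedDict, total=False):
    model: str               # model name served for the test run
    arguments: List[str]     # extra vLLM flags, e.g. --tool-call-parser (assumed)
    system_prompt: str       # consumed by ensure_system_prompt (assumed)
    supports_parallel: bool  # absent/True unless the model, like Llama 3.1,
                             # cannot do parallel tool calls
```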
