[Misc] Update llama 3.2 template to support system prompt with images #10901

Merged

Conversation

tjohnson31415 (Contributor) commented Dec 4, 2024

Update the example chat template to support using a system prompt with images:

  • remove the exception raised if sending a system message and images
  • when prompting with an image, render the system message if the user supplied a system prompt or if tools are requested

The corresponding change was recently made to the chat template on the HF Hub:
https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct/discussions/84
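
For illustration, a minimal sketch of a request that combines a system prompt with an image via vLLM's OpenAI-compatible server (the server URL, model name, and image URL are placeholders, not part of this change). With the old template this combination raised an exception; with the updated template it renders a system block followed by the user turn:

from openai import OpenAI

# Placeholder endpoint and model; adjust to your deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/duck.jpg"}},
            {"type": "text", "text": "What's in this image?"},
        ]},
    ],
    max_tokens=64,
)
print(response.choices[0].message.content)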

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>

github-actions bot commented Dec 4, 2024

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀

pcuenca commented Dec 4, 2024

Hi @tjohnson31415! I think the suggested template is not identical to the version we pushed to the Hub. Unless I'm mistaken, if the user provides images but no system role, the input prompt should not have a system prompt. This template, if I'm reading the diff correctly, would default to a system prompt in that situation.

For example, given this input:

messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "If I had to write a haiku for this one, it would be: "}
    ]}
]

The desired prompt would be:

<|begin_of_text|><|start_header_id|>user<|end_header_id|>

<|image|>If I had to write a haiku for this one, it would be: <|eot_id|>

(I'm with HF and not a reviewer of this project. But I think it's important to ensure consistency in prompts across implementations, happy to revisit ours if it's in error).
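
For a quick comparison, a minimal sketch that renders the Hub template offline (assuming access to the gated meta-llama/Llama-3.2-11B-Vision-Instruct repo and a transformers release whose processor exposes apply_chat_template; no image data is needed just to render the prompt string):

from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("meta-llama/Llama-3.2-11B-Vision-Instruct")

messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "If I had to write a haiku for this one, it would be: "},
    ]}
]

# tokenize=False returns the rendered prompt string; with no system role and
# an image present, it should contain no system block.
print(processor.apply_chat_template(messages, tokenize=False))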

Isotr0py (Collaborator) left a comment

I tested this template with examples/openai_chat_completion_client_for_multimodal.py and the prompt format looks good (there is no system prompt):

INFO:     127.0.0.1:42408 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO 12-05 03:24:17 logger.py:37] Received request chatcmpl-384a0b3b15b84d7b8a9f8dc83827a827: prompt: "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nWhat's in this image?<|image|><|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n", params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.7, top_p=1.0, top_k=-1, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=64, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None), prompt_token_ids: None, lora_request: None, prompt_adapter_request: None.

If I add the system prompt "You are a helpful assistant." to the messages, the request is:

INFO 12-05 03:26:58 logger.py:37] Received request chatcmpl-a3c07b7328674ab183377af2f8ce4999: prompt: "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\nToday Date: 05 Dec 2024\n\nYou are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nWhat's in this image?<|image|><|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n", params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.7, top_p=1.0, top_k=-1, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=64, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None), prompt_token_ids: None, lora_request: None, prompt_adapter_request: None.
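
For reference, the second request above corresponds to a messages list along these lines (an illustrative reconstruction based on the example script, not the exact payload; the image URL is a placeholder):

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": [
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
    ]},
]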

Isotr0py added the ready label (ONLY add when PR is ready to merge / full CI is needed) on Dec 5, 2024
Isotr0py enabled auto-merge (squash) on December 5, 2024 03:46
Isotr0py merged commit 39c89e7 into vllm-project:main on Dec 5, 2024
45 checks passed
tjohnson31415 deleted the llama-vision-template-update branch on December 5, 2024 07:22
tjohnson31415 (Contributor, Author) commented Dec 5, 2024

Thanks @Isotr0py and @pcuenca!

@pcuenca I agree that consistency across chat templates would be great! I have found it difficult to pin down a "ground truth" for chat templates, particularly once tool calling is involved.

There are a couple of things that make the vLLM template different from the HF Hub one:

  1. Handling content that is a string vs. an OpenAI-style list of objects. vLLM will transform the input messages (for all roles) depending on the --chat-template-content-format configuration, or will auto-detect the format from the chat template. I decided to just support both formats 😅

  2. There is a default system prompt message for tool use that was developed in the PR that first added these example templates. This default prompt is why I did not include user_supplied_system_message from the HF Hub template and wrote the check as

{%- if system_message or not image_ns.has_images %}

The system message will be non-empty if either the user supplies a system message or tools are included with the request.
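
Put differently, the check reduces to the following decision; this is a minimal Python sketch of the template logic (the actual template is Jinja, and the default tool-use prompt text is elided here):

def renders_system_block(user_system_message: str, tools_requested: bool, has_images: bool) -> bool:
    # system_message mirrors the template variable: the user-supplied system
    # message if there is one, otherwise the default tool-use prompt when
    # tools are requested, otherwise the empty string.
    default_tool_prompt = "..."  # placeholder for the template's built-in tool-use prompt
    system_message = user_system_message or (default_tool_prompt if tools_requested else "")
    # equivalent to: {%- if system_message or not image_ns.has_images %}
    return bool(system_message) or not has_images

# Image-only chat without a system message or tools: no system block.
assert renders_system_block("", tools_requested=False, has_images=True) is False
# A user-supplied system message or a tool request brings the block back.
assert renders_system_block("You are a helpful assistant.", tools_requested=False, has_images=True) is True
assert renders_system_block("", tools_requested=True, has_images=True) is True
# Text-only requests always pass the check.
assert renders_system_block("", tools_requested=False, has_images=False) is True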

pcuenca commented Dec 5, 2024

Hi @tjohnson31415, thanks for taking the time to explain! I was misled by the empty string in line 41, thinking that it would lead to a false evaluation in the if below. Perhaps we can also simplify our own template.

sleepwalker2017 pushed a commit to sleepwalker2017/vllm that referenced this pull request Dec 13, 2024
CatherineSue pushed a commit to moirai-internal/vllm that referenced this pull request May 5, 2025
…port system prompt with images (vllm-project#10901)

Merge in GEN/vllm from tool-use-with-image-support-fix to feature-based-on-v0.6.4.post1.c78ab524

Squashed commit of the following:

commit c78ab524b67cf0bd0ffd7dc11804c1d70682be93
Author: Travis Johnson <tsjohnso@us.ibm.com>
Date:   Wed Dec 4 22:54:06 2024 -0700

    [Misc] Update llama 3.2 template to support system prompt with images (vllm-project#10901)

    Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>