
[Bug]: Pixtral fails when limit_mm_per_prompt not set #8382

Closed
@BabyChouSr

Description


Your current environment

The output of `python collect_env.py` was not provided.

🐛 Describe the bug

The following command fails:

CUDA_VISIBLE_DEVICES=3 vllm serve mistralai/Pixtral-12B-2409 --port 21010 --max_num_batched_tokens 16384 --trust-remote-code --gpu-memory-utilization 0.50 --tokenizer_mode mistral

It fails during the memory-profiling run at engine startup with this error:

Traceback (most recent call last):
  File "/home/lmsys/miniconda3/envs/vllm-source/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/lmsys/miniconda3/envs/vllm-source/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lmsys/vllm/vllm/entrypoints/openai/rpc/server.py", line 236, in run_rpc_server
    server = AsyncEngineRPCServer(async_engine_args, usage_context, rpc_path)
  File "/home/lmsys/vllm/vllm/entrypoints/openai/rpc/server.py", line 34, in __init__
    self.engine = AsyncLLMEngine.from_engine_args(
  File "/home/lmsys/vllm/vllm/engine/async_llm_engine.py", line 735, in from_engine_args
    engine = cls(
  File "/home/lmsys/vllm/vllm/engine/async_llm_engine.py", line 615, in __init__
    self.engine = self._init_engine(*args, **kwargs)
  File "/home/lmsys/vllm/vllm/engine/async_llm_engine.py", line 835, in _init_engine
    return engine_class(*args, **kwargs)
  File "/home/lmsys/vllm/vllm/engine/async_llm_engine.py", line 262, in __init__
    super().__init__(*args, **kwargs)
  File "/home/lmsys/vllm/vllm/engine/llm_engine.py", line 338, in __init__
    self._initialize_kv_caches()
  File "/home/lmsys/vllm/vllm/engine/llm_engine.py", line 467, in _initialize_kv_caches
    self.model_executor.determine_num_available_blocks())
  File "/home/lmsys/vllm/vllm/executor/gpu_executor.py", line 114, in determine_num_available_blocks
    return self.driver_worker.determine_num_available_blocks()
  File "/home/lmsys/miniconda3/envs/vllm-source/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/lmsys/vllm/vllm/worker/worker.py", line 223, in determine_num_available_blocks
    self.model_runner.profile_run()
  File "/home/lmsys/miniconda3/envs/vllm-source/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/lmsys/vllm/vllm/worker/model_runner.py", line 1216, in profile_run
    self.execute_model(model_input, kv_caches, intermediate_tensors)
  File "/home/lmsys/miniconda3/envs/vllm-source/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/lmsys/vllm/vllm/worker/model_runner.py", line 1543, in execute_model
    hidden_or_intermediate_states = model_executable(
  File "/home/lmsys/miniconda3/envs/vllm-source/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/lmsys/miniconda3/envs/vllm-source/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/lmsys/vllm/vllm/model_executor/models/pixtral.py", line 178, in forward
    inputs_embeds = merge_multimodal_embeddings(
  File "/home/lmsys/vllm/vllm/model_executor/models/pixtral.py", line 117, in merge_multimodal_embeddings
    assert (seq_len == N_txt +
AssertionError: seq_len 16640 should be equal to N_txt + N_img (256, 4096, 16384)
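
The numbers in the assertion line up in a telling way. Here is a rough reading, assuming the printed tuple is `(N_txt, N_img, num_image_placeholder_tokens)` — an assumption based on the message format, not verified against `pixtral.py`:

```python
# Hypothetical reading of the assertion values; the tuple interpretation
# is an assumption, not confirmed from pixtral.py.
seq_len = 16640        # dummy profiling sequence length
n_txt = 256            # text-token embeddings
n_img = 4096           # image embeddings actually produced by the encoder
placeholders = 16384   # image-token slots reserved in the input ids

print(n_txt + placeholders == seq_len)  # True: 256 + 16384 == 16640
print(n_txt + n_img)                    # 4352, far short of seq_len
```

If that reading is right, the dummy profiling input reserves 16384 image-token positions but only 4096 image embeddings are produced, i.e. the dummy multimodal data generated for profiling is out of sync with the dummy token ids when `limit_mm_per_prompt` is left unset.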

But the following command, which follows the Hugging Face configuration, works:

CUDA_VISIBLE_DEVICES=3 vllm serve mistralai/Pixtral-12B-2409 --port 21010 --max_num_batched_tokens 16384 --max-model-len 8192 --trust-remote-code --gpu-memory-utilization 0.50 --tokenizer_mode mistral --limit_mm_per_prompt 'image=4'
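
For reference, a minimal sketch of the same working configuration through the offline `LLM` API, assuming the server flags map one-to-one onto `LLM(...)` keyword arguments:

```python
from vllm import LLM

# Sketch of the working server flags as offline-engine arguments; the
# one-to-one flag-to-kwarg mapping here is an assumption.
llm = LLM(
    model="mistralai/Pixtral-12B-2409",
    tokenizer_mode="mistral",
    trust_remote_code=True,
    max_model_len=8192,
    max_num_batched_tokens=16384,
    gpu_memory_utilization=0.5,
    limit_mm_per_prompt={"image": 4},  # cap images per prompt, as in the CLI flag
)
```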

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
