Error when computing device_map for Mistral-Small-3.1-24B-Instruct-2503 #1403

Open
VAmblardPEReN opened this issue Apr 30, 2025 · 0 comments
Labels
bug Something isn't working

Comments


VAmblardPEReN commented Apr 30, 2025

Describe the bug
The method calculate_offload_device_map fails for Mistral-Small-3.1-24B-Instruct-2503 with a ValueError: "Could not find targets ['Mistral3VisionAttention'] in module Mistral3ForConditionalGeneration".
I traced the error back to match_layers_params, which is called with targets='Mistral3VisionAttention'.
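For reference, here is a minimal diagnostic sketch (not from the original report) that builds the model skeleton on the meta device and compares the model's declared no-split module classes with the classes actually instantiated; it assumes the failing target comes from the model's _no_split_modules attribute.

import torch
from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModelForImageTextToText

config = AutoConfig.from_pretrained('mistralai/Mistral-Small-3.1-24B-Instruct-2503')
with init_empty_weights():
    # Instantiate the architecture without downloading or allocating weights
    model = AutoModelForImageTextToText.from_config(config)

present = {m.__class__.__name__ for m in model.modules()}
declared = set(model._no_split_modules or [])
# If the no-split list is indeed the source of the failing target,
# 'Mistral3VisionAttention' should appear in this difference
print('declared but not instantiated:', declared - present)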

Expected behavior
I would expect calculate_offload_device_map to return a device_map.

Environment
Include all relevant environment information:

  1. OS: Debian 12
  2. Python version: 3.11
  3. LLM Compressor version or commit hash: 0.5.1
  4. ML framework version(s): torch 2.7.0+cu118
  5. Other Python package versions: compressed-tensors 0.9.4, transformers 4.51.3
  6. Other relevant environment information: CUDA 11.8

To Reproduce

import torch
from transformers import AutoModelForImageTextToText

from llmcompressor.transformers.compression.helpers import calculate_offload_device_map

model_name = 'mistralai/Mistral-Small-3.1-24B-Instruct-2503'
device_map = calculate_offload_device_map(
    model_name,
    num_gpus=3,
    reserve_for_hessians=True,
    torch_dtype=torch.bfloat16,
    model_cls=AutoModelForImageTextToText,
)

Errors

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[5], line 5
      2 from transformers import AutoModelForImageTextToText, AutoTokenizer, AutoModelForCausalLM
      3 import torch
----> 5 device_map = calculate_offload_device_map(
      6     model_name, num_gpus=3, reserve_for_hessians=True, torch_dtype=torch.bfloat16, model_cls=AutoModelForImageTextToText
      7 )

File <PATH>/site-packages/llmcompressor/transformers/compression/helpers.py:245, in calculate_offload_device_map(model_stub, reserve_for_hessians, num_gpus, torch_dtype, model_cls, **model_kwargs)
    243 reserved_memory = 0
    244 if reserve_for_hessians:
--> 245     reserved_memory = hessian_memory_requirements(dummy_model)
    246 reserved_memory += quantization_memory_requirement(dummy_model)
    248 memory_limits = {
    249     idx: (max_memory - reserved_memory)
    250     for idx, max_memory in enumerate(max_gpu_memory)
    251 }

File <PATH>/site-packages/llmcompressor/transformers/compression/helpers.py:123, in hessian_memory_requirements(model)
    114 def hessian_memory_requirements(model: torch.nn.Module) -> int:
    115     """
    116     Determines the number of bytes needed to store Hessian data for a single
    117     transformer layer in model. This is used for reserving memory for GPTQ
   (...)    121     :return: number of bytes required to reserve for GPTQ on a single layer
    122     """
--> 123     transformer_layers = get_layers(get_no_split_params(model), model)
    124     total_hessian_elems = {}
    125     max_column_size = {}

File <PATH>/site-packages/llmcompressor/utils/pytorch/module.py:168, in get_layers(targets, module)
    167 def get_layers(targets: Union[str, List[str]], module: Module) -> Dict[str, Module]:
--> 168     return match_layers_params(targets, module)

File <PATH>/site-packages/llmcompressor/utils/pytorch/module.py:162, in match_layers_params(targets, module, params)
    160 missed = [target for found, target in zip(targets_found, targets) if not found]
    161 if len(missed) > 0:
--> 162     raise ValueError(f"Could not find targets {missed} in module {module}")
    164 return resolved

ValueError: Could not find targets ['Mistral3VisionAttention'] in module Mistral3ForConditionalGeneration(
  (vision_tower): PixtralVisionModel(
    (patch_conv): Conv2d(3, 1024, kernel_size=(14, 14), stride=(14, 14), bias=False)
    (ln_pre): PixtralRMSNorm((1024,), eps=1e-05)
    (transformer): PixtralTransformer(
      (layers): ModuleList(
        (0-23): 24 x PixtralAttentionLayer(
          (attention_norm): PixtralRMSNorm((1024,), eps=1e-05)
          (feed_forward): PixtralMLP(
            (gate_proj): Linear(in_features=1024, out_features=4096, bias=False)
            (up_proj): Linear(in_features=1024, out_features=4096, bias=False)
            (down_proj): Linear(in_features=4096, out_features=1024, bias=False)
            (act_fn): GELUActivation()
          )
          (attention): PixtralAttention(
            (k_proj): Linear(in_features=1024, out_features=1024, bias=False)
            (v_proj): Linear(in_features=1024, out_features=1024, bias=False)
            (q_proj): Linear(in_features=1024, out_features=1024, bias=False)
            (o_proj): Linear(in_features=1024, out_features=1024, bias=False)
          )
          (ffn_norm): PixtralRMSNorm((1024,), eps=1e-05)
        )
      )
    )
    (patch_positional_embedding): PixtralRotaryEmbedding()
  )
  (multi_modal_projector): Mistral3MultiModalProjector(
    (norm): Mistral3RMSNorm((1024,), eps=1e-06)
    (patch_merger): Mistral3PatchMerger(
      (merging_layer): Linear(in_features=4096, out_features=1024, bias=False)
    )
    (linear_1): Linear(in_features=1024, out_features=5120, bias=False)
    (act): GELUActivation()
    (linear_2): Linear(in_features=5120, out_features=5120, bias=False)
  )
  (language_model): MistralForCausalLM(
    (model): MistralModel(
      (embed_tokens): Embedding(131072, 5120)
      (layers): ModuleList(
        (0-39): 40 x MistralDecoderLayer(
          (self_attn): MistralAttention(
            (q_proj): Linear(in_features=5120, out_features=4096, bias=False)
            (k_proj): Linear(in_features=5120, out_features=1024, bias=False)
            (v_proj): Linear(in_features=5120, out_features=1024, bias=False)
            (o_proj): Linear(in_features=4096, out_features=5120, bias=False)
          )
          (mlp): MistralMLP(
            (gate_proj): Linear(in_features=5120, out_features=32768, bias=False)
            (up_proj): Linear(in_features=5120, out_features=32768, bias=False)
            (down_proj): Linear(in_features=32768, out_features=5120, bias=False)
            (act_fn): SiLU()
          )
          (input_layernorm): MistralRMSNorm((5120,), eps=1e-05)
          (post_attention_layernorm): MistralRMSNorm((5120,), eps=1e-05)
        )
      )
      (norm): MistralRMSNorm((5120,), eps=1e-05)
      (rotary_emb): MistralRotaryEmbedding()
    )
    (lm_head): Linear(in_features=5120, out_features=131072, bias=False)
  )
)
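As a possible interim workaround (a sketch under assumptions, not a fix for the underlying bug), a device_map can be built directly with accelerate's infer_auto_device_map, bypassing the hessian_memory_requirements step that raises above. The memory budgets below are illustrative placeholders, and the no-split classes are taken from the module dump above rather than from llmcompressor.

import torch
from accelerate import infer_auto_device_map, init_empty_weights
from transformers import AutoConfig, AutoModelForImageTextToText

config = AutoConfig.from_pretrained('mistralai/Mistral-Small-3.1-24B-Instruct-2503')
with init_empty_weights():
    # Meta-device skeleton, no weights downloaded or allocated
    model = AutoModelForImageTextToText.from_config(config, torch_dtype=torch.bfloat16)

device_map = infer_auto_device_map(
    model,
    # Placeholder budgets; replace with actual per-GPU memory minus whatever
    # needs to be reserved for GPTQ/Hessian data
    max_memory={0: '70GiB', 1: '70GiB', 2: '70GiB', 'cpu': '200GiB'},
    # Module classes that actually exist in this checkpoint (see dump above)
    no_split_module_classes=['MistralDecoderLayer', 'PixtralAttentionLayer'],
    dtype=torch.bfloat16,
)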
VAmblardPEReN added the bug label Apr 30, 2025