[BUG] Bug in attention mechanism when Paged=False for Qwen2.5_VL Models #752

RaahimSiddiqi · 2025-03-18T03:02:21Z

OS

Windows

GPU Library

CUDA 12.x

Python version

3.12

Pytorch version

2.5.0

Model

https://huggingface.co/turboderp/Qwen2-VL-7B-Instruct-exl2/tree/6.0bpw

Describe the bug

Running exllamav2 with the Qwen2-VL-7B-Instruct-exl2-q6 model and the paged=False flag raises the following error in the attention code while creating video embeddings.

  File "C:\Users\Raahim.Siddiqi\AppData\Local\miniconda3\envs\VisualSummarizerExllama\Lib\site-packages\exllamav2\attn_params.py", line 197, in get_block_diag_mask
    self.block_diag_mask = labels.unsqueeze(0) == labels.unsqueeze(1).repeat(self.batch_size)
                                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Number of dimensions of repeat dims can not be smaller than number of dimensions of tensor
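
For readers who want to see the failure in isolation, here is a minimal sketch that reproduces the same RuntimeError with plain PyTorch on CPU. The `labels` values and `batch_size` are hypothetical stand-ins, not the values exllamav2 actually uses. The point is only that the method call binds tighter than `==`, so `repeat()` receives a single size for the 2-D tensor `labels.unsqueeze(1)`. The shape-safe variant at the end is an assumption about what a batched mask could look like, not the upstream fix.

```python
import torch

# Hypothetical stand-ins: `labels` assigns each position to a block, and
# `batch_size` mirrors `self.batch_size` from the traceback.
labels = torch.tensor([0, 0, 1, 1, 1])
batch_size = 1

# Expression from the traceback. repeat() is applied to the 2-D tensor
# labels.unsqueeze(1) with only one size argument, which PyTorch rejects.
try:
    mask = labels.unsqueeze(0) == labels.unsqueeze(1).repeat(batch_size)
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)

print(error_message)

# One possible shape-safe variant (an assumption, not the upstream fix):
# build the 2-D block-diagonal mask first, then add and repeat a batch dim.
block_diag = labels.unsqueeze(0) == labels.unsqueeze(1)          # shape (5, 5)
batched_mask = block_diag.unsqueeze(0).repeat(batch_size, 1, 1)  # shape (1, 5, 5)
```

Since the error depends only on the number of tensor dimensions versus the number of sizes passed to `repeat()`, it would occur for any `batch_size` once this code path is reached.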

Full Traceback:

Traceback (most recent call last):
  File "C:\TFS\VIDIZMO\SOURCE\VisualGenerativeAIPY\exllama-vision.py", line 136, in <module>
    video_embedding = vision_model.get_video_embeddings(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Raahim.Siddiqi\AppData\Local\miniconda3\envs\VisualSummarizerExllama\Lib\site-packages\exllamav2\vlm\vision_tower.py", line 396, in get_video_embeddings
    embedding_tensor = self.process(
                       ^^^^^^^^^^^^^
  File "C:\Users\Raahim.Siddiqi\AppData\Local\miniconda3\envs\VisualSummarizerExllama\Lib\site-packages\exllamav2\vlm\vision_tower.py", line 244, in process
    hidden_states = module.forward(
                    ^^^^^^^^^^^^^^^
  File "C:\Users\Raahim.Siddiqi\AppData\Local\miniconda3\envs\VisualSummarizerExllama\Lib\site-packages\exllamav2\attn.py", line 1059, in forward
    return self.forward_torch(
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Raahim.Siddiqi\AppData\Local\miniconda3\envs\VisualSummarizerExllama\Lib\site-packages\exllamav2\attn.py", line 1483, in forward_torch
    attn_output = attn_func(
                  ^^^^^^^^^^
  File "C:\Users\Raahim.Siddiqi\AppData\Local\miniconda3\envs\VisualSummarizerExllama\Lib\site-packages\exllamav2\attn.py", line 886, in _attn_torch
    attn_mask_lr = attn_params.get_block_diag_mask(q_states.device)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Raahim.Siddiqi\AppData\Local\miniconda3\envs\VisualSummarizerExllama\Lib\site-packages\exllamav2\attn_params.py", line 197, in get_block_diag_mask
    self.block_diag_mask = labels.unsqueeze(0) == labels.unsqueeze(1).repeat(self.batch_size)
                                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Number of dimensions of repeat dims can not be smaller than number of dimensions of tensor

ExllamaV2 Version:
0.2.8+cu121.torch2.5.0

GPU: RTX 3090

Nvidia Driver: 566.14

Reproduction steps

To rule out an error on my part, I tested the example script from the GitHub repo:

https://github.com/turboderp-org/exllamav2/blob/master/examples/multimodal_video.py

The only change I made to that code was the location of the image files. The error can be reproduced by running this script.

Expected behavior

I have tested the same code on Linux (WSL), so I know the code itself is fine. I expect the model to describe the video I give it (in the form of a list of PIL images).

Logs

No response

Additional context

No response

Acknowledgements

I have looked for similar issues before submitting this one.
I understand that the developers have lives and my issue will be answered when possible.
I understand the developers of this program are human, and I will ask my questions politely.


RaahimSiddiqi · 2025-03-19T02:11:18Z

@turboderp Any chance this is the fix for this issue?

2e630ae

RaahimSiddiqi added the bug Something isn't working label Mar 18, 2025
