Describe the bug
Running exllamav2 with the Qwen2-VL-7B-Instruct-exl2-q6 model and the paged=False flag fails in the attention code while creating video embeddings, with the following error:
  File "C:\Users\Raahim.Siddiqi\AppData\Local\miniconda3\envs\VisualSummarizerExllama\Lib\site-packages\exllamav2\attn_params.py", line 197, in get_block_diag_mask
    self.block_diag_mask = labels.unsqueeze(0) == labels.unsqueeze(1).repeat(self.batch_size)
                                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Number of dimensions of repeat dims can not be smaller than number of dimensions of tensor
Full Traceback:
Traceback (most recent call last):
  File "C:\TFS\VIDIZMO\SOURCE\VisualGenerativeAIPY\exllama-vision.py", line 136, in <module>
    video_embedding = vision_model.get_video_embeddings(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Raahim.Siddiqi\AppData\Local\miniconda3\envs\VisualSummarizerExllama\Lib\site-packages\exllamav2\vlm\vision_tower.py", line 396, in get_video_embeddings
    embedding_tensor = self.process(
                       ^^^^^^^^^^^^^
  File "C:\Users\Raahim.Siddiqi\AppData\Local\miniconda3\envs\VisualSummarizerExllama\Lib\site-packages\exllamav2\vlm\vision_tower.py", line 244, in process
    hidden_states = module.forward(
                    ^^^^^^^^^^^^^^^
  File "C:\Users\Raahim.Siddiqi\AppData\Local\miniconda3\envs\VisualSummarizerExllama\Lib\site-packages\exllamav2\attn.py", line 1059, in forward
    return self.forward_torch(
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Raahim.Siddiqi\AppData\Local\miniconda3\envs\VisualSummarizerExllama\Lib\site-packages\exllamav2\attn.py", line 1483, in forward_torch
    attn_output = attn_func(
                  ^^^^^^^^^^
  File "C:\Users\Raahim.Siddiqi\AppData\Local\miniconda3\envs\VisualSummarizerExllama\Lib\site-packages\exllamav2\attn.py", line 886, in _attn_torch
    attn_mask_lr = attn_params.get_block_diag_mask(q_states.device)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Raahim.Siddiqi\AppData\Local\miniconda3\envs\VisualSummarizerExllama\Lib\site-packages\exllamav2\attn_params.py", line 197, in get_block_diag_mask
    self.block_diag_mask = labels.unsqueeze(0) == labels.unsqueeze(1).repeat(self.batch_size)
                                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Number of dimensions of repeat dims can not be smaller than number of dimensions of tensor
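The failing line calls `Tensor.repeat` with a single repeat count on `labels.unsqueeze(1)`, which is at least 2-D. PyTorch requires at least as many repeat counts as the tensor has dimensions, so, as written, the line should raise for any 1-D `labels` regardless of `batch_size`. (Method calls bind tighter than `==`, so only the right-hand operand is repeated.) The minimal sketch below reproduces the error in isolation and shows one shape-consistent way to build a batched block-diagonal mask. The helper names, the `cu_seqlens` input, and the assumed `(batch_size, n, n)` output shape are hypothetical; this is not the exllamav2 implementation or its fix.

```python
import torch


def reproduce_repeat_error() -> str:
    """Mirror the failing pattern: a 2-D tensor repeated with one count."""
    labels = torch.tensor([0, 0, 0, 1, 1])  # 1-D segment ids
    try:
        # unsqueeze(1) gives shape (5, 1), i.e. 2-D, but repeat() gets only 1 count.
        labels.unsqueeze(1).repeat(1)
    except RuntimeError as err:
        return str(err)
    return ""


def block_diag_mask(cu_seqlens: torch.Tensor, batch_size: int) -> torch.Tensor:
    """Hypothetical helper: boolean block-diagonal mask of shape (batch_size, n, n).

    cu_seqlens holds cumulative segment lengths, e.g. [0, 3, 5] for two
    segments of length 3 and 2. Position i may attend to position j only
    when both fall in the same segment.
    """
    n = int(cu_seqlens[-1])
    # Match dtypes so searchsorted compares like with like.
    positions = torch.arange(n, device=cu_seqlens.device, dtype=cu_seqlens.dtype)
    # Segment id of each position = number of segment ends <= that position.
    labels = torch.searchsorted(cu_seqlens[1:], positions, right=True)
    # (1, n) == (n, 1) broadcasts to an (n, n) boolean matrix.
    mask = labels.unsqueeze(0) == labels.unsqueeze(1)
    # repeat() needs one count per dimension: 3 counts for a 3-D result.
    return mask.unsqueeze(0).repeat(batch_size, 1, 1)


if __name__ == "__main__":
    print(reproduce_repeat_error())
    mask = block_diag_mask(torch.tensor([0, 3, 5]), batch_size=2)
    print(mask.shape)  # torch.Size([2, 5, 5])
    print(mask[0].int())
```

Using `expand(batch_size, -1, -1)` instead of `repeat(batch_size, 1, 1)` would return a view without copying the mask, at the cost of a non-contiguous tensor.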
ExllamaV2 version: 0.2.8+cu121.torch2.5.0
GPU: RTX 3090
Nvidia driver: 566.14
Reproduction steps
To rule out an error on my part, I ran the multimodal_video.py example from the GitHub repo.
The only change I made to it was the location of the image files, so the error can be reproduced by running that example.
Expected behavior
I have run the same code on Linux (WSL), where it works, so the code itself appears to be fine. I expect the model to describe the video I pass to it (as a list of PIL images).
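The traceback shows the vision tower taking the pure-PyTorch attention path (`forward_torch` → `_attn_torch`), which is where the failing mask is built. One assumption worth checking, not confirmed by this report, is whether the WSL environment has an optional attention package installed (flash-attn or xformers are assumed here) that routes around that path. The stdlib-only sketch below reports which of these packages are importable so the Windows and WSL environments can be compared side by side; the helper name is hypothetical.

```python
import importlib.util
import platform


def attention_backend_report() -> dict:
    """Collect facts that may differ between the Windows and WSL setups.

    The package names checked below are assumptions: optional packages that
    may be used instead of the pure-PyTorch attention path.
    """
    report = {
        "os": platform.system(),
        "python": platform.python_version(),
    }
    for module in ("torch", "flash_attn", "xformers"):
        # find_spec returns None when the package is not installed.
        report[module] = importlib.util.find_spec(module) is not None
    return report


if __name__ == "__main__":
    for key, value in attention_backend_report().items():
        print(f"{key}: {value}")
```

Running this in both environments and diffing the output is a quick way to see whether the two setups can even reach the same attention code.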
Logs
No response
Additional context
No response
Acknowledgements
I have looked for similar issues before submitting this one.
I understand that the developers have lives and my issue will be answered when possible.
I understand the developers of this program are human, and I will ask my questions politely.
OS: Windows
GPU library: CUDA 12.x
Python version: 3.12
PyTorch version: 2.5.0
Model: https://huggingface.co/turboderp/Qwen2-VL-7B-Instruct-exl2/tree/6.0bpw
Reproduction script: https://github.com/turboderp-org/exllamav2/blob/master/examples/multimodal_video.py