[BUG] Loss in Accuracy with Paged=False with Qwen2.5_VL Vision Models on Linux #753

RaahimSiddiqi · 2025-03-18T03:19:06Z

OS

Linux

GPU Library

CUDA 12.x

Python version

3.12

Pytorch version

2.5.0

Model

https://huggingface.co/turboderp/Qwen2-VL-7B-Instruct-exl2/tree/6.0bpw

Describe the bug

Running exllamav2 with the Qwen2-VL-7B-Instruct-exl2 model (6.0bpw) with paged=False produces answers that differ substantially from those with paged=True. The paged=False answer is inaccurate, mostly because it is incomplete.

This happens specifically when performing video analysis (the video is passed as a list of PIL images). I have not tested single-image inference.

With Paged=True:

The video showcases a series of screen captures from a software development environment, likely a code editor or an integrated development environment (IDE). The content includes:

1. **API Documentation**: The first part of the video displays detailed API documentation for Azure Video Indexer, focusing on operations such as "Get Video Summary" and "Create Video Summary." The documentation includes request parameters, response formats, and examples of HTTP requests and responses.

2. **Code Editor**: The second part of the video transitions to a code editor where a developer is working on a project. The code appears to be related to Azure Video Indexer, as it references the API documentation seen earlier. The developer is interacting with the code, possibly testing or implementing functionalities related to video indexing and summarization.

3. **Performance Monitoring**: The third part of the video shows a performance monitoring tool, likely within the IDE, displaying CPU usage, memory usage, and other performance metrics. This suggests that the developer is monitoring the performance of their application or code.

The video seems to be a tutorial or a demonstration of how to use Azure Video Indexer APIs within a development environment, focusing on both the API documentation and the implementation of those APIs in code

With Paged=False:

The image is a screenshot of a computer screen displaying a code editor window with a code file open.

Both runs used the same code and the same prompt.

Prompt: "Describe the video concisely"

Reproduction steps

The code is identical to the multimodal video example in the exllamav2 repository:

https://github.com/turboderp-org/exllamav2/blob/master/examples/multimodal_video.py

Expected behavior

The same (or a highly similar) answer with paged=True and paged=False.
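To make "highly similar" checkable, here is a minimal sketch of an A/B harness. It runs one prompt under both cache modes and scores how closely the two answers agree, using Python's standard `difflib.SequenceMatcher`. `generate(prompt, paged)` is a hypothetical wrapper around the reproduction script, which would build the generator with the given `paged` value and return the decoded answer. The 0.9 threshold is an arbitrary assumption. The comparison is only meaningful with deterministic (e.g. greedy) sampling, so that any difference comes from the cache mode rather than from sampling noise.

```python
import difflib
from typing import Callable, Dict


def compare_paged_modes(
    generate: Callable[[str, bool], str],
    prompt: str,
    threshold: float = 0.9,  # arbitrary cut-off for "highly similar"
) -> Dict[str, object]:
    """Run the same prompt with paged=True and paged=False and score agreement.

    `generate(prompt, paged)` is a hypothetical wrapper around the
    reproduction script; it must return the decoded answer text.
    """
    # Call the generator once per cache mode, paged=True first.
    outputs = {paged: generate(prompt, paged) for paged in (True, False)}
    # Ratio in [0, 1]: 1.0 means the two answers are character-identical.
    ratio = difflib.SequenceMatcher(None, outputs[True], outputs[False]).ratio()
    return {
        "paged": outputs[True],
        "unpaged": outputs[False],
        "similarity": ratio,
        "consistent": ratio >= threshold,
    }


if __name__ == "__main__":
    # Stand-in generator for demonstration only: it returns the opening of
    # the two answers quoted in this report instead of calling a real model.
    def fake_generate(prompt: str, paged: bool) -> str:
        if paged:
            return "The video showcases a series of screen captures ..."
        return "The image is a screenshot of a computer screen ..."

    report = compare_paged_modes(fake_generate, "Describe the video concisely")
    print(f"similarity={report['similarity']:.2f} consistent={report['consistent']}")
```

In a real run, `report["consistent"]` being `False` for identical inputs and deterministic sampling would reproduce this bug in an automated way.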

Logs

No response

Additional context

No response

Acknowledgements

I have looked for similar issues before submitting this one.
I understand that the developers have lives and my issue will be answered when possible.
I understand the developers of this program are human, and I will ask my questions politely.

RaahimSiddiqi added the bug Something isn't working label Mar 18, 2025
