
Commit 1e5785e

authored Nov 24, 2024
Merge pull request #40 from remichu-ai/transformer_multimodal
Qwen 2 VL
2 parents d5be6e0 + 872dabd commit 1e5785e

File tree: 5 files changed (+113, -89 lines)


README.md (+15, -14)
@@ -24,9 +24,10 @@ Do checkout [TabbyAPI](https://github.com/theroyallab/tabbyAPI) if you want a re
 
 # NEW - Vision Model
 
-From `gallama` version 0.0.7, there is experimental support for Vision model.
+As of v0.0.8post1, Qwen 2 VL (Image only, no Video) and Pixtral are supported via Exllama (>=0.2.4).
 
-Currently, as of v0.0.8, Pixtral is supported via Exllama (>=0.2.4) and Qwen 2 VL series of model is supported via transformers.
+For Pixtral, please install Exllama V2 `v0.2.4` onwards.
+For Qwen 2 VL, please install the `dev` branch of Exllama V2, as the code is not yet merged to `v0.2.4`.
 
 After Exllama roll out support for Qwen 2 VL, running model via transformers will be depreciated.
 Currently, both exllamaV2 and llama.cpp do not support Vision model yet. Hence, this is achieved by running `transformers` with the use of awq for quantization.
@@ -49,16 +50,16 @@ This is already be handled in the requirements.txt, however, getting transformer
 After installation you can download by following commands (choose a version that fit your VRAM):
 ```shell
 # 2B model
-gallama download qwen-2-VL-2B:4.0 --backend=transformers
-gallama run qwen-2-VL-2B_transformers
+gallama download qwen-2-VL-2B:4.0
+gallama run qwen-2-VL-2B
 
 # 7B model
-gallama download qwen-2-VL-7B:4.0 --backend=transformers
-gallama run qwen-2-VL-7B_transformers
+gallama download qwen-2-VL-7B:4.0
+gallama run qwen-2-VL-7B
 
 # 72B model
-gallama download qwen-2-VL-72B:4.0 --backend=transformers
-gallama run qwen-2-VL-72B_transformers
+gallama download qwen-2-VL-72B:4.0
+gallama run qwen-2-VL-72B
 ```
 
 If you need an UI to run it, check out Gallama UI, it is working with images, however, the support is not perfect at the moment:
@@ -131,12 +132,12 @@ gallama list available
 
 **Vision Large Language Models**
 
-| Model         | Backend      | Available Quantizations (bpw)                          |
-|---------------|--------------|--------------------------------------------------------|
-| qwen-2-VL-2B  | transformers | `4.0`, `16.0`                                          |
-| qwen-2-VL-7B  | transformers | `4.0`, `16.0`                                          |
-| qwen-2-VL-72B | transformers | `4.0`, `16.0`                                          |
-| pixtral       | exllama      | `2.5`, `3.0`, `3.5`, `4.0`, `4.5`, `5.0`, `6.0`, `8.0` |
+| Model         | Backend      | Available Quantizations (bpw)                          |
+|---------------|--------------|--------------------------------------------------------|
+| qwen-2-VL-2B  | exllama      | `3.0`, `3.5`, `4.0`, `4.5`, `5.0`, `6.0`, `8.0`        |
+| qwen-2-VL-7B  | exllama      | `3.0`, `3.5`, `4.0`, `4.5`, `5.0`, `6.0`, `8.0`        |
+| qwen-2-VL-72B | exllama      | `3.0`, `3.5`, `4.0`, `4.5`, `5.0`, `6.0`, `8.0`        |
+| pixtral       | exllama      | `2.5`, `3.0`, `3.5`, `4.0`, `4.5`, `5.0`, `6.0`, `8.0` |
 
 
 **Embedding Models:**

pyproject.toml (+1, -1)
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "gallama"
-version = "0.0.8"
+version = "0.0.8post1"
 description = "An opinionated Llama Server engine with a focus on agentic tasks"
 authors = [{name = "David", email = "trantrungduc91@example.com"}]
 license = {text = "MIT"}

src/gallama/backend/chatgenerator.py (+35, -25)
@@ -29,6 +29,7 @@
 from qwen_vl_utils import process_vision_info
 from .model_support.llama3_2_vision.text_streamer import CustomTextIteratorStreamer
 from ..utils.utils import get_image
+from functools import lru_cache
 
 try:
     from formatron.formatter import FormatterBuilder
@@ -597,6 +598,19 @@ def get_stop_word(text, stop_words) -> Union[str, None]:
 
         return None
 
+    @staticmethod
+    @lru_cache(128)  # TODO set this dynamically
+    def get_image_embedding_cached(processor, model, tokenizer, url):
+        img = get_image(url=url)
+
+        return processor.get_image_embeddings(
+            model=model,
+            tokenizer=tokenizer,
+            image=img,
+            text_alias=None,  # passing None will let the model generate its own embedding
+        )
+
+
     async def generate(
         self,
         prompt: str,
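
Note on the hunk above: `functools.lru_cache` keys the cache on all four arguments, so repeated generations that reference the same image URL (with the same processor/model/tokenizer objects) reuse the embedding instead of re-downloading and re-encoding the image. Below is a minimal, self-contained sketch of the same pattern; `DummyProcessor` and the decoded-image string are hypothetical stand-ins for gallama's `get_image` helper and the ExllamaV2 vision processor, not the real APIs.

```python
from functools import lru_cache


class DummyProcessor:
    """Hypothetical stand-in for the vision processor used in the diff above."""

    calls = 0  # counts how often the "expensive" embedding step actually runs

    def get_image_embeddings(self, model, tokenizer, image, text_alias=None):
        DummyProcessor.calls += 1
        return f"<embedding for {image}>"


class Generator:
    @staticmethod
    @lru_cache(128)  # cache key is (processor, model, tokenizer, url); all arguments must be hashable
    def get_image_embedding_cached(processor, model, tokenizer, url):
        img = f"decoded:{url}"  # stand-in for get_image(url=url)
        return processor.get_image_embeddings(
            model=model,
            tokenizer=tokenizer,
            image=img,
            text_alias=None,
        )


proc = DummyProcessor()
for _ in range(3):
    Generator.get_image_embedding_cached(proc, None, None, "http://example.com/cat.png")

print(DummyProcessor.calls)                               # -> 1 (computed once, reused twice)
print(Generator.get_image_embedding_cached.cache_info())  # -> CacheInfo(hits=2, misses=1, maxsize=128, currsize=1)
```

One consequence of caching by URL is that a changed image behind an unchanged URL is served stale until the entry is evicted; the fixed `maxsize=128` is the `TODO set this dynamically` noted in the diff.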
@@ -617,24 +631,6 @@ async def generate(
     ) -> (str, GenerationStats):
         try:
 
-            def extract_uuid_strings(text):
-                """
-                Extract all strings matching the format '{{IMG-<uuid-like-hex>}}'
-
-                Args:
-                    text (str): Input string to search for matching patterns
-
-                Returns:
-                    list: List of all matching strings found in the input text
-                """
-                # Pattern to match strings like '{{IMG-<uuid-hex>}}'
-                pattern = r'\{\{IMG-[0-9a-f]{32}\}\}'
-
-                # Find all matching occurrences in the text
-                matches = re.findall(pattern, text)
-
-                return matches
-
             # ensure that generator is initialized
             if self.pipeline is None:
                 self.pipeline = await self._get_pipeline_async()
@@ -679,21 +675,35 @@ def extract_uuid_strings(text):
 
             image_embeddings = None
             if vision_required and self.processor:
-                image_token_list = extract_uuid_strings(prompt)  # extract all the placeholder token used for img placeholder
+                # count the number of image placeholder token
+                image_token = "{{IMG-PlaceHolderTokenHere}}"  # TODO move to a constant
+                image_token_count = prompt.count(image_token)
 
-                assert len(image_token_list) == len(
-                    image_list), f"Mismatch in image tokens and images: {len(image_token_list)} tokens vs {len(image_list)} images"
+                # raise error if the img token count and image to embed not match
+                assert image_token_count == len(
+                    image_list), f"Mismatch in image tokens and images: {image_token_count} tokens vs {len(image_list)} images"
 
                 # Convert image(s) to embeddings
+
                 image_embeddings = [
-                    self.processor.get_image_embeddings(
+                    self.get_image_embedding_cached(
+                        processor=self.processor,
                         model=self.model,
                         tokenizer=self.tokenizer,
-                        image=img,
-                        text_alias=alias,
+                        url=url,
                     )
-                    for (alias, img) in zip(image_token_list, [get_image(url=url) for url in image_list])
+
+                    for url in image_list
                 ]
+                # logger.info(self.get_image_embedding_cached.cache_info())
+
+                # replace embedding
+                for emb in image_embeddings:
+                    prompt = prompt.replace(image_token, emb.text_alias, 1)  # replace one token with 1 embedding sequentially
+                    # logger.info(emb.text_alias)
+
+                # logger.info(prompt)
+
             elif vision_required and not self.processor:
                 if version('exllamav2') < '0.2.4':
                     raise Exception(f"Current Exllama version of {version('exllamav2')} do not support Vision model")

src/gallama/backend/prompt_engine.py (+2, -1)
@@ -274,7 +274,8 @@ def convert_multimodal_content_list_to_string(
                     content_str += self.get_vision_start_token() + self.get_image_pad_token() + self.get_vision_end_token()  # TODO
                 else:
                     # use a standard token as place holder, TODO - refractor
-                    content_str += "{{IMG-" + f"{uuid.uuid4().hex}" + "}}"
+                    # content_str += "{{IMG-" + f"{uuid.uuid4().hex}" + "}}"
+                    content_str += "{{IMG-PlaceHolderTokenHere}}"  # TODO use a constant instead
             else:
                 raise ValueError("Unexpected content type ")
 
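This change drops the per-image UUID placeholder in favour of one fixed token, so the generator only needs `str.count` rather than a UUID regex. A rough sketch of how a multimodal content list might flatten into a prompt string under this scheme; the content-part dictionaries shown are an assumed shape, not copied from `prompt_engine.py`.

```python
IMAGE_TOKEN = "{{IMG-PlaceHolderTokenHere}}"


def flatten_content(parts: list[dict]) -> str:
    """Hypothetical flattening of a multimodal content list into one prompt string."""
    content_str = ""
    for part in parts:
        if part["type"] == "text":
            content_str += part["text"]
        elif part["type"] == "image_url":
            # every image becomes the same fixed placeholder token
            content_str += IMAGE_TOKEN
        else:
            raise ValueError("Unexpected content type ")
    return content_str


print(flatten_content([
    {"type": "text", "text": "What is in this picture? "},
    {"type": "image_url", "image_url": {"url": "http://example.com/cat.png"}},
]))
# -> What is in this picture? {{IMG-PlaceHolderTokenHere}}
```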
src/gallama/data/default_model_list.yaml (+60, -48)
@@ -633,64 +633,76 @@ qwen-2-VL-2B:
   default_cache_quant: Q4
   prompt_template: Qwen2-VL
   repo:
-    - repo: "Qwen/Qwen2-VL-2B-Instruct-AWQ"
-      branch: ['main']
-      quant: [4.0]
-      backend: transformers
-      transformers_args:
-        model_class: "transformers.Qwen2VLForConditionalGeneration"
-        tokenizer_class: "transformers.AutoTokenizer"
-        processor_class: "transformers.AutoProcessor"
-    - repo: "Qwen/Qwen2-VL-2B-Instruct"
-      branch: ['main']
-      quant: [16.0]
-      backend: transformers
-      transformers_args:
-        model_class: "transformers.Qwen2VLForConditionalGeneration"
-        tokenizer_class: "transformers.AutoTokenizer"
-        processor_class: "transformers.AutoProcessor"
+    - repo: "turboderp/Qwen2-VL-2B-Instruct-exl2"
+      branch: ['3.0bpw', '3.5bpw', '4.0bpw', '4.5bpw', '5.0bpw', '6.0bpw', '8.0bpw']
+      quant: [3.0, 3.5, 4.0, 4.5, 5.0, 6.0, 8.0]
+      backend: exllama
+#    - repo: "Qwen/Qwen2-VL-2B-Instruct-AWQ"
+#      branch: ['main']
+#      quant: [4.0]
+#      backend: transformers
+#      transformers_args:
+#        model_class: "transformers.Qwen2VLForConditionalGeneration"
+#        tokenizer_class: "transformers.AutoTokenizer"
+#        processor_class: "transformers.AutoProcessor"
+#    - repo: "Qwen/Qwen2-VL-2B-Instruct"
+#      branch: ['main']
+#      quant: [16.0]
+#      backend: transformers
+#      transformers_args:
+#        model_class: "transformers.Qwen2VLForConditionalGeneration"
+#        tokenizer_class: "transformers.AutoTokenizer"
+#        processor_class: "transformers.AutoProcessor"
 qwen-2-VL-7B:
   default_quant: 4.0
   default_cache_quant: Q4
   prompt_template: Qwen2-VL
   repo:
-    - repo: "Qwen/Qwen2-VL-7B-Instruct-AWQ"
-      branch: ['main']
-      quant: [4.0]
-      backend: transformers
-      transformers_args:
-        model_class: "transformers.Qwen2VLForConditionalGeneration"
-        tokenizer_class: "transformers.AutoTokenizer"
-        processor_class: "transformers.AutoProcessor"
-    - repo: "Qwen/Qwen2-VL-7B-Instruct"
-      branch: ['main']
-      quant: [16.0]
-      backend: transformers
-      transformers_args:
-        model_class: "transformers.Qwen2VLForConditionalGeneration"
-        tokenizer_class: "transformers.AutoTokenizer"
-        processor_class: "transformers.AutoProcessor"
+    - repo: "turboderp/Qwen2-VL-7B-Instruct-exl2"
+      branch: ['3.0bpw', '3.5bpw', '4.0bpw', '4.5bpw', '5.0bpw', '6.0bpw', '8.0bpw']
+      quant: [3.0, 3.5, 4.0, 4.5, 5.0, 6.0, 8.0]
+      backend: exllama
+#    - repo: "Qwen/Qwen2-VL-7B-Instruct-AWQ"
+#      branch: ['main']
+#      quant: [4.0]
+#      backend: transformers
+#      transformers_args:
+#        model_class: "transformers.Qwen2VLForConditionalGeneration"
+#        tokenizer_class: "transformers.AutoTokenizer"
+#        processor_class: "transformers.AutoProcessor"
+#    - repo: "Qwen/Qwen2-VL-7B-Instruct"
+#      branch: ['main']
+#      quant: [16.0]
+#      backend: transformers
+#      transformers_args:
+#        model_class: "transformers.Qwen2VLForConditionalGeneration"
+#        tokenizer_class: "transformers.AutoTokenizer"
+#        processor_class: "transformers.AutoProcessor"
 qwen-2-VL-72B:
   default_quant: 4.0
   default_cache_quant: Q4
   prompt_template: Qwen2-VL
   repo:
-    - repo: "Qwen/Qwen2-VL-72B-Instruct-AWQ"
-      branch: ['main']
-      quant: [4.0]
-      backend: transformers
-      transformers_args:
-        model_class: "transformers.Qwen2VLForConditionalGeneration"
-        tokenizer_class: "transformers.AutoTokenizer"
-        processor_class: "transformers.AutoProcessor"
-    - repo: "Qwen/Qwen2-VL-72B-Instruct-AWQ"
-      branch: ['main']
-      quant: [16.0]
-      backend: transformers
-      transformers_args:
-        model_class: "transformers.Qwen2VLForConditionalGeneration"
-        tokenizer_class: "transformers.AutoTokenizer"
-        processor_class: "transformers.AutoProcessor"
+    - repo: "turboderp/Qwen2-VL-7B-Instruct-exl2"
+      branch: ['3.0bpw', '3.5bpw', '4.0bpw', '4.5bpw', '5.0bpw', '6.0bpw', '8.0bpw']
+      quant: [3.0, 3.5, 4.0, 4.5, 5.0, 6.0, 8.0]
+      backend: exllama
+#    - repo: "Qwen/Qwen2-VL-72B-Instruct-AWQ"
+#      branch: ['main']
+#      quant: [4.0]
+#      backend: transformers
+#      transformers_args:
+#        model_class: "transformers.Qwen2VLForConditionalGeneration"
+#        tokenizer_class: "transformers.AutoTokenizer"
+#        processor_class: "transformers.AutoProcessor"
+#    - repo: "Qwen/Qwen2-VL-72B-Instruct-AWQ"
+#      branch: ['main']
+#      quant: [16.0]
+#      backend: transformers
+#      transformers_args:
+#        model_class: "transformers.Qwen2VLForConditionalGeneration"
+#        tokenizer_class: "transformers.AutoTokenizer"
+#        processor_class: "transformers.AutoProcessor"
 
 # qwen 2.5 Coder
 qwen-2.5-Coder-32B: