
Commit b6cce35

Add MultimodalQnA as MMRAG usecase in Example (#751)
Signed-off-by: Tiep Le <tiep.le@intel.com>
Signed-off-by: siddhivelankar23 <siddhi.velankar@intel.com>
Signed-off-by: sjagtap1803 <siddhant.jagtap@intel.com>
1 parent 06696c8 commit b6cce35

21 files changed: +2558 -0 lines changed

MultimodalQnA/Dockerfile

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

FROM python:3.11-slim

RUN apt-get update -y && apt-get install -y --no-install-recommends --fix-missing \
    libgl1-mesa-glx \
    libjemalloc-dev \
    git

RUN useradd -m -s /bin/bash user && \
    mkdir -p /home/user && \
    chown -R user /home/user/

WORKDIR /home/user/
RUN git clone https://github.com/opea-project/GenAIComps.git

WORKDIR /home/user/GenAIComps
RUN pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir -r /home/user/GenAIComps/requirements.txt

COPY ./multimodalqna.py /home/user/multimodalqna.py

ENV PYTHONPATH=$PYTHONPATH:/home/user/GenAIComps

USER user

WORKDIR /home/user

ENTRYPOINT ["python", "multimodalqna.py"]
# ENTRYPOINT ["/usr/bin/sleep", "infinity"]
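For reference, this MegaService image can be built locally from the MultimodalQnA directory roughly as follows; the image tag `opea/multimodalqna:latest` is an illustrative assumption and may differ from the tags used by the project's official build scripts.

```bash
# Build the MultimodalQnA MegaService image from the directory containing this Dockerfile.
# The tag below is only an example choice, not necessarily the official one.
cd GenAIExamples/MultimodalQnA
docker build -t opea/multimodalqna:latest \
  --build-arg http_proxy=$http_proxy \
  --build-arg https_proxy=$https_proxy \
  -f Dockerfile .
```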

MultimodalQnA/README.md

Lines changed: 188 additions & 0 deletions
@@ -0,0 +1,188 @@
# MultimodalQnA Application

Suppose you possess a set of videos and wish to perform question-answering to extract insights from them. Answering your questions typically requires understanding visual cues in the videos, knowledge derived from the audio content, or a mix of both. The MultimodalQnA framework offers an optimal solution for this purpose.

`MultimodalQnA` addresses your questions by dynamically fetching the most pertinent multimodal information (frames, transcripts, and/or captions) from your collection of videos. For this purpose, MultimodalQnA utilizes the [BridgeTower model](https://huggingface.co/BridgeTower/bridgetower-large-itm-mlm-gaudi), a multimodal encoding transformer model which merges visual and textual data into a unified semantic space. During the video ingestion phase, the BridgeTower model embeds both visual cues and auditory facts (represented as text), and those embeddings are stored in a vector database. When answering a question, MultimodalQnA fetches the most relevant multimodal content from the vector store and feeds it into a downstream Large Vision-Language Model (LVM) as input context to generate a response for the user.

The MultimodalQnA architecture is shown below:

![architecture](./assets/img/MultimodalQnA.png)

MultimodalQnA is implemented on top of [GenAIComps](https://github.com/opea-project/GenAIComps); the MultimodalQnA flow chart is shown below:

```mermaid
---
config:
  flowchart:
    nodeSpacing: 100
    rankSpacing: 100
    curve: linear
  theme: base
  themeVariables:
    fontSize: 42px
---
flowchart LR
    %% Colors %%
    classDef blue fill:#ADD8E6,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
    classDef orange fill:#FBAA60,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
    classDef orchid fill:#C26DBC,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
    classDef invisible fill:transparent,stroke:transparent;
    style MultimodalQnA-MegaService stroke:#000000

    %% Subgraphs %%
    subgraph MultimodalQnA-MegaService["MultimodalQnA-MegaService"]
        direction LR
        EM([Embedding <br>]):::blue
        RET([Retrieval <br>]):::blue
        LVM([LVM <br>]):::blue
    end
    subgraph User Interface
        direction TB
        a([User Input Query]):::orchid
        Ingest([Ingest data]):::orchid
        UI([UI server<br>]):::orchid
    end
    subgraph MultimodalQnA GateWay
        direction LR
        invisible1[ ]:::invisible
        GW([MultimodalQnA GateWay<br>]):::orange
    end
    subgraph .
        X([OPEA Microservice]):::blue
        Y{{Open Source Service}}
        Z([OPEA Gateway]):::orange
        Z1([UI]):::orchid
    end

    TEI_EM{{Embedding service <br>}}
    VDB{{Vector DB<br><br>}}
    R_RET{{Retriever service <br>}}
    DP([Data Preparation<br>]):::blue
    LVM_gen{{LVM Service <br>}}

    %% Data Preparation flow
    %% Ingest data flow
    direction LR
    Ingest[Ingest data] -->|a| UI
    UI -->|b| DP
    DP <-.->|c| TEI_EM

    %% Questions interaction
    direction LR
    a[User Input Query] -->|1| UI
    UI -->|2| GW
    GW <==>|3| MultimodalQnA-MegaService
    EM ==>|4| RET
    RET ==>|5| LVM

    %% Embedding service flow
    direction TB
    EM <-.->|3'| TEI_EM
    RET <-.->|4'| R_RET
    LVM <-.->|5'| LVM_gen

    direction TB
    %% Vector DB interaction
    R_RET <-.->|d|VDB
    DP <-.->|e|VDB
```

This MultimodalQnA use case performs Multimodal-RAG using LangChain, Redis VectorDB, and Text Generation Inference on Intel Gaudi2 or Intel Xeon Scalable Processors. The Intel Gaudi2 accelerator supports both training and inference for deep learning models, in particular LLMs. Visit [Habana AI products](https://habana.ai/products) for more details.

The table below describes, for each microservice component in the MultimodalQnA architecture, the default open source project, hardware, port, and endpoint.

<details>
<summary><b>Gaudi default compose.yaml</b></summary>

| MicroService | Open Source Project   | HW    | Port | Endpoint                                        |
| ------------ | --------------------- | ----- | ---- | ----------------------------------------------- |
| Embedding    | Langchain             | Xeon  | 6000 | /v1/embeddings                                  |
| Retriever    | Langchain, Redis      | Xeon  | 7000 | /v1/multimodal_retrieval                        |
| LVM          | Langchain, TGI        | Gaudi | 9399 | /v1/lvm                                         |
| Dataprep     | Redis, Langchain, TGI | Gaudi | 6007 | /v1/generate_transcripts, /v1/generate_captions |

</details>

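Once the services are running, the endpoints in the table above can be exercised directly as a quick smoke test. The request payloads below are illustrative sketches only; the exact schemas are defined by the corresponding GenAIComps microservices and may differ.

```bash
# Smoke-test the embedding microservice (port 6000); the payload shape is an assumed sketch.
curl http://${host_ip}:6000/v1/embeddings \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"text": "What is the man doing in the video?"}'

# Smoke-test the LVM microservice (port 9399); the payload shape is an assumed sketch.
curl http://${host_ip}:9399/v1/lvm \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"image": "<base64-encoded frame>", "prompt": "Describe what is happening in this frame."}'
```
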
## Required Models

By default, the embedding and LVM models are set to the values listed below:

| Service              | Model                                       |
| -------------------- | ------------------------------------------- |
| embedding-multimodal | BridgeTower/bridgetower-large-itm-mlm-gaudi |
| LVM                  | llava-hf/llava-v1.6-vicuna-13b-hf           |

You can choose other LVM models, such as `llava-hf/llava-1.5-7b-hf` and `llava-hf/llava-1.5-13b-hf`, as needed.

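If you swap in a different LVM, the model is typically selected through an environment variable exported before the services start. The variable name `LVM_MODEL_ID` below is an assumption; check the `set_env.sh` for your hardware for the exact name it uses.

```bash
# Assumed variable name; verify against set_env.sh before relying on it.
export LVM_MODEL_ID="llava-hf/llava-1.5-7b-hf"
```
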
## Deploy MultimodalQnA Service

The MultimodalQnA service can be effortlessly deployed on either Intel Gaudi2 or Intel Xeon Scalable Processors.

Currently, we support deploying MultimodalQnA services with Docker Compose.

### Setup Environment Variable

To set up environment variables for deploying MultimodalQnA services, follow these steps:

1. Set the required environment variables:

   ```bash
   # Example: export host_ip=$(hostname -I | awk '{print $1}')
   export host_ip="External_Public_IP"
   # Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
   export no_proxy="Your_No_Proxy"
   ```

2. If you are in a proxy environment, also set the proxy-related environment variables:

   ```bash
   export http_proxy="Your_HTTP_Proxy"
   export https_proxy="Your_HTTPs_Proxy"
   ```

3. Set up other environment variables:

   > Notice that you should run only **one** of the commands below, according to your hardware. Otherwise, the port numbers may be set incorrectly.

   ```bash
   # on Gaudi
   source ./docker_compose/intel/hpu/gaudi/set_env.sh
   # on Xeon
   source ./docker_compose/intel/cpu/xeon/set_env.sh
   ```

### Deploy MultimodalQnA on Gaudi

Refer to the [Gaudi Guide](./docker_compose/intel/hpu/gaudi/README.md) to build docker images from source.

Find the corresponding [compose.yaml](./docker_compose/intel/hpu/gaudi/compose.yaml).

```bash
cd GenAIExamples/MultimodalQnA/docker_compose/intel/hpu/gaudi/
docker compose -f compose.yaml up -d
```

> Notice: Currently only the **Habana Driver 1.17.x** is supported for Gaudi.

### Deploy MultimodalQnA on Xeon

Refer to the [Xeon Guide](./docker_compose/intel/cpu/xeon/README.md) for more instructions on building docker images from source.

Find the corresponding [compose.yaml](./docker_compose/intel/cpu/xeon/compose.yaml).

```bash
cd GenAIExamples/MultimodalQnA/docker_compose/intel/cpu/xeon/
docker compose -f compose.yaml up -d
```

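After either deployment, a quick way to confirm that the stack came up cleanly is to list the containers started by the compose file and inspect the logs of any service that is not healthy:

```bash
# List the MultimodalQnA containers started by this compose file
docker compose -f compose.yaml ps
# Follow logs from all services (append a service name to narrow it down)
docker compose -f compose.yaml logs -f
```
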
## MultimodalQnA Demo on Gaudi2

![MultimodalQnA-upload-waiting-screenshot](./assets/img/upload-gen-trans.png)

![MultimodalQnA-upload-done-screenshot](./assets/img/upload-gen-captions.png)

![MultimodalQnA-query-example-screenshot](./assets/img/example_query.png)
