
Commit f11ab45

MultimodalQnA image query, pdf, dynamic ports, and UI updates (#1381)
Per the proposed changes in this [RFC](https://github.com/opea-project/docs/blob/main/community/rfcs/24-10-02-GenAIExamples-001-Image_and_Audio_Support_in_MultimodalQnA.md)'s Phase 2 plan, this PR adds support for image queries, PDF ingestion and display, and dynamic ports. There are also some bug fixes. This PR goes with [this one in GenAIComps](opea-project/GenAIComps#1134).

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>
Co-authored-by: Liang Lv <liang1.lv@intel.com>
1 parent f3562be commit f11ab45

26 files changed: +802 -289 lines changed

MultimodalQnA/Dockerfile

Lines changed: 0 additions & 2 deletions
@@ -16,14 +16,12 @@ RUN useradd -m -s /bin/bash user && \

WORKDIR $HOME

-
# Stage 2: latest GenAIComps sources
FROM base AS git

RUN apt-get update && apt-get install -y --no-install-recommends git
RUN git clone --depth 1 https://github.com/opea-project/GenAIComps.git

-
# Stage 3: common layer shared by services using GenAIComps
FROM base AS comps-base


MultimodalQnA/README.md

Lines changed: 41 additions & 11 deletions
@@ -1,8 +1,8 @@
# MultimodalQnA Application

-Suppose you possess a set of videos and wish to perform question-answering to extract insights from these videos. To respond to your questions, it typically necessitates comprehension of visual cues within the videos, knowledge derived from the audio content, or often a mix of both these visual elements and auditory facts. The MultimodalQnA framework offers an optimal solution for this purpose.
+Suppose you possess a set of videos, images, audio files, PDFs, or some combination thereof and wish to perform question-answering to extract insights from these documents. To respond to your questions, the system needs to comprehend a mix of textual, visual, and audio facts drawn from the document contents. The MultimodalQnA framework offers an optimal solution for this purpose.

-`MultimodalQnA` addresses your questions by dynamically fetching the most pertinent multimodal information (frames, transcripts, and/or captions) from your collection of videos, images, and audio files. For this purpose, MultimodalQnA utilizes [BridgeTower model](https://huggingface.co/BridgeTower/bridgetower-large-itm-mlm-gaudi), a multimodal encoding transformer model which merges visual and textual data into a unified semantic space. During the ingestion phase, the BridgeTower model embeds both visual cues and auditory facts as texts, and those embeddings are then stored in a vector database. When it comes to answering a question, the MultimodalQnA will fetch its most relevant multimodal content from the vector store and feed it into a downstream Large Vision-Language Model (LVM) as input context to generate a response for the user.
+`MultimodalQnA` addresses your questions by dynamically fetching the most pertinent multimodal information (e.g. images, transcripts, and captions) from your collection of video, image, audio, and PDF files. For this purpose, MultimodalQnA utilizes the [BridgeTower model](https://huggingface.co/BridgeTower/bridgetower-large-itm-mlm-gaudi), a multimodal encoding transformer model which merges visual and textual data into a unified semantic space. During the ingestion phase, the BridgeTower model embeds both visual cues and auditory facts as text, and those embeddings are then stored in a vector database. When answering a question, MultimodalQnA fetches the most relevant multimodal content from the vector store and feeds it into a downstream Large Vision-Language Model (LVM) as input context to generate a response for the user.

The MultimodalQnA architecture is shown below:

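To make the two phases concrete, here is a minimal sketch of the flow, assuming the environment variables and endpoints that are defined in the deployment guides later on this page:

```bash
# Phase 1 (ingestion): upload an image and a caption file; BridgeTower
# embeddings are computed and stored in the Redis vector database.
curl ${DATAPREP_INGEST_SERVICE_ENDPOINT} \
    -H 'Content-Type: multipart/form-data' \
    -X POST -F "files=@./apple.png" -F "files=@./apple.txt"

# Phase 2 (answering): the gateway retrieves the most relevant multimodal
# content and feeds it to the LVM to generate a response.
curl http://${host_ip}:${MEGA_SERVICE_PORT}/v1/multimodalqna \
    -H "Content-Type: application/json" \
    -X POST \
    -d '{"messages": "What fruit is shown in the uploaded image?"}'
```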
@@ -87,12 +87,12 @@ In the below, we provide a table that describes for each microservice component
<details>
<summary><b>Gaudi default compose.yaml</b></summary>

-| MicroService | Open Source Project   | HW    | Port | Endpoint                                        |
-| ------------ | --------------------- | ----- | ---- | ----------------------------------------------- |
-| Embedding    | Langchain             | Xeon  | 6000 | /v1/embeddings                                  |
-| Retriever    | Langchain, Redis      | Xeon  | 7000 | /v1/multimodal_retrieval                        |
-| LVM          | Langchain, TGI        | Gaudi | 9399 | /v1/lvm                                         |
-| Dataprep     | Redis, Langchain, TGI | Gaudi | 6007 | /v1/generate_transcripts, /v1/generate_captions |
+| MicroService | Open Source Project   | HW    | Port | Endpoint                                                               |
+| ------------ | --------------------- | ----- | ---- | ---------------------------------------------------------------------- |
+| Embedding    | Langchain             | Xeon  | 6000 | /v1/embeddings                                                         |
+| Retriever    | Langchain, Redis      | Xeon  | 7000 | /v1/multimodal_retrieval                                               |
+| LVM          | Langchain, TGI        | Gaudi | 9399 | /v1/lvm                                                                |
+| Dataprep     | Redis, Langchain, TGI | Gaudi | 6007 | /v1/generate_transcripts, /v1/generate_captions, /v1/ingest_with_text  |

</details>

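Following the pattern of the dataprep commands shown later on this page, the newly listed `/v1/ingest_with_text` endpoint can be smoke-tested with an image plus its caption file. This is a sketch only; the multipart form-field names are assumed to match the other dataprep routes:

```bash
# Sketch: exercise the new ingest_with_text dataprep route on the Gaudi
# deployment (port 6007 per the table above); form-field names are an
# assumption based on the /v1/dataprep examples elsewhere on this page.
curl http://${host_ip}:6007/v1/ingest_with_text \
    -H 'Content-Type: multipart/form-data' \
    -X POST -F "files=@./apple.png" -F "files=@./apple.txt"
```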
@@ -172,8 +172,38 @@ docker compose -f compose.yaml up -d

## MultimodalQnA Demo on Gaudi2

-![MultimodalQnA-upload-waiting-screenshot](./assets/img/upload-gen-trans.png)
+### Multimodal QnA UI

-![MultimodalQnA-upload-done-screenshot](./assets/img/upload-gen-captions.png)
+![MultimodalQnA-ui-screenshot](./assets/img/mmqna-ui.png)

-![MultimodalQnA-query-example-screenshot](./assets/img/example_query.png)
+### Video Ingestion
+
+![MultimodalQnA-ingest-video-screenshot](./assets/img/video-ingestion.png)
+
+### Text Query following the ingestion of a Video
+
+![MultimodalQnA-video-query-screenshot](./assets/img/video-query.png)
+
+### Image Ingestion
+
+![MultimodalQnA-ingest-image-screenshot](./assets/img/image-ingestion.png)
+
+### Text Query following the ingestion of an Image
+
+![MultimodalQnA-image-query-screenshot](./assets/img/image-query.png)
+
+### Audio Ingestion
+
+![MultimodalQnA-audio-ingestion-screenshot](./assets/img/audio-ingestion.png)
+
+### Text Query following the ingestion of an Audio Podcast
+
+![MultimodalQnA-audio-query-screenshot](./assets/img/audio-query.png)
+
+### PDF Ingestion
+
+![MultimodalQnA-upload-pdf-screenshot](./assets/img/ingest_pdf.png)
+
+### Text Query following the ingestion of a PDF
+
+![MultimodalQnA-pdf-query-example-screenshot](./assets/img/pdf-query.png)
Nine binary screenshot files changed under MultimodalQnA/assets/img/ (including mmqna-ui.png and the ingestion and query screenshots referenced above); image previews omitted.

MultimodalQnA/docker_compose/intel/cpu/xeon/README.md

Lines changed: 87 additions & 36 deletions
@@ -40,6 +40,10 @@ lvm
===
Port 9399 - Open to 0.0.0.0/0

+whisper
+===
+Port 7066 - Open to 0.0.0.0/0
+
dataprep-multimodal-redis
===
Port 6007 - Open to 0.0.0.0/0
@@ -75,34 +79,47 @@ export your_no_proxy=${your_no_proxy},"External_Public_IP"
export no_proxy=${your_no_proxy}
export http_proxy=${your_http_proxy}
export https_proxy=${your_http_proxy}
-export EMBEDDER_PORT=6006
-export MMEI_EMBEDDING_ENDPOINT="http://${host_ip}:$EMBEDDER_PORT"
-export MM_EMBEDDING_PORT_MICROSERVICE=6000
-export WHISPER_SERVER_PORT=7066
-export WHISPER_SERVER_ENDPOINT="http://${host_ip}:${WHISPER_SERVER_PORT}/v1/asr"
-export REDIS_URL="redis://${host_ip}:6379"
+export MM_EMBEDDING_SERVICE_HOST_IP=${host_ip}
+export MM_RETRIEVER_SERVICE_HOST_IP=${host_ip}
+export LVM_SERVICE_HOST_IP=${host_ip}
+export MEGA_SERVICE_HOST_IP=${host_ip}
+export WHISPER_PORT=7066
+export WHISPER_SERVER_ENDPOINT="http://${host_ip}:${WHISPER_PORT}/v1/asr"
+export WHISPER_MODEL="base"
+export MAX_IMAGES=1
+export REDIS_DB_PORT=6379
+export REDIS_INSIGHTS_PORT=8001
+export REDIS_URL="redis://${host_ip}:${REDIS_DB_PORT}"
export REDIS_HOST=${host_ip}
export INDEX_NAME="mm-rag-redis"
+export DATAPREP_MMR_PORT=5000
+export DATAPREP_INGEST_SERVICE_ENDPOINT="http://${host_ip}:${DATAPREP_MMR_PORT}/v1/dataprep/ingest"
+export DATAPREP_GEN_TRANSCRIPT_SERVICE_ENDPOINT="http://${host_ip}:${DATAPREP_MMR_PORT}/v1/dataprep/generate_transcripts"
+export DATAPREP_GEN_CAPTION_SERVICE_ENDPOINT="http://${host_ip}:${DATAPREP_MMR_PORT}/v1/dataprep/generate_captions"
+export DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:${DATAPREP_MMR_PORT}/v1/dataprep/get"
+export DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:${DATAPREP_MMR_PORT}/v1/dataprep/delete"
+export EMM_BRIDGETOWER_PORT=6006
+export EMBEDDING_MODEL_ID="BridgeTower/bridgetower-large-itm-mlm-itc"
export BRIDGE_TOWER_EMBEDDING=true
+export MMEI_EMBEDDING_ENDPOINT="http://${host_ip}:$EMM_BRIDGETOWER_PORT"
+export MM_EMBEDDING_PORT_MICROSERVICE=6000
+export REDIS_RETRIEVER_PORT=7000
+export LVM_PORT=9399
export LLAVA_SERVER_PORT=8399
-export LVM_ENDPOINT="http://${host_ip}:8399"
-export EMBEDDING_MODEL_ID="BridgeTower/bridgetower-large-itm-mlm-itc"
export LVM_MODEL_ID="llava-hf/llava-1.5-7b-hf"
-export WHISPER_MODEL="base"
-export MM_EMBEDDING_SERVICE_HOST_IP=${host_ip}
-export MM_RETRIEVER_SERVICE_HOST_IP=${host_ip}
-export LVM_SERVICE_HOST_IP=${host_ip}
-export MEGA_SERVICE_HOST_IP=${host_ip}
-export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/multimodalqna"
-export DATAPREP_INGEST_SERVICE_ENDPOINT="http://${host_ip}:5000/v1/dataprep/ingest"
-export DATAPREP_GEN_TRANSCRIPT_SERVICE_ENDPOINT="http://${host_ip}:5000/v1/dataprep/generate_transcripts"
-export DATAPREP_GEN_CAPTION_SERVICE_ENDPOINT="http://${host_ip}:5000/v1/dataprep/generate_captions"
-export DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:5000/v1/dataprep/get"
-export DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:5000/v1/dataprep/delete"
+export LVM_ENDPOINT="http://${host_ip}:$LLAVA_SERVER_PORT"
+export MEGA_SERVICE_PORT=8888
+export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:$MEGA_SERVICE_PORT/v1/multimodalqna"
+export UI_PORT=5173
```

Note: Please replace `host_ip` with your external IP address; do not use localhost.

+> Note: The `MAX_IMAGES` environment variable specifies the maximum number of images that will be sent from the LVM service to the LLaVA server.
+> If an image list longer than `MAX_IMAGES` is sent to the LVM server, a shortened image list will be sent to the LLaVA service. When the list
+> needs to be shortened, the most recent images (the ones at the end of the list) are prioritized. Some LLaVA models have not been trained with
+> multiple images and may produce inaccurate results. If `MAX_IMAGES` is not set, it defaults to `1`.
+
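To make the truncation rule concrete, here is a minimal bash illustration (illustrative only, not deployment code) of keeping the most recent `MAX_IMAGES` entries of a list:

```bash
# Illustration only: keep the last MAX_IMAGES elements of a list, mirroring
# the "most recent images are prioritized" behavior described above.
MAX_IMAGES=2
images=("first.png" "second.png" "third.png")
recent=("${images[@]: -${MAX_IMAGES}}")  # the space before the minus is required
echo "${recent[@]}"                      # prints: second.png third.png
```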
## 🚀 Build Docker Images

### 1. Build embedding-multimodal-bridgetower Image
@@ -112,7 +129,7 @@ Build embedding-multimodal-bridgetower docker image
```bash
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
-docker build --no-cache -t opea/embedding-multimodal-bridgetower:latest --build-arg EMBEDDER_PORT=$EMBEDDER_PORT --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/third_parties/bridgetower/src/Dockerfile .
+docker build --no-cache -t opea/embedding-multimodal-bridgetower:latest --build-arg EMBEDDER_PORT=$EMM_BRIDGETOWER_PORT --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/third_parties/bridgetower/src/Dockerfile .
```

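As a quick sanity check, assuming a standard Docker CLI, you can confirm the image was tagged as expected before moving on:

```bash
# An empty result means the build failed or used a different tag.
docker images | grep embedding-multimodal-bridgetower
```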
Build embedding microservice image
@@ -147,7 +164,7 @@ docker build --no-cache -t opea/lvm:latest --build-arg https_proxy=$https_proxy
docker build --no-cache -t opea/dataprep:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/src/Dockerfile .
```

-### 5. Build asr images
+### 5. Build Whisper Server Image

Build whisper server image

@@ -214,14 +231,14 @@ docker compose -f compose.yaml up -d
1. embedding-multimodal-bridgetower

```bash
-curl http://${host_ip}:${EMBEDDER_PORT}/v1/encode \
+curl http://${host_ip}:${EMM_BRIDGETOWER_PORT}/v1/encode \
    -X POST \
    -H "Content-Type:application/json" \
    -d '{"text":"This is example"}'
```

```bash
-curl http://${host_ip}:${EMBEDDER_PORT}/v1/encode \
+curl http://${host_ip}:${EMM_BRIDGETOWER_PORT}/v1/encode \
    -X POST \
    -H "Content-Type:application/json" \
    -d '{"text":"This is example", "img_b64_str": "iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mP8/5+hnoEIwDiqkL4KAcT9GO0U4BxoAAAAAElFTkSuQmCC"}'
@@ -247,13 +264,13 @@ curl http://${host_ip}:$MM_EMBEDDING_PORT_MICROSERVICE/v1/embeddings \

```bash
export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(512)]; print(embedding)")
-curl http://${host_ip}:7000/v1/multimodal_retrieval \
+curl http://${host_ip}:${REDIS_RETRIEVER_PORT}/v1/multimodal_retrieval \
    -X POST \
    -H "Content-Type: application/json" \
    -d "{\"text\":\"test\",\"embedding\":${your_embedding}}"
```
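To inspect the retriever's JSON response rather than a raw dump, the same smoke test can be piped through `jq` (assuming it is installed):

```bash
# Same retriever smoke test as above, pretty-printed for inspection.
curl -s http://${host_ip}:${REDIS_RETRIEVER_PORT}/v1/multimodal_retrieval \
    -X POST \
    -H "Content-Type: application/json" \
    -d "{\"text\":\"test\",\"embedding\":${your_embedding}}" | jq .
```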

-4. asr
+4. whisper

```bash
curl ${WHISPER_SERVER_ENDPOINT} \
@@ -274,14 +291,14 @@ curl http://${host_ip}:${LLAVA_SERVER_PORT}/generate \
6. lvm

```bash
-curl http://${host_ip}:9399/v1/lvm \
+curl http://${host_ip}:${LVM_PORT}/v1/lvm \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"retrieved_docs": [], "initial_query": "What is this?", "top_n": 1, "metadata": [{"b64_img_str": "iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mP8/5+hnoEIwDiqkL4KAcT9GO0U4BxoAAAAAElFTkSuQmCC", "transcript_for_inference": "yellow image", "video_id": "8c7461df-b373-4a00-8696-9a2234359fe0", "time_of_frame_ms":"37000000", "source_video":"WeAreGoingOnBullrun_8c7461df-b373-4a00-8696-9a2234359fe0.mp4"}], "chat_template":"The caption of the image is: '\''{context}'\''. {question}"}'
```

```bash
-curl http://${host_ip}:9399/v1/lvm \
+curl http://${host_ip}:${LVM_PORT}/v1/lvm \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"image": "iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mP8/5+hnoEIwDiqkL4KAcT9GO0U4BxoAAAAAElFTkSuQmCC", "prompt":"What is this?"}'
@@ -290,15 +307,15 @@ curl http://${host_ip}:9399/v1/lvm \
Also, validate the LVM microservice with empty retrieval results:

```bash
-curl http://${host_ip}:9399/v1/lvm \
+curl http://${host_ip}:${LVM_PORT}/v1/lvm \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"retrieved_docs": [], "initial_query": "What is this?", "top_n": 1, "metadata": [], "chat_template":"The caption of the image is: '\''{context}'\''. {question}"}'
```

7. dataprep-multimodal-redis

-Download a sample video, image, and audio file and create a caption
+Download a sample video, image, PDF, and audio file, and create a caption:

```bash
export video_fn="WeAreGoingOnBullrun.mp4"
@@ -307,13 +324,25 @@ wget http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/WeAreGoing
export image_fn="apple.png"
wget https://github.com/docarray/docarray/blob/main/tests/toydata/image-data/apple.png?raw=true -O ${image_fn}

+export pdf_fn="nke-10k-2023.pdf"
+wget https://raw.githubusercontent.com/opea-project/GenAIComps/v1.1/comps/retrievers/redis/data/nke-10k-2023.pdf -O ${pdf_fn}
+
export caption_fn="apple.txt"
echo "This is an apple." > ${caption_fn}

export audio_fn="AudioSample.wav"
wget https://github.com/intel/intel-extension-for-transformers/raw/main/intel_extension_for_transformers/neural_chat/assets/audio/sample.wav -O ${audio_fn}
```

+```bash
+export DATAPREP_MMR_PORT=6007
+export DATAPREP_INGEST_SERVICE_ENDPOINT="http://${host_ip}:${DATAPREP_MMR_PORT}/v1/dataprep/ingest"
+export DATAPREP_GEN_TRANSCRIPT_SERVICE_ENDPOINT="http://${host_ip}:${DATAPREP_MMR_PORT}/v1/dataprep/generate_transcripts"
+export DATAPREP_GEN_CAPTION_SERVICE_ENDPOINT="http://${host_ip}:${DATAPREP_MMR_PORT}/v1/dataprep/generate_captions"
+export DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:${DATAPREP_MMR_PORT}/v1/dataprep/get"
+export DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:${DATAPREP_MMR_PORT}/v1/dataprep/delete"
+```
+
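Before uploading files, a quick check that the endpoints now point at the host-published dataprep port can save debugging time (illustrative, not a required step):

```bash
# Both should print URLs on port 6007, e.g. http://<host_ip>:6007/v1/dataprep/ingest
echo ${DATAPREP_INGEST_SERVICE_ENDPOINT}
echo ${DATAPREP_GET_FILE_ENDPOINT}
```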
Test the dataprep microservice by generating a transcript. This command updates the knowledge base by uploading a local video .mp4 file and an audio .wav file.

```bash
@@ -325,7 +354,7 @@ curl --silent --write-out "HTTPSTATUS:%{http_code}" \
    -F "files=@./${audio_fn}"
```

-Also, test dataprep microservice with generating an image caption using lvm microservice
+Also, test the dataprep microservice by generating an image caption using the lvm microservice.

```bash
curl --silent --write-out "HTTPSTATUS:%{http_code}" \
@@ -334,13 +363,14 @@ curl --silent --write-out "HTTPSTATUS:%{http_code}" \
    -X POST -F "files=@./${image_fn}"
```

-Now, test the microservice with posting a custom caption along with an image
+Now, test the microservice by posting a custom caption along with an image and a PDF containing images and text.

```bash
curl --silent --write-out "HTTPSTATUS:%{http_code}" \
    ${DATAPREP_INGEST_SERVICE_ENDPOINT} \
    -H 'Content-Type: multipart/form-data' \
-    -X POST -F "files=@./${image_fn}" -F "files=@./${caption_fn}"
+    -X POST -F "files=@./${image_fn}" -F "files=@./${caption_fn}" \
+    -F "files=@./${pdf_fn}"
```

Also, you can get the list of all files that you uploaded:
@@ -358,7 +388,8 @@ Then you will get the response python-style LIST like this. Notice the name of e
    "WeAreGoingOnBullrun_7ac553a1-116c-40a2-9fc5-deccbb89b507.mp4",
    "WeAreGoingOnBullrun_6d13cf26-8ba2-4026-a3a9-ab2e5eb73a29.mp4",
    "apple_fcade6e6-11a5-44a2-833a-3e534cbe4419.png",
-   "AudioSample_976a85a6-dc3e-43ab-966c-9d81beef780c.wav
+   "nke-10k-2023_28000757-5533-4b1b-89fe-7c0a1b7e2cd0.pdf",
+   "AudioSample_976a85a6-dc3e-43ab-966c-9d81beef780c.wav"
]
```

@@ -372,21 +403,41 @@ curl -X POST \

8. MegaService

+Test the MegaService with a text query:
+
```bash
-curl http://${host_ip}:8888/v1/multimodalqna \
+curl http://${host_ip}:${MEGA_SERVICE_PORT}/v1/multimodalqna \
    -H "Content-Type: application/json" \
    -X POST \
    -d '{"messages": "What is the revenue of Nike in 2023?"}'
```

+Test the MegaService with an audio query:
+
+```bash
+curl http://${host_ip}:${MEGA_SERVICE_PORT}/v1/multimodalqna \
+    -H "Content-Type: application/json" \
+    -d '{"messages": [{"role": "user", "content": [{"type": "audio", "audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}]}]}'
+```
+
+Test the MegaService with a text and image query:
+
+```bash
+curl http://${host_ip}:${MEGA_SERVICE_PORT}/v1/multimodalqna \
+    -H "Content-Type: application/json" \
+    -d '{"messages": [{"role": "user", "content": [{"type": "text", "text": "Green bananas in a tree"}, {"type": "image_url", "image_url": {"url": "http://images.cocodataset.org/test-stuff2017/000000004248.jpg"}}]}]}'
+```
+
+Test the MegaService with a back-and-forth conversation between the user and assistant:
+
```bash
-curl http://${host_ip}:8888/v1/multimodalqna \
+curl http://${host_ip}:${MEGA_SERVICE_PORT}/v1/multimodalqna \
    -H "Content-Type: application/json" \
    -d '{"messages": [{"role": "user", "content": [{"type": "audio", "audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}]}]}'
```

```bash
-curl http://${host_ip}:8888/v1/multimodalqna \
+curl http://${host_ip}:${MEGA_SERVICE_PORT}/v1/multimodalqna \
    -H "Content-Type: application/json" \
    -d '{"messages": [{"role": "user", "content": [{"type": "text", "text": "hello, "}, {"type": "image_url", "image_url": {"url": "https://www.ilankelman.org/stopsigns/australia.jpg"}}]}, {"role": "assistant", "content": "opea project! "}, {"role": "user", "content": "chao, "}], "max_tokens": 10}'
```
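Any of the MegaService responses above can be pretty-printed for easier reading, assuming `jq` is installed (the exact response schema depends on your deployment):

```bash
# Re-run the text query and pretty-print the JSON response.
curl -s http://${host_ip}:${MEGA_SERVICE_PORT}/v1/multimodalqna \
    -H "Content-Type: application/json" \
    -X POST \
    -d '{"messages": "What is the revenue of Nike in 2023?"}' | jq .
```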