MultimodalQnA audio features completion #1698

Merged: 47 commits merged into opea-project:main on Apr 3, 2025

mhbuehler (Collaborator) commented Mar 19, 2025

Description

This PR completes the third and final phase of the RFC for MultimodalQnA image and audio support. The changes in GenAIExamples are listed below. The accompanying PR in GenAIComps is opea-project/GenAIComps#1433.

New Features:

  • Added playable audio/TTS query responses from megaservice API
  • Added ability to upload, record, and send audio captions to dataprep API
  • Enabled retrieved audio files to be displayed in multimodal query results box
  • Combined text, image, and audio query types into a unified multimodal text box
  • Added ability to list and delete files in the vector store
  • Parameterized UI timeout
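
As an illustration of the "Parameterized UI timeout" item, here is a minimal sketch of reading a timeout from the environment with a fallback default. The variable name `UI_TIMEOUT`, the default of 200 seconds, and the helper `read_ui_timeout` are assumptions made for this example and are not taken from the PR's code.

```python
import os

# Hypothetical default, in seconds; the PR only says the default was increased.
DEFAULT_TIMEOUT_SECONDS = 200


def read_ui_timeout(env=None, name="UI_TIMEOUT", default=DEFAULT_TIMEOUT_SECONDS):
    """Return a positive integer timeout read from an environment mapping.

    Falls back to `default` when the variable is missing, blank,
    not an integer, or not positive.
    """
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None or not raw.strip():
        return default
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value > 0 else default


if __name__ == "__main__":
    # Prints the configured value, or the default when nothing is set.
    print(read_ui_timeout())
```

Passing the mapping explicitly (for example `read_ui_timeout({"UI_TIMEOUT": "30"})`) keeps the helper easy to test without touching the real process environment.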

Bug Fixes:

  • Fixed PDF ingestion status
  • Fixed PDF clearing behavior

Issues

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds new functionality)
  • Breaking change (fix or feature that would break existing design and interface)
  • Others (enhancement, documentation, validation, etc.)

Dependencies

Version upgrades:

  • gradio: 5.11.0 -> 5.22.0 or higher
  • gradio_pdf: 0.0.19 -> 0.0.20

Tests

Updated:

  • MultimodalQnA/tests/test_compose_on_gaudi.sh
  • MultimodalQnA/tests/test_compose_on_xeon.sh
  • MultimodalQnA/tests/test_compose_on_rocm.sh

Authors

Co-authored-by: Harsha Ramayanam <harsha.ramayanam@intel.com>
Co-authored-by: Melanie Buehler <melanie.h.buehler@intel.com>
Co-authored-by: Dina Suehiro Jones <dina.s.jones@intel.com>
Co-authored-by: Omar Khleif <omar.khleif@intel.com>

okhleif-IL and others added 22 commits February 5, 2025 10:24
* Added tests + updated docs for asr mp3 change

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* addressed review comments

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

---------
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
* Added logic for showing/deleting files from vector store

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Added message to show when vector store is empty

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>

* Update MultimodalQnA/ui/gradio/multimodalqna_ui_gradio.py

Co-authored-by: Dina Suehiro Jones <dina.s.jones@intel.com>

---------

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Dina Suehiro Jones <dina.s.jones@intel.com>
* Parameterize UI timeout and increase default

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>

* Add new variable to compose.yaml

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>

* Update READMEs

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>

---------

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>
…ts (#58)

* MultimodalQnA README and diagram updates for phase 3 enhancements

Signed-off-by: dmsuehir <dina.s.jones@intel.com>

* Wording

Signed-off-by: dmsuehir <dina.s.jones@intel.com>

* Update to remove your_* vars

Signed-off-by: dmsuehir <dina.s.jones@intel.com>

* Updates based on review comments

Signed-off-by: dmsuehir <dina.s.jones@intel.com>

---------

Signed-off-by: dmsuehir <dina.s.jones@intel.com>
* added TTS linkage to backend

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* added modalities as a toggle

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* doc updates and code refactor

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* added tts test to megaservice tests

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* addressed recent review comments

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

---------

Signed-off-by: okhleif-IL <omar.khleif@intel.com>
* Add test for image and audio data ingestion

Signed-off-by: dmsuehir <dina.s.jones@intel.com>

* README updates

Signed-off-by: dmsuehir <dina.s.jones@intel.com>

* Add Gaudi tests

Signed-off-by: dmsuehir <dina.s.jones@intel.com>

* Add note about matching base names in test

Signed-off-by: dmsuehir <dina.s.jones@intel.com>

---------

Signed-off-by: dmsuehir <dina.s.jones@intel.com>
* fixed test and added tts validation

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* added gaudi test, reverted -speech change

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

---------

Signed-off-by: okhleif-IL <omar.khleif@intel.com>
* Enable audio caption upload in the UI

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>

* Improve handling of unsupported audio formats

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>

* Improve label and exception

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>

* Replace exception with error message so audio component still works

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>

---------

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>
Signed-off-by: dmsuehir <dina.s.jones@intel.com>
…se (#64)

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>
* Add missing env vars for MMQnA UI data prep endpoints

Signed-off-by: dmsuehir <dina.s.jones@intel.com>

* Remove dockerfile branch

Signed-off-by: dmsuehir <dina.s.jones@intel.com>

---------

Signed-off-by: dmsuehir <dina.s.jones@intel.com>
Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>
* first commit for tts addition

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* added TTS linkage to backend

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* removed unused import

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* added necessary env vars

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* reworked temp tts toggle logic

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* added modalities as a toggle

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* removed print statement

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* removed gaudi from tts

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* doc updates and code refactor

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* added tts test to megaservice tests

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* remove log files

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* addressed recent review comments

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* Added Logic for audio responses & refactored code to align with new gradio version

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>

* Minor bug fixes and UI changes

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>

* UI layout update & handling empty text with spaces

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>

* Updates on review comments

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>

* Update on review comments

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>

* Update MultimodalQnA/ui/gradio/multimodalqna_ui_gradio.py

Co-authored-by: Dina Suehiro Jones <dina.s.jones@intel.com>

* Some updates to review comments. More to come after testing

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>

* Restrict file media types to known/working formats

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>

* Remove extra whitespace

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>

* Fix test_compose_on_gaudi.sh script's diff not syncing with phase3

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>

* Changes per review comments

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>

* Added single space to the pload

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>

* Added logic to flush chatbot assistant's voice response .wav

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>

* Fixed issue where assistant's image is not sent

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>

* Revert build yaml

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>

* Clear diff

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>

* changes per review

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>

* small change

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>

* Update Dockerfile

Revert Dockerfile

---------

Signed-off-by: okhleif-IL <omar.khleif@intel.com>
Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>
Co-authored-by: okhleif-IL <omar.khleif@intel.com>
Co-authored-by: Dina Suehiro Jones <dina.s.jones@intel.com>
github-actions bot commented Mar 19, 2025

Dependency Review

✅ No vulnerabilities or license issues found.

Scanned Files

  • MultimodalQnA/ui/gradio/requirements.txt

mhbuehler and others added 2 commits March 19, 2025 15:21
* Point git clones to corresponding fork of comps

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>

* Use --depth 1

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>

---------

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>
ashahba added the v1.3 label Mar 19, 2025
HarshaRamayanam and others added 2 commits March 21, 2025 13:58
* Initial commit to add logic

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>

* changes per review comments from closed PR #66

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>

* Modified screenshot for audio query

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>

* rename tmp_dir2 to audio_tmp_dir

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>

---------

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>
* Specify MMQnA UI to use pydantic==2.10.6

Signed-off-by: Dina Suehiro Jones <dina.s.jones@intel.com>

* Formatting

Signed-off-by: Dina Suehiro Jones <dina.s.jones@intel.com>

---------

Signed-off-by: Dina Suehiro Jones <dina.s.jones@intel.com>
dmsuehir and others added 6 commits March 31, 2025 08:23
* Revert git clone changes after tests pass

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>

* Updated Dockerfile with opea/comps-base

Signed-off-by: Dina Suehiro Jones <dina.s.jones@intel.com>

---------

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>
Signed-off-by: Dina Suehiro Jones <dina.s.jones@intel.com>
Co-authored-by: Dina Suehiro Jones <dina.s.jones@intel.com>
HarshaRamayanam and others added 2 commits April 2, 2025 14:59
Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>
Signed-off-by: Dina Suehiro Jones <dina.s.jones@intel.com>

ashahba (Collaborator) left a comment

LGTM!
For the one file we are downloading from an archived repo, we can always come back to it and decide.

ashahba merged commit bbd5344 into opea-project:main Apr 3, 2025
19 checks passed
ashahba deleted the mmqna-phase3 branch April 3, 2025 04:45
chyundunovDatamonsters pushed a commit to chyundunovDatamonsters/OPEA-GenAIExamples that referenced this pull request May 16, 2025
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>
Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>
Signed-off-by: dmsuehir <dina.s.jones@intel.com>
Signed-off-by: Dina Suehiro Jones <dina.s.jones@intel.com>
Co-authored-by: Omar Khleif <omar.khleif@intel.com>
Co-authored-by: Harsha Ramayanam <harsha.ramayanam@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Dina Suehiro Jones <dina.s.jones@intel.com>
Co-authored-by: Liang Lv <liang1.lv@intel.com>
Co-authored-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>
Signed-off-by: Chingis Yundunov <c.yundunov@datamonsters.com>