Add audio input to gradioUI #1201

Open
wants to merge 4 commits into
base: main
from

Conversation

@GTimothee GTimothee commented Apr 15, 2025

Hi,

I am experimenting with smolagents, and the first thing I wanted to do was give a voice command and get a result from it. Unfortunately, the default Gradio interface does not support voice input, hence this PR.

I added the option to give a voice command, either from the microphone or from an audio file.

You can find the demo here https://huggingface.co/spaces/GTimothee/smolagent

Here is how it works: to accept voice commands, you need a function that processes the audio. You pass in that function, and its presence makes the Gradio UI display an audio input. On submit, the UI runs your function to extract the text from the audio and passes that text to the agent. The next step will be to enable audio output for the agent.

GradioUI(agent).launch(speech2text_func=speech2text_func)

You can find all the code in the app.py of my space.
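
For readers who want a picture of the flow described above (audio in, text out, text passed to the agent), here is a minimal sketch. It is not the code from this PR or from the Space: `make_submit_handler`, `on_submit`, `fake_speech2text` and `fake_agent` are hypothetical names made up for illustration, and it assumes the audio arrives as a file path and that the transcription function maps that path to a string. The PR does not state the exact signature expected for `speech2text_func`, so treat that part as an assumption too.

```python
from typing import Callable, Optional


def make_submit_handler(
    run_agent: Callable[[str], str],
    speech2text_func: Optional[Callable[[str], str]] = None,
) -> Callable[[str, Optional[str]], str]:
    """Build a submit callback that uses audio input when it can be transcribed."""

    def on_submit(text: str, audio_path: Optional[str] = None) -> str:
        # Audio is only considered when a transcription function was supplied
        # (assumption: this mirrors "passing the function enables audio input").
        if audio_path is not None and speech2text_func is not None:
            text = speech2text_func(audio_path)
        if not text or not text.strip():
            raise ValueError("No text or transcribable audio was provided.")
        # The transcript (or the typed text) is what the agent receives.
        return run_agent(text.strip())

    return on_submit


# Hypothetical stand-ins so the sketch runs without a speech model or an agent.
def fake_speech2text(audio_path: str) -> str:
    return f"transcript of {audio_path}"


def fake_agent(task: str) -> str:
    return f"agent received: {task}"


handler = make_submit_handler(fake_agent, speech2text_func=fake_speech2text)
print(handler("", "command.wav"))   # agent received: transcript of command.wav
print(handler("typed command"))     # agent received: typed command
```

In a real setup, `fake_speech2text` would presumably be replaced by a function that wraps a speech recognition model, and `fake_agent` by a call into the agent. Keeping the transcription function optional means the text-only path keeps working when no function is passed.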

@GTimothee GTimothee changed the title from "Add audio input" to "Add audio input to gradioUI" Apr 15, 2025

1 participant