
Commit 9217885

Merge pull request #5 from Shuyib/Speechtotext

Speechtotext with Groq

2 parents 20c8d79 + 1b9dfa5

File tree

5 files changed: +821 −139 lines changed

Makefile (+15 −11)
@@ -21,7 +21,7 @@ venv/bin/activate: requirements.txt #requirements.txt is a requirement, otherwis
 	# make command executable
 	# chmod is a bash command, +x is giving the ADMIN user permission to execute it
 	# if it's a+x, that means anyone can run it, even if you aren't an ADMIN
-	chmod +x .venv/bin/activate
+	chmod +x .venv/bin/activate
 	# activate virtual environment
 	. .venv/bin/activate
 

@@ -37,19 +37,19 @@ install: venv/bin/activate requirements.txt # prerequisite
 
 docstring: activate
 	# format docstring, might have to change this as well
-	# write a template using a numpydoc convention and output it to my python file
+	# write a template using a numpydoc convention and output it to my python file
 	# so basically just document functions, classes etc. in the numpy style
 	pyment -w -o numpydoc *.py
 
-format: activate
+format: activate
 	# format code
-	black utils/*.py tests/*.py
+	black *.py utils/*.py tests/*.py
 
 clean:
 	# clean directory of cache
 	# files like pychache are gen'd after running py files
-	# the data speeds up execution of py files in subsequent runs
-	# reduces size of repo
+	# the data speeds up execution of py files in subsequent runs
+	# reduces size of repo
 	# during version control, removing them would avoid conflicts with other dev's cached files
 	# add code to remove ipynb checkpoints
 	# the &&\ is used to say, after running this successfully, run the next...
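The cache cleanup these comments describe can be sketched in plain shell. A minimal illustration (the actual `clean` recipe sits in lines this diff elides, so the exact commands may differ):

```bash
# Remove Python bytecode caches and notebook checkpoints anywhere in the tree
# (illustrative; mirrors what a clean target like this typically runs)
find . -type d -name "__pycache__" -exec rm -rf {} + 2>/dev/null || true
find . -type d -name ".ipynb_checkpoints" -exec rm -rf {} + 2>/dev/null || true
```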
@@ -63,12 +63,12 @@ clean:
 	rm -rf *.log
 	rm -rf tests/__pycache__
 
-lint: activate install
+lint: activate install
 	#flake8 or #pylint
 	# In this scenario it'll only tell as errors found in your code
-	# R - refactor
+	# R - refactor
 	# C - convention
-	pylint --disable=R,C --errors-only *.py
+	pylint --disable=R,C --errors-only *.py
 
 test: activate install
 	# run tests
@@ -87,6 +87,10 @@ run_gradio: activate install format
 	# run gradio
 	$(PYTHON) app.py
 
+run_gradio_stt: activate install format
+	# run gradio
+	$(PYTHON) voice_stt_mode.py
+
 docker_build: Dockerfile
 	#build container
 	# docker build -t $(DOCKER_IMAGE_TAG) .
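The added `run_gradio_stt` target follows the same shape as `run_gradio`. A stand-in sketch of how the recipe expands, using a dry run against a minimal demo Makefile (hypothetical: the real target also chains `activate install format`, and `python3` stands in for whatever the Makefile's `$(PYTHON)` resolves to):

```bash
# Build a minimal demo Makefile mirroring the new target (stand-in only;
# the real one has prerequisites and a project-defined PYTHON variable)
printf 'PYTHON := python3\n\nrun_gradio_stt:\n\t$(PYTHON) voice_stt_mode.py\n' > Makefile.demo

# make -n prints the recipe without executing it
make -n -f Makefile.demo run_gradio_stt   # -> python3 voice_stt_mode.py
```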
@@ -95,7 +99,7 @@ docker_run_test: Dockerfile.app Dockerfile.ollama
 	# linting Dockerfile
 	docker run --rm -i hadolint/hadolint < Dockerfile.ollama
 	docker run --rm -i hadolint/hadolint < Dockerfile.app
-
+
 
 docker_clean: Dockerfile.ollama Dockerfile.app
 	# clean docker
@@ -109,7 +113,7 @@ docker_run: Dockerfile.ollama Dockerfile.app
 	# run docker
 	# this is basically a test to see if a docker image is being created successfully
 	docker-compose up --build
-
+
 
 setup_readme: ## Create a README.md
 	@if [ ! -f README.md ]; then \
 	echo "# Project Name\n\

README.md (+68 −61)
@@ -6,11 +6,11 @@ Function-calling with Python and ollama. We are going to use the Africa's Talkin
 
 NB: The phone numbers are placeholders for the actual phone numbers.
 You need some VRAM to run this project. You can get VRAM from [here](https://vast.ai/)
-We recommend 400MB-8GB of VRAM for this project. It can run on CPU however, I recommend smaller models for this.
+We recommend 400MB-8GB of VRAM for this project. It can run on CPU however, I recommend smaller models for this.
 
-[Mistral 7B](https://ollama.com/library/mistral), **llama 3.2 3B/1B**, [**Qwen 2.5: 0.5/1.5B**](https://ollama.com/library/qwen2.5:1.5b), [nemotron-mini 4b](https://ollama.com/library/nemotron-mini) and [llama3.1 8B](https://ollama.com/library/llama3.1) are the recommended models for this project.
+[Mistral 7B](https://ollama.com/library/mistral), **llama 3.2 3B/1B**, [**Qwen 2.5: 0.5/1.5B**](https://ollama.com/library/qwen2.5:1.5b), [nemotron-mini 4b](https://ollama.com/library/nemotron-mini) and [llama3.1 8B](https://ollama.com/library/llama3.1) are the recommended models for this project.
 
-Ensure ollama is installed on your laptop/server and running before running this project. You can install ollama from [here](ollama.com)
+Ensure ollama is installed on your laptop/server and running before running this project. You can install ollama from [here](ollama.com)
 Learn more about tool calling <https://gorilla.cs.berkeley.edu/leaderboard.html>
 
 
@@ -22,41 +22,41 @@ Learn more about tool calling <https://gorilla.cs.berkeley.edu/leaderboard.html>
 - [Usage](#usage)
 - [Use cases](#use-cases)
 - [Responsible AI Practices](#responsible-ai-practices)
-- [Limitations](#limitations)
+- [Limitations](#limitations)
 - [Contributing](#contributing)
-- [License](#license)
+- [License](#license)
 
 
 ## File structure
-.
-├── Dockerfile.app - template to run the gradio dashboard.
-├── Dockerfile.ollama - template to run the ollama server.
-├── docker-compose.yml - use the ollama project and gradio dashboard.
-├── docker-compose-codecarbon.yml - use the codecarbon project, ollama and gradio dashboard.
-├── .env - This file contains the environment variables for the project. (Not included in the repository)
-├── app.py - the function_call.py using gradio as the User Interface.
-├── Makefile - This file contains the commands to run the project.
-├── README.md - This file contains the project documentation. This is the file you are currently reading.
-├── requirements.txt - This file contains the dependencies for the project.
-├── summary.png - How function calling works with a diagram.
-├── tests - This directory contains the test files for the project.
-│   ├── __init__.py - This file initializes the tests directory as a package.
-│   ├── test_cases.py - This file contains the test cases for the project.
-│   └── test_run.py - This file contains the code to run the test cases for the function calling LLM.
-└── utils - This directory contains the utility files for the project.
-    ├── __init__.py - This file initializes the utils directory as a package.
-    ├── function_call.py - This file contains the code to call a function using LLMs.
-    └── communication_apis.py - This file contains the code to do with communication apis & experiments.
+.
+├── Dockerfile.app - template to run the gradio dashboard.
+├── Dockerfile.ollama - template to run the ollama server.
+├── docker-compose.yml - use the ollama project and gradio dashboard.
+├── docker-compose-codecarbon.yml - use the codecarbon project, ollama and gradio dashboard.
+├── .env - This file contains the environment variables for the project. (Not included in the repository)
+├── app.py - the function_call.py using gradio as the User Interface.
+├── Makefile - This file contains the commands to run the project.
+├── README.md - This file contains the project documentation. This is the file you are currently reading.
+├── requirements.txt - This file contains the dependencies for the project.
+├── summary.png - How function calling works with a diagram.
+├── tests - This directory contains the test files for the project.
+│   ├── __init__.py - This file initializes the tests directory as a package.
+│   ├── test_cases.py - This file contains the test cases for the project.
+│   └── test_run.py - This file contains the code to run the test cases for the function calling LLM.
+└── utils - This directory contains the utility files for the project.
+    ├── __init__.py - This file initializes the utils directory as a package.
+    ├── function_call.py - This file contains the code to call a function using LLMs.
+    └── communication_apis.py - This file contains the code to do with communication apis & experiments.
 
 ### attribution
-This project uses the Qwen2.5-0.5B model developed by Alibaba Cloud under the Apache License 2.0. The original project can be found at [Qwen technical report](https://arxiv.org/abs/2412.15115)
+This project uses the Qwen2.5-0.5B model developed by Alibaba Cloud under the Apache License 2.0. The original project can be found at [Qwen technical report](https://arxiv.org/abs/2412.15115)
 
 ### License
 
 This project is licensed under the Apache License 2.0. See the [LICENSE](./LICENSE) file for more details.
-
+
 ## Installation
-The project uses python 3.12. To install the project, follow the steps below:
+The project uses python 3.12. To install the project, follow the steps below:
 
 - Clone the repository
 ```bash
@@ -65,7 +65,7 @@ git clone https://github.com/Shuyib/tool_calling_api.git
 - Change directory to the project directory
 ```bash
 cd tool_calling_api
-```
+```
 Create a virtual environment
 ```bash
 python3 -m venv .venv
@@ -88,7 +88,7 @@ make install
 ```bash
 make run
 ```
-Long way to run the project
+Long way to run the project
 
 - Change directory to the utils directory
 ```bash
@@ -121,82 +121,89 @@ make docker_run
 ```
 
 Notes:
-- The .env file contains the environment variables for the project. You can create a .env file and add the following environment variables:
+- The .env file contains the environment variables for the project. You can create a .env file and add the following environment variables:
 
 ```bash
 echo "AT_API_KEY = yourapikey" >> .env
 echo "AT_USERNAME = yourusername" >> .env
-echo "LANGTRACE_API_KEY= yourlangtraceapikey" >> .env
+echo "GROQ_API_KEY = yourgroqapikey" >> .env
+echo "LANGTRACE_API_KEY= yourlangtraceapikey" >> .env
 echo "TEST_PHONE_NUMBER = yourphonenumber" >> .env
 echo "TEST_PHONE_NUMBER_2 = yourphonenumber" >> .env
 echo "TEST_PHONE_NUMBER_3 = yourphonenumber" >> .env
 ```
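As a quick sanity check, a .env like the one above can be exported into the current shell before running the app. A minimal sketch with the same placeholder values; note that plain shell sourcing requires no spaces around `=`, whereas most dotenv parsers tolerate them:

```bash
# Write a throwaway .env with placeholder values (no spaces around '=',
# since plain shell sourcing is stricter than typical dotenv parsers)
cat > .env <<'EOF'
AT_API_KEY=yourapikey
AT_USERNAME=yourusername
GROQ_API_KEY=yourgroqapikey
EOF

# set -a marks every variable assigned while it is active for export
set -a
. ./.env
set +a

echo "$GROQ_API_KEY"   # -> yourgroqapikey
```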
-- The Dockerfile creates 2 images for the ollama server and the gradio dashboard. The ollama server is running on port 11434 and the gradio dashboard is running on port 7860 . You can access the gradio dashboard by visiting <http://localhost:7860> in your browser & the ollama server by visiting <http://localhost:11434> in your browser. They consume about 2.72GB of storage in the container.
+- The Dockerfile creates 2 images for the ollama server and the gradio dashboard. The ollama server is running on port 11434 and the gradio dashboard is running on port 7860 . You can access the gradio dashboard by visiting <http://localhost:7860> in your browser & the ollama server by visiting <http://localhost:11434> in your browser. They consume about 2.72GB of storage in the container.
 - The docker-compose.yml file is used to run the ollama server and the gradio dashboard. The docker-compose-codecarbon.yml file is used to run the ollama server, the gradio dashboard and the codecarbon project.
-- You can learn more about how to make this system even more secure. Do this [course](https://www.kaggle.com/learn-guide/5-day-genai#GenAI).
+- You can learn more about how to make this system even more secure. Do this [course](https://www.kaggle.com/learn-guide/5-day-genai#GenAI).
 
 
 ## Run in runpod.io
-Make an account if you haven't already. Once that's settled.
+Make an account if you haven't already. Once that's settled.
 
-- Click on Deploy under Pods.
-- Select the cheapest option pod to deploy for example RTX 2000 Ada.
-- This will create a jupyter lab instance.
-- Follow the Installation steps in the terminal available. Until the make install.
-- Run this command. Install ollama and serve it then redirect output to a log file.
+- Click on Deploy under Pods.
+- Select the cheapest option pod to deploy for example RTX 2000 Ada.
+- This will create a jupyter lab instance.
+- Follow the Installation steps in the terminal available. Until the make install.
+- Run this command. Install ollama and serve it then redirect output to a log file.
 
 ```bash
 curl -fsSL https://ollama.com/install.sh | sh && ollama serve > ollama.log 2>&1 &
 ```
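Since `ollama serve` is backgrounded, it can help to wait until the API answers before pulling a model. A sketch of such a readiness check; ollama lists installed models at `/api/tags`, and the URL and retry count here are illustrative defaults:

```bash
# Poll an ollama endpoint until /api/tags answers, or give up after N tries
wait_for_ollama() {
  url="${1:-http://localhost:11434}"
  tries="${2:-30}"
  i=1
  while [ "$i" -le "$tries" ]; do
    if curl -fsS "$url/api/tags" >/dev/null 2>&1; then
      echo "ollama is up at $url"
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  echo "ollama not reachable at $url"
  return 1
}
```

Usage would be something like `wait_for_ollama && ollama run qwen2.5:0.5b`.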
-- Install your preferred model in the same terminal.
+- Install your preferred model in the same terminal.
 
 ```bash
 ollama run qwen2.5:0.5b
 ```
-- Export your credentials but, if you are using a .env file, you can skip this step. It will be useful for Docker.
+- Export your credentials but, if you are using a .env file, you can skip this step. It will be useful for Docker.
 
 ```bash
 export AT_API_KEY=yourapikey
 export AT_USERNAME=yourusername
+export GROQ_API_KEY=yourgroqapikey
 export LANGTRACE_API_KEY=yourlangtraceapikey
 export TEST_PHONE_NUMBER=yourphonenumber
 export TEST_PHONE_NUMBER_2=yourphonenumber
 export TEST_PHONE_NUMBER_3=yourphonenumber
 ```
-- Continue running the installation steps in the terminal.
-- Send your first message and airtime with an LLM. 🌠
+- Continue running the installation steps in the terminal.
+- Send your first message and airtime with an LLM. 🌠
 
-Read more about setting up ollama and serveless options <https://blog.runpod.io/run-llama-3-1-405b-with-ollama-a-step-by-step-guide/> & <https://blog.runpod.io/run-llama-3-1-with-vllm-on-runpod-serverless/>
+Read more about setting up ollama and serveless options <https://blog.runpod.io/run-llama-3-1-405b-with-ollama-a-step-by-step-guide/> & <https://blog.runpod.io/run-llama-3-1-with-vllm-on-runpod-serverless/>
 
 ## Usage
-This project uses LLMs to send airtime to a phone number. The difference is that we are going to use the Africa's Talking API to send airtime to a phone number using Natural language. Here are examples of prompts you can use to send airtime to a phone number:
-- Send airtime to xxxxxxxxxx046 and xxxxxxxxxx524 with an amount of 10 in currency KES.
+This project uses LLMs to send airtime to a phone number. The difference is that we are going to use the Africa's Talking API to send airtime to a phone number using Natural language. Here are examples of prompts you can use to send airtime to a phone number:
+- Send airtime to xxxxxxxxxx046 and xxxxxxxxxx524 with an amount of 10 in currency KES.
 - Send a message to xxxxxxxxxx046 and xxxxxxxxxx524 with a message "Hello, how are you?", using the username "username".
 
+## Updated Usage Instructions
+- The app now supports both Text and Voice input tabs.
+- In the Voice Input tab, record audio and click "Transcribe" to preview the transcription. Then click "Process Edited Text" to execute voice commands.
+- In the Text Input tab, directly type commands to send airtime or messages or to search news.
+
 ### Responsible AI Practices
-This project implements several responsible AI practices:
-- All test data is anonymized to protect privacy.
-- Input validation to prevent misuse (negative amounts, spam detection).
-- Handling of sensitive content and edge cases.
-- Comprehensive test coverage for various scenarios.
-- Secure handling of credentials and personal information.
+This project implements several responsible AI practices:
+- All test data is anonymized to protect privacy.
+- Input validation to prevent misuse (negative amounts, spam detection).
+- Handling of sensitive content and edge cases.
+- Comprehensive test coverage for various scenarios.
+- Secure handling of credentials and personal information.
 
 ![Process Summary](summary.png)
 
 ## Use cases
-* Non-Technical User Interfaces: Simplifies the process for non-coders to interact with APIs, making it easier for them to send airtime and messages without needing to understand the underlying code.
-* Customer Support Automation: Enables customer support teams to quickly send airtime or messages to clients using natural language commands, improving efficiency and response times.
-* Marketing Campaigns: Facilitates the automation of promotional messages and airtime rewards to customers, enhancing engagement and retention.
-* Emergency Notifications: Allows rapid dissemination of urgent alerts and notifications to a large number of recipients using simple prompts.
-* Educational Tools: Provides a practical example for teaching how to integrate APIs with natural language processing, which can be beneficial for coding bootcamps and workshops.
-* Multilingual Support: Supports multiple languages when sending messages and airtime, making it accessible to a diverse range of users. Testing for Arabic, French, English and Portuguese.
+* Non-Technical User Interfaces: Simplifies the process for non-coders to interact with APIs, making it easier for them to send airtime and messages without needing to understand the underlying code.
+* Customer Support Automation: Enables customer support teams to quickly send airtime or messages to clients using natural language commands, improving efficiency and response times.
+* Marketing Campaigns: Facilitates the automation of promotional messages and airtime rewards to customers, enhancing engagement and retention.
+* Emergency Notifications: Allows rapid dissemination of urgent alerts and notifications to a large number of recipients using simple prompts.
+* Educational Tools: Provides a practical example for teaching how to integrate APIs with natural language processing, which can be beneficial for coding bootcamps and workshops.
+* Multilingual Support: Supports multiple languages when sending messages and airtime, making it accessible to a diverse range of users. Testing for Arabic, French, English and Portuguese.
 
 ## Limitations
-- The project is limited to sending airtime, searching for news, and messages using the Africa's Talking API. The functionality can be expanded to include other APIs and services.
+- The project is limited to sending airtime, searching for news, and messages using the Africa's Talking API. The functionality can be expanded to include other APIs and services.
 
-- The jailbreaking of the LLMS is a limitation. The LLMS are not perfect and can be manipulated to produce harmful outputs. This can be mitigated by using a secure environment and monitoring the outputs for any malicious content. However, the Best of N technique and prefix injection were effective in changing model behavior.
+- The jailbreaking of the LLMS is a limitation. The LLMS are not perfect and can be manipulated to produce harmful outputs. This can be mitigated by using a secure environment and monitoring the outputs for any malicious content. However, the Best of N technique and prefix injection were effective in changing model behavior.
 
-- A small number of test cases were used to test the project. More test cases can be added to cover a wider range of scenarios and edge cases.
+- A small number of test cases were used to test the project. More test cases can be added to cover a wider range of scenarios and edge cases.
 
 ## Contributing
 Contributions are welcome. If you would like to contribute to the project, you can fork the repository, create a new branch, make your changes and then create a pull request.
