Commit 384d434

Merge pull request #4 from Shuyib/evals
Evals
2 parents 476ba1c + ddcaa0f commit 384d434

6 files changed: +369 -2 lines changed

Makefile

+10 -1
@@ -43,7 +43,7 @@ docstring: activate

 format: activate
 	# format code
-	black utils/*.py *.py
+	black utils/*.py tests/*.py

 clean:
 	# clean directory of cache
@@ -61,6 +61,7 @@ clean:
 	rm -rf utils/__pycache__
 	rm -rf utils/*.log
 	rm -rf *.log
+	rm -rf tests/__pycache__

 lint: activate install
 	#flake8 or #pylint
@@ -69,6 +70,14 @@ lint: activate install
 	# C - convention
 	pylint --disable=R,C --errors-only *.py

+test: activate install
+	# run tests
+	echo @running tests
+	echo @we used this signature to run tests: $(PYTHON) -m pytest tests/testcases.py
+	echo @for single tests, we used this signature: $(PYTHON) -m pytest tests/testcases.py::test_function_name
+	$(PYTHON) -m pytest tests/test_cases.py -v
+	$(PYTHON) -m pytest tests/test_run.py -v --asyncio-mode=strict
+
 run: activate install format
 	# run test_app
 	# run each file separately, bc if one fails, all fail
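
The new `test` target runs the two pytest commands one after the other, so under Make's default behaviour a failure in `tests/test_cases.py` stops the recipe before `tests/test_run.py` starts. A minimal Python sketch of an alternative that runs every file and collects the exit codes is shown below. The names `run_test_files` and `TEST_PLAN` are hypothetical and not part of this commit; only the file paths and pytest flags are taken from the Makefile.

```python
import subprocess
import sys
from typing import Callable, Dict, List, Mapping, Sequence


def run_test_files(
    plan: Mapping[str, Sequence[str]],
    runner: Callable[[List[str]], int] = subprocess.call,
) -> Dict[str, int]:
    """Run pytest once per file and return each file's exit code.

    `runner` is injectable so the function can be exercised without
    spawning real pytest processes.
    """
    exit_codes: Dict[str, int] = {}
    for path, extra_args in plan.items():
        # Mirrors the Makefile signature: $(PYTHON) -m pytest <file> -v [flags]
        command = [sys.executable, "-m", "pytest", path, "-v", *extra_args]
        exit_codes[path] = runner(command)
    return exit_codes


# Hypothetical plan mirroring the two commands in the `test` target.
TEST_PLAN = {
    "tests/test_cases.py": [],
    "tests/test_run.py": ["--asyncio-mode=strict"],
}
```

Calling `run_test_files(TEST_PLAN)` from the repository root would launch both runs even if the first one fails; a wrapper script could then exit with the largest returned code so that CI still reports the failure.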

README.md

+28 -1
@@ -36,10 +36,15 @@ Learn more about tool calling <https://gorilla.cs.berkeley.edu/leaderboard.html>
 ├── README.md - This file contains the project documentation. This is the file you are currently reading.
 ├── requirements.txt - This file contains the dependencies for the project.
 ├── summary.png - How function calling works with a diagram.
+├── tests - This directory contains the test files for the project.
+│   ├── __init__.py - This file initializes the tests directory as a package.
+│   ├── test_cases.py - This file contains the test cases for the project.
+│   └── test_run.py - This file contains the code to run the test cases for the function calling LLM.
 └── utils - This directory contains the utility files for the project.
     ├── __init__.py - This file initializes the utils directory as a package.
     ├── function_call.py - This file contains the code to call a function using LLMs.
-    └── communication_apis.py - This file contains the code to do with communication apis & experiments.
+    └── communication_apis.py - This file contains the code to do with communication apis & experiments.
+

 ## Installation
 The project uses python 3.12. To install the project, follow the steps below:
@@ -113,6 +118,9 @@ Notes:
 echo "AT_API_KEY = yourapikey" >> .env
 echo "AT_USERNAME = yourusername" >> .env
 echo "LANGTRACE_API_KEY= yourlangtraceapikey" >> .env
+echo "TEST_PHONE_NUMBER = yourphonenumber" >> .env
+echo "TEST_PHONE_NUMBER_2 = yourphonenumber" >> .env
+echo "TEST_PHONE_NUMBER_3 = yourphonenumber" >> .env
 ```
 - The Dockerfile creates 2 images for the ollama server and the gradio dashboard. The ollama server is running on port 11434 and the gradio dashboard is running on port 7860. You can access the gradio dashboard by visiting <http://localhost:7860> in your browser & the ollama server by visiting <http://localhost:11434> in your browser. They consume about 2.72GB of storage in the container.
 - The docker-compose.yml file is used to run the ollama server and the gradio dashboard. The docker-compose-codecarbon.yml file is used to run the ollama server, the gradio dashboard and the codecarbon project.
@@ -141,6 +149,10 @@ ollama run qwen2.5:0.5b
 ```bash
 export AT_API_KEY=yourapikey
 export AT_USERNAME=yourusername
+export LANGTRACE_API_KEY=yourlangtraceapikey
+export TEST_PHONE_NUMBER=yourphonenumber
+export TEST_PHONE_NUMBER_2=yourphonenumber
+export TEST_PHONE_NUMBER_3=yourphonenumber
 ```
 - Continue running the installation steps in the terminal.
 - Send your first message and airtime with an LLM. 🌠
@@ -152,6 +164,14 @@ This project uses LLMs to send airtime to a phone number. The difference is that
 - Send airtime to xxxxxxxxxx046 and xxxxxxxxxx524 with an amount of 10 in currency KES.
 - Send a message to xxxxxxxxxx046 and xxxxxxxxxx524 with a message "Hello, how are you?", using the username "username".

+### Responsible AI Practices
+This project implements several responsible AI practices:
+- All test data is anonymized to protect privacy.
+- Input validation to prevent misuse (negative amounts, spam detection).
+- Handling of sensitive content and edge cases.
+- Comprehensive test coverage for various scenarios.
+- Secure handling of credentials and personal information.
+
 ![Process Summary](summary.png)

 ## Use cases
@@ -164,5 +184,12 @@ This project uses LLMs to send airtime to a phone number. The difference is that
 ## Contributing
 Contributions are welcome. If you would like to contribute to the project, you can fork the repository, create a new branch, make your changes and then create a pull request.

+### Testing Guidelines
+When contributing, please ensure:
+- All test data uses anonymized placeholders
+- Edge cases and invalid inputs are properly tested
+- Sensitive content handling is verified
+- No real personal information is included in tests
+
 ## License
 [License information](https://github.com/Shuyib/tool_calling_api/blob/main/LICENSE).
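
The README additions above mention anonymized placeholders such as `xxxxxxxxxx046` and input validation against negative amounts, but the diff does not show how either is done. The following is a minimal sketch of what such helpers could look like. `mask_phone_number` and `validate_amount` are hypothetical names introduced for illustration and are not taken from the project.

```python
def mask_phone_number(number: str, visible: int = 3) -> str:
    """Replace all but the last `visible` characters with 'x'.

    Hypothetical helper: produces placeholders in the style used by the
    README examples (for instance a value ending in 046).
    """
    if visible < 0:
        raise ValueError("visible must be non-negative")
    hidden = max(len(number) - visible, 0)
    return "x" * hidden + number[hidden:]


def validate_amount(amount: float) -> float:
    """Reject non-numeric, zero, and negative airtime amounts.

    Hypothetical helper: the project's actual validation rules are not
    shown in this diff.
    """
    # bool is a subclass of int, so it is excluded explicitly.
    if isinstance(amount, bool) or not isinstance(amount, (int, float)):
        raise TypeError("amount must be a number")
    if amount <= 0:
        raise ValueError("amount must be greater than zero")
    return amount
```

For example, `mask_phone_number("+254700000046")` keeps only the last three digits visible, and `validate_amount(-5)` raises `ValueError` before any request would be built.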

requirements.txt

+2
@@ -10,3 +10,5 @@ gradio==5.7.1
 duckduckgo_search==6.3.2
 langtrace-python-sdk==3.3.14
 setuptools==75.6.0
+pytest==8.3.4
+pytest-asyncio==0.25.0
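
Because the test dependencies are pinned with `==`, a local environment can drift from `requirements.txt` without anyone noticing. The sketch below shows one way to compare the pins against what is installed, using only the standard library. `parse_pins` and `find_mismatches` are hypothetical helper names, and the parser only handles the simple `name==version` form that appears in this file.

```python
from importlib import metadata
from typing import Callable, Dict, Optional, Tuple


def parse_pins(requirements_text: str) -> Dict[str, str]:
    """Extract `name==version` pins, ignoring comments and other lines."""
    pins: Dict[str, str] = {}
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(
    pins: Dict[str, str],
    get_version: Callable[[str], str] = metadata.version,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (pinned, installed)} for every pin that differs.

    `installed` is None when the package is not installed at all.
    """
    mismatches: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, wanted in pins.items():
        try:
            installed: Optional[str] = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches
```

Reading `requirements.txt`, passing its text to `parse_pins`, and then calling `find_mismatches` on the result yields an empty dict when the environment matches the pins.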

tests/__init__.py

Whitespace-only changes.

tests/test_cases.py

+131
@@ -0,0 +1,131 @@
+"""
+Unit tests for the function calling utilities.
+
+This module contains tests for sending airtime, sending messages, and searching news
+using the Africa's Talking API and DuckDuckGo News API. The tests mock external
+dependencies to ensure isolation and reliability.
+"""
+
+import os
+import re
+from unittest.mock import patch
+from utils.function_call import send_airtime, send_message, search_news
+
+# Load environment variables: TEST_PHONE_NUMBER
+PHONE_NUMBER = os.getenv("TEST_PHONE_NUMBER")
+
+
+@patch("utils.function_call.africastalking.Airtime")
+def test_send_airtime_success(mock_airtime):
+    """
+    Test the send_airtime function to ensure it successfully sends airtime.
+
+    This test mocks the Africa's Talking Airtime API and verifies that the
+    send_airtime function returns a response containing the word 'Sent'.
+
+    Parameters
+    ----------
+    mock_airtime : MagicMock
+        Mocked Airtime API from Africa's Talking.
+    """
+    # Configure the mock Airtime response
+    mock_airtime.return_value.send.return_value = {
+        "numSent": 1,
+        "responses": [{"status": "Sent"}],
+    }
+
+    # Call the send_airtime function
+    result = send_airtime(PHONE_NUMBER, "KES", 5)
+
+    # Define patterns to check in the response
+    message_patterns = [
+        r"Sent",
+    ]
+
+    # Assert each pattern is found in the response
+    for pattern in message_patterns:
+        assert re.search(
+            pattern, str(result)
+        ), f"Pattern '{pattern}' not found in response"
+
+
+@patch("utils.function_call.africastalking.SMS")
+def test_send_message_success(mock_sms):
+    """
+    Test the send_message function to ensure it successfully sends a message.
+
+    This test mocks the Africa's Talking SMS API and verifies that the
+    send_message function returns a response containing 'Sent to 1/1'.
+
+    Parameters
+    ----------
+    mock_sms : MagicMock
+        Mocked SMS API from Africa's Talking.
+    """
+    # Configure the mock SMS response
+    mock_sms.return_value.send.return_value = {
+        "SMSMessageData": {"Message": "Sent to 1/1"}
+    }
+
+    # Call the send_message function
+    result = send_message(PHONE_NUMBER, "In Qwen, we trust", os.getenv("AT_USERNAME"))
+
+    # Define patterns to check in the response
+    message_patterns = [r"Sent to 1/1"]
+
+    # Assert each pattern is found in the response
+    for pattern in message_patterns:
+        assert re.search(
+            pattern, str(result)
+        ), f"Pattern '{pattern}' not found in response"
+
+
+@patch("utils.function_call.DDGS")
+def test_search_news_success(mock_ddgs):
+    """
+    Test the search_news function to ensure it retrieves news articles correctly.
+
+    This test mocks the DuckDuckGo News API and verifies that the
+    search_news function returns results matching the expected patterns.
+
+    Parameters
+    ----------
+    mock_ddgs : MagicMock
+        Mocked DuckDuckGo DDGS API.
+    """
+    # Configure the mock DDGS response with a realistic news article
+    mock_ddgs.return_value.news.return_value = [
+        {
+            "date": "2024-12-20T02:07:00+00:00",
+            "title": "Hedge fund leader loves this AI stock",
+            "body": "Sample article body text",
+            "url": "https://example.com/article",
+            "image": "https://example.com/image.jpg",
+            "source": "MSN",
+        }
+    ]
+
+    # Call the search_news function
+    result = search_news("AI")
+
+    # Define regex patterns to validate response format
+    patterns = [
+        r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\+\d{2}:\d{2}",  # Date format
+        r'"title":\s*"[^"]+?"',  # Title field
+        r'"source":\s*"[^"]+?"',  # Source field
+        r'https?://[^\s<>"]+?',  # URL format
+    ]
+
+    # Convert result to string for regex matching
+    result_str = str(result)
+
+    # Assert all patterns match in the result
+    for pattern in patterns:
+        assert re.search(
+            pattern, result_str
+        ), f"Pattern '{pattern}' not found in response"
+
+    # Verify that the news method was called with expected arguments
+    mock_ddgs.return_value.news.assert_called_once_with(
+        keywords="AI", region="wt-wt", safesearch="off", timelimit="d", max_results=5
+    )
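
The tests above all configure a chain such as `mock_airtime.return_value.send.return_value`, which can be hard to read at first. The self-contained sketch below shows how that chain resolves: calling the mocked class yields `return_value`, and calling `.send(...)` on that object yields `send.return_value`. Since the real `send_airtime` is not shown in this diff, a hypothetical `send_airtime_stub` stands in for it, and its keyword argument names are assumptions.

```python
import re
from unittest.mock import MagicMock


def send_airtime_stub(airtime_cls, phone_number: str, currency: str, amount: int) -> str:
    """Hypothetical stand-in for send_airtime; the real function is not shown here."""
    client = airtime_cls()  # resolves to airtime_cls.return_value
    response = client.send(  # resolves to airtime_cls.return_value.send.return_value
        phone_number=phone_number, amount=amount, currency_code=currency
    )
    return str(response)


# Configure the mock the same way the tests configure the patched class.
mock_airtime = MagicMock()
mock_airtime.return_value.send.return_value = {
    "numSent": 1,
    "responses": [{"status": "Sent"}],
}

# An anonymized placeholder is used instead of a real phone number.
result = send_airtime_stub(mock_airtime, "xxxxxxxxxx046", "KES", 5)

assert re.search(r"Sent", result), "Pattern 'Sent' not found in response"

# The mock also records how it was called, so arguments can be verified.
mock_airtime.return_value.send.assert_called_once_with(
    phone_number="xxxxxxxxxx046", amount=5, currency_code="KES"
)
```

In the real tests, `@patch("utils.function_call.africastalking.Airtime")` swaps the class inside the module under test, so no stub is needed. One point worth checking in that setup is that `os.getenv("TEST_PHONE_NUMBER")` returns `None` when the variable is unset, in which case the tests would pass `None` as the phone number.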
