Commit 3ca8ce7

Add version with LLM agent orchestration using Autogen.

1 parent 975a658 · commit 3ca8ce7

7 files changed: +277 −47 lines

README.md (+27 −22)
@@ -28,30 +28,34 @@ Learn more about tool calling <https://gorilla.cs.berkeley.edu/leaderboard.html>

## File structure

.
├── Dockerfile.app - template to run the Gradio dashboard.
├── Dockerfile.ollama - template to run the Ollama server.
├── docker-compose.yml - runs the Ollama server and the Gradio dashboard.
├── docker-compose-codecarbon.yml - runs CodeCarbon, the Ollama server, and the Gradio dashboard.
├── .env - environment variables for the project (not included in the repository).
├── app.py - function_call.py wrapped with Gradio as the user interface.
├── Makefile - the commands to run the project.
├── README.md - the project documentation (the file you are currently reading).
├── requirements.txt - the dependencies for the project.
├── summary.png - a diagram of how function calling works.
├── tests - the test files for the project.
│   ├── __init__.py - initializes the tests directory as a package.
│   ├── test_cases.py - the test cases for the project.
│   └── test_run.py - runs the test cases for the function-calling LLM.
├── utils - the utility files for the project.
│   ├── __init__.py - initializes the utils directory as a package.
│   ├── function_call.py - the code to call a function using LLMs.
│   └── communication_apis.py - code for the communication APIs and experiments.
└── voice_stt_mode.py - a Gradio tabbed interface with a speech-to-text tab that allows edits, plus a text tab.

### Attribution

-This project uses the Qwen2.5-0.5B model developed by Alibaba Cloud under the Apache License 2.0. The original project can be found at [Qwen technical report](https://arxiv.org/abs/2412.15115)
-Inspired by this example for the [Groq interface STT](https://github.com/bklieger-groq/gradio-groq-basics)
+* This project uses the Qwen2.5-0.5B model developed by Alibaba Cloud under the Apache License 2.0. The original project can be found at the [Qwen technical report](https://arxiv.org/abs/2412.15115).
+* Inspired by this example for the [Groq interface STT](https://github.com/bklieger-groq/gradio-groq-basics).
+* Microsoft Autogen was used to simulate multistep interactions. The original project can be found at [Microsoft Autogen](https://github.com/microsoft/autogen).
+* The project uses the Africa's Talking API to send airtime and messages to phone numbers. The original project can be found at [Africa's Talking API](https://africastalking.com/).
+* Ollama is used for model serving and deployment. The original project can be found at [Ollama](https://ollama.com/).

### License
@@ -181,6 +185,7 @@ This project uses LLMs to send airtime to a phone number. The difference is that

- The app now supports both Text and Voice input tabs.
- In the Voice Input tab, record audio and click "Transcribe" to preview the transcription. Then click "Process Edited Text" to execute voice commands.
- In the Text Input tab, directly type commands to send airtime or messages or to search news.
+- An Autogen agent has been added to assist with generating translations to other languages. Note that this uses an evaluator-optimizer pattern and may not always provide accurate translations. However, this paradigm can also be used for code generation, summarization, and other tasks.

### Responsible AI Practices
This project implements several responsible AI practices:
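The evaluator-optimizer pattern behind the translation agent can be sketched without Autogen itself: one role drafts an answer, another critiques it, and the draft is retried with the feedback until the evaluator approves. The `generate` and `evaluate` functions below are hypothetical stand-ins for the LLM calls, not part of this repository:

```python
def evaluator_optimizer(task, generate, evaluate, max_rounds=3):
    """Generic evaluator-optimizer loop: draft, critique, retry.

    Returns the first draft the evaluator approves, or the last
    attempt once max_rounds is exhausted (best effort).
    """
    feedback = None
    for _ in range(max_rounds):
        draft = generate(task, feedback)      # optimizer step
        ok, feedback = evaluate(task, draft)  # evaluator step
        if ok:
            return draft
    return draft


# Toy stand-ins for the two LLM roles (hypothetical):
def generate(task, feedback):
    # First attempt is informal; with feedback, switch to the formal form.
    return "Bonjour" if feedback else "Salut"


def evaluate(task, draft):
    return (draft == "Bonjour", "Use the formal greeting")


print(evaluator_optimizer("translate 'Hello' to French", generate, evaluate))
# -> Bonjour
```

In Autogen this loop would be realized as a conversation between two agents; the control flow is the same.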

app.py (+58 −10)
@@ -20,6 +20,7 @@
    using the username 'username'`
    Search for news about a topic:
    - `Latest news on climate change`
+   - `Translate the text 'Hello' to the target language 'French'`
"""

# ------------------------------------------------------------------------------------
@@ -38,7 +39,7 @@
import gradio as gr
from langtrace_python_sdk import langtrace, with_langtrace_root_span
import ollama
-from utils.function_call import send_airtime, send_message, search_news
+from utils.function_call import send_airtime, send_message, search_news, translate_text

# ------------------------------------------------------------------------------------
# Logging Configuration
@@ -236,6 +237,27 @@ def mask_api_key(api_key):
            },
        },
    },
+    {
+        "type": "function",
+        "function": {
+            "name": "translate_text",
+            "description": "Translate text to a specified language using Ollama",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "text": {
+                        "type": "string",
+                        "description": "The text to translate",
+                    },
+                    "target_language": {
+                        "type": "string",
+                        "description": "The target language for translation",
+                    },
+                },
+                "required": ["text", "target_language"],
+            },
+        },
+    },
]

# ------------------------------------------------------------------------------------
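The `required` array in a tool schema like the one added above can also be checked client-side before dispatching, since models sometimes omit arguments. A small sketch; the schema fragment mirrors the diff, while `check_args` is a hypothetical helper, not part of the project:

```python
# Fragment of the translate_text tool schema from the diff above
translate_schema = {
    "name": "translate_text",
    "parameters": {
        "type": "object",
        "properties": {
            "text": {"type": "string"},
            "target_language": {"type": "string"},
        },
        "required": ["text", "target_language"],
    },
}


def check_args(schema, arguments):
    """Return the names in the schema's `required` array that are
    missing from the model-supplied arguments dict."""
    required = schema["parameters"].get("required", [])
    return [name for name in required if name not in arguments]


print(check_args(translate_schema, {"text": "Hello"}))
# -> ['target_language']
```

Calling the real function only when `check_args` returns an empty list avoids a `KeyError` deep inside the tool dispatch.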
@@ -244,7 +266,9 @@ def mask_api_key(api_key):


@with_langtrace_root_span()
-async def process_user_message(message: str, history: list) -> str:
+async def process_user_message(
+    message: str, history: list, use_vision: bool = False, image_path: str = None
+) -> str:
    """
    Handle the conversation with the model asynchronously.
@@ -254,6 +278,10 @@ async def process_user_message(message: str, history: list) -> str:
        The user's input message.
    history : list of list of str
        The conversation history up to that point.
+    use_vision : bool, optional
+        Flag to enable vision capabilities, by default False.
+    image_path : str, optional
+        Path to the image file if using a vision model, by default None.

    Returns
    -------
@@ -266,16 +294,28 @@ async def process_user_message(message: str, history: list) -> str:
    logger.info("Processing user message: %s", masked_message)
    client = ollama.AsyncClient()

-    messages = [
-        {
-            "role": "user",
-            "content": message,
-        }
-    ]
+    messages = []
+
+    # Construct the message based on the vision flag
+    if use_vision:
+        messages.append(
+            {
+                "role": "user",
+                "content": message,
+                "images": [image_path] if image_path else None,
+            }
+        )
+    else:
+        messages.append({"role": "user", "content": message})

    try:
+        # Select the model based on the vision flag
+        model_name = "llama3.2-vision" if use_vision else "qwen2.5:0.5b"
+
        response = await client.chat(
-            model="qwen2.5:0.5b", messages=messages, tools=tools
+            model=model_name,
+            messages=messages,
+            tools=None if use_vision else tools,  # Vision models don't use tools
        )
    except Exception as e:
        logger.exception("Failed to get response from Ollama client.")
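The message-construction branch in this hunk is small enough to factor into a pure helper, which also makes it unit-testable without a running Ollama server. A sketch; `build_messages` is a hypothetical refactoring, not a function in the project:

```python
def build_messages(message, use_vision=False, image_path=None):
    """Build the Ollama chat payload; vision requests attach images.

    Mirrors the logic in the diff: text-only requests get a plain
    user message, vision requests add an `images` list (or None
    when no path was supplied).
    """
    entry = {"role": "user", "content": message}
    if use_vision:
        entry["images"] = [image_path] if image_path else None
    return [entry]


print(build_messages("hi"))
# -> [{'role': 'user', 'content': 'hi'}]
```

The caller would then pass `messages=build_messages(message, use_vision, image_path)` to `client.chat`.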
@@ -292,7 +332,6 @@ async def process_user_message(message: str, history: list) -> str:
            "content": model_content,
        }
    )
-    logger.debug("Model messages: %s", messages)

    if model_message.get("tool_calls"):
        for tool in model_message["tool_calls"]:
@@ -332,6 +371,14 @@ async def process_user_message(message: str, history: list) -> str:
                elif tool_name == "search_news":
                    logger.info("Calling search_news with arguments: %s", masked_args)
                    function_response = search_news(arguments["query"])
+                elif tool_name == "translate_text":
+                    logger.info(
+                        "Calling translate_text with arguments: %s", masked_args
+                    )
+                    function_response = translate_text(
+                        arguments["text"],
+                        arguments["target_language"],
+                    )
                else:
                    function_response = json.dumps({"error": "Unknown function"})
                    logger.warning("Unknown function: %s", tool_name)
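Each new tool extends this `if`/`elif` chain; a name-to-callable mapping keeps dispatch flat as more tools are added. A sketch under the assumption that tool handlers take an arguments dict; the stub functions stand in for the real `utils.function_call` imports:

```python
import json


# Stubs standing in for the real tool functions (hypothetical here)
def search_news(query):
    return json.dumps({"query": query})


def translate_text(text, target_language):
    return json.dumps({"text": text, "lang": target_language})


# Map tool names to adapters that unpack the model's arguments dict
TOOL_DISPATCH = {
    "search_news": lambda args: search_news(args["query"]),
    "translate_text": lambda args: translate_text(
        args["text"], args["target_language"]
    ),
}


def dispatch(tool_name, arguments):
    """Look up and invoke a tool; unknown names yield a JSON error."""
    handler = TOOL_DISPATCH.get(tool_name)
    if handler is None:
        return json.dumps({"error": "Unknown function"})
    return handler(arguments)


print(dispatch("translate_text", {"text": "Hi", "target_language": "French"}))
```

Registering a new tool then becomes a one-line addition to `TOOL_DISPATCH` instead of another `elif` branch.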
@@ -403,6 +450,7 @@ def gradio_interface(message: str, history: list) -> str:
            "Send a message to +254712345678 with the message 'Hello there', using the username 'username'"
        ],
        ["Search news for 'latest technology trends'"],
+        ["Translate the text 'Hi' to the target language 'French'"],
    ],
    type="messages",
)

requirements.txt (+3 −1)

@@ -15,4 +15,6 @@ pytest-asyncio==0.25.0
nltk==3.9.1
soundfile==0.12.1
groq==0.13.1
-numpy==2.2.1
+numpy==2.2.1
+pyautogen==0.2.18
+flaml[automl]

tests/test_cases.py (+63 −2)
@@ -8,8 +8,10 @@

import os
import re
-from unittest.mock import patch
-from utils.function_call import send_airtime, send_message, search_news
+import pytest
+import pytest_asyncio
+from unittest.mock import patch, MagicMock, AsyncMock
+from utils.function_call import send_airtime, send_message, search_news, translate_text

# Load environment variables: TEST_PHONE_NUMBER
PHONE_NUMBER = os.getenv("TEST_PHONE_NUMBER")
@@ -129,3 +131,62 @@ def test_search_news_success(mock_ddgs):
    mock_ddgs.return_value.news.assert_called_once_with(
        keywords="AI", region="wt-wt", safesearch="off", timelimit="d", max_results=5
    )
+
+
+@pytest.mark.parametrize(
+    "text,target_language,expected_response,should_call",
+    [
+        ("Hello", "French", "Bonjour", True),
+        ("Good morning", "Arabic", "صباح الخير", True),
+        ("Thank you", "Portuguese", "Obrigado", True),
+        ("", "French", "Error: Empty text", False),
+        (
+            "Hello",
+            "German",
+            "Target language must be French, Arabic, or Portuguese",
+            False,
+        ),
+    ],
+)
+def test_translate_text_function(text, target_language, expected_response, should_call):
+    """
+    Test translation functionality with various inputs.
+
+    Note: translate_text is a synchronous function, so do not await it.
+    """
+    # Mock the client's return value
+    mock_chat_response = {"message": {"content": expected_response}}
+
+    with patch("ollama.AsyncClient") as mock_client:
+        instance = MagicMock()
+        instance.chat.return_value = mock_chat_response
+        mock_client.return_value = instance
+
+        if not text:
+            with pytest.raises(ValueError) as exc:
+                translate_text(text, target_language)
+            assert "Empty text" in str(exc.value)
+            return
+
+        if target_language not in ["French", "Arabic", "Portuguese"]:
+            with pytest.raises(ValueError) as exc:
+                translate_text(text, target_language)
+            assert "Target language must be French, Arabic, or Portuguese" in str(
+                exc.value
+            )
+            return
+
+        result = translate_text(text, target_language)
+        assert expected_response in result
+
+        if should_call:
+            instance.chat.assert_called_once()
+        else:
+            instance.chat.assert_not_called()
+
+
+@pytest.mark.asyncio
+async def test_translate_text_special_chars():
+    """Test translation with special characters."""
+    with pytest.raises(ValueError) as exc:
+        await translate_text("@#$%^", "French")
+    assert "Invalid input" in str(exc.value)
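Read together, these test cases pin down `translate_text`'s validation contract: empty text raises a ValueError mentioning "Empty text", symbol-only text raises one mentioning "Invalid input", and anything outside French, Arabic, or Portuguese raises the language error before the model is ever called. A minimal stub satisfying just that contract (the real implementation lives in `utils/function_call.py`; the `chat` parameter here is a hypothetical stand-in for the LLM call):

```python
import re

SUPPORTED_LANGUAGES = ["French", "Arabic", "Portuguese"]


def translate_text_stub(text, target_language, chat=lambda text, lang: "<translation>"):
    """Validation mirroring the tests above; `chat` fakes the model call."""
    if not text:
        raise ValueError("Error: Empty text cannot be translated")
    if not re.search(r"\w", text):
        # e.g. "@#$%^" contains no translatable characters
        raise ValueError("Invalid input: no translatable characters")
    if target_language not in SUPPORTED_LANGUAGES:
        raise ValueError("Target language must be French, Arabic, or Portuguese")
    return chat(text, target_language)
```

Ordering the checks this way (empty first, then character sanity, then language) reproduces which error each parametrized case expects.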

tests/test_run.py (+10 −2)
@@ -10,13 +10,13 @@

The tests are run asynchronously to allow for the use of the asyncio library.

NB: ensure you have the environment variables set in the .env file/.bashrc
file before running the tests.

How to run the tests:
    pytest test/test_run.py -v --asyncio-mode=strict

Feel free to add more tests to cover more scenarios.
More tests you can try can be found here: https://huggingface.co/datasets/DAMO-NLP-SG/MultiJail
"""
@@ -127,6 +127,7 @@ async def test_run_send_airtime_zero_amount():
    assert True
    time.sleep(300)

+
@pytest.mark.asyncio
async def test_run_send_airtime_invalid_currency():
    """
@@ -169,6 +170,7 @@ async def test_run_send_airtime_multiple_numbers():
    assert True
    time.sleep(300)

+
@pytest.mark.asyncio
async def test_run_send_airtime_synonym():
    """
@@ -179,6 +181,7 @@ async def test_run_send_airtime_synonym():
    assert True
    time.sleep(300)

+
@pytest.mark.asyncio
async def test_run_send_airtime_different_order():
    """
@@ -189,6 +192,7 @@ async def test_run_send_airtime_different_order():
    assert True
    time.sleep(300)

+
@pytest.mark.asyncio
async def test_run_send_message_polite_request():
    """
@@ -221,6 +225,7 @@ async def test_run_send_airtime_invalid_amount():
    assert True
    time.sleep(300)

+
@pytest.mark.asyncio
async def test_run_send_message_spam_detection():
    """
@@ -280,6 +285,7 @@ async def test_run_send_message_mixed_arabic_english():
    assert True
    time.sleep(300)

+
@pytest.mark.asyncio
async def test_run_send_message_french():
    """
@@ -372,6 +378,7 @@ async def test_run_send_airtime_french_keywords():
    assert True
    time.sleep(300)

+
@pytest.mark.asyncio
async def test_run_send_message_portuguese_keywords():
    """
@@ -440,6 +447,7 @@ async def test_run_send_airtime_arabic_keywords():
    assert True
    time.sleep(300)

+
@pytest.mark.asyncio
async def test_run_best_of_n_jailbreaking():
    """
