Skip to content

ShayneP/local-voice-ai

Repository files navigation

App Icon

🧠 Local Voice Agent

A full-stack, Dockerized AI voice assistant with speech, text, and voice synthesis powered by LiveKit.

demo-video.mp4

🧩 Overview

This repo contains everything needed to run a real-time AI voice assistant locally using:

  • 🎙️ LiveKit Agents for STT ↔ LLM ↔ TTS
  • 🧠 Ollama for running local LLMs
  • 🗣️ Kokoro for TTS voice synthesis
  • 👂 Whisper (via VoxBox) for speech-to-text
  • 🔍 RAG powered by Sentence Transformers and FAISS
  • 💬 Next.js + Tailwind frontend UI
  • 🐳 Fully containerized via Docker Compose

🏁 Quick Start

./test.sh

This script:

  • Cleans up existing containers
  • Builds all services
  • Launches the full stack (agent, LLM, STT, TTS, frontend, and signaling server)

Once it's up, visit http://localhost:3000 in your browser to start chatting.

📦 Architecture

Each service is containerized and communicates over a shared Docker network:

  • livekit: WebRTC signaling server
  • agent: Custom Python agent with LiveKit SDK
  • whisper: Speech-to-text using vox-box and Whisper model
  • ollama: Local LLM provider (e.g., gemma3:4b)
  • kokoro: TTS engine for speaking responses
  • frontend: React-based client using LiveKit components

🧠 Agent Instructions

Your agent lives in agent/myagent.py. It uses:

  • openai.STT → routes to Whisper
  • openai.LLM → routes to Ollama
  • groq.TTS → routes to Kokoro
  • silero.VAD → for voice activity detection
  • SentenceTransformer → embeds documents and queries for RAG
  • FAISS → performs similarity search for knowledge retrieval

The agent supports Retrieval-Augmented Generation (RAG) by loading documents from the agent/docs directory. These documents are embedded using the all-MiniLM-L6-v2 model and indexed using FAISS for fast similarity search. During conversations, relevant document snippets are automatically retrieved to enhance the agent's responses.

All metrics from each component are logged for debugging.

🔐 Environment Variables

You can find environment examples in:

These provide keys and internal URLs for each service. Most keys are placeholders for local dev use.

🧪 Testing & Dev

To test or redeploy:

docker-compose down -v --remove-orphans
docker-compose up --build

The services will restart and build fresh containers.

🧰 Project Structure

.
├── agent/                     # Python voice agent
├── ollama/                    # LLM serving
├── whisper/                   # Whisper via vox-box
├── livekit/                   # Signaling server
├── voice-assistant-frontend/ # Next.js UI client
└── docker-compose.yml         # Brings it all together

📷 Screenshots

UI Screenshot

🛠️ Requirements

  • Docker + Docker Compose
  • No GPU required (uses CPU-based models)
  • Recommended RAM: 12GB+

🙌 Credits

About

Local voice AI powered by Ollama, Kokoro, Whisper, and LiveKit.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published