
A full-stack, Dockerized AI voice assistant with speech, text, and voice synthesis powered by LiveKit.
Demo video: `demo-video.mp4`
This repo contains everything needed to run a real-time AI voice assistant locally using:
- 🎙️ LiveKit Agents for STT ↔ LLM ↔ TTS
- 🧠 Ollama for running local LLMs
- 🗣️ Kokoro for TTS voice synthesis
- 👂 Whisper (via VoxBox) for speech-to-text
- 🔍 RAG powered by Sentence Transformers and FAISS
- 💬 Next.js + Tailwind frontend UI
- 🐳 Fully containerized via Docker Compose
To launch the stack, run `./test.sh`. This script:
- Cleans up existing containers
- Builds all services
- Launches the full stack (agent, LLM, STT, TTS, frontend, and signaling server)
Once it's up, visit http://localhost:3000 in your browser to start chatting.
Each service is containerized and communicates over a shared Docker network:
- `livekit`: WebRTC signaling server
- `agent`: Custom Python agent built with the LiveKit SDK
- `whisper`: Speech-to-text using `vox-box` and a Whisper model
- `ollama`: Local LLM provider (e.g., `gemma3:4b`)
- `kokoro`: TTS engine for speaking responses
- `frontend`: React-based client using LiveKit components
Your agent lives in `agent/myagent.py`. It uses the following (wired together as sketched after this list):
- `openai.STT` → routes to Whisper
- `openai.LLM` → routes to Ollama
- `groq.TTS` → routes to Kokoro
- `silero.VAD` → for voice activity detection
- `SentenceTransformer` → embeds documents and queries for RAG
- `FAISS` → performs similarity search for knowledge retrieval
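For orientation, here is a minimal sketch of how those pieces could be wired together with the LiveKit Agents `AgentSession` API. It is illustrative only, not the actual contents of `agent/myagent.py`: the container URLs, ports, API keys, and model names are assumptions standing in for whatever the compose files really configure.

```python
# Minimal wiring sketch (illustrative; not the repo's actual myagent.py).
# URLs, ports, keys, and model names below are assumed placeholders.
from livekit import agents
from livekit.agents import Agent, AgentSession
from livekit.plugins import groq, openai, silero


async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()

    session = AgentSession(
        vad=silero.VAD.load(),  # Silero voice activity detection
        # OpenAI-compatible endpoints served by the local containers (URLs assumed):
        stt=openai.STT(base_url="http://whisper:9997/v1", api_key="local"),
        llm=openai.LLM(model="gemma3:4b", base_url="http://ollama:11434/v1", api_key="ollama"),
        tts=groq.TTS(base_url="http://kokoro:8880/v1", api_key="local"),
    )
    await session.start(
        room=ctx.room,
        agent=Agent(instructions="You are a helpful voice assistant."),
    )


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```

The OpenAI-flavored plugins end up routing to Whisper, Ollama, and Kokoro presumably because they are pointed at the local OpenAI-compatible endpoints those containers expose.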
The agent supports Retrieval-Augmented Generation (RAG) by loading documents from the `agent/docs` directory. These documents are embedded using the `all-MiniLM-L6-v2` model and indexed with FAISS for fast similarity search. During conversations, relevant document snippets are automatically retrieved to enhance the agent's responses.
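As a rough sketch of that indexing and retrieval flow (illustrative, not the repo's code: the `*.txt` glob, the `retrieve` helper, and the top-k value of 3 are assumptions; only the `agent/docs` path and the `all-MiniLM-L6-v2` model name come from the description above):

```python
# Illustrative RAG indexing/retrieval sketch; only agent/docs and the
# all-MiniLM-L6-v2 model name come from the README, the rest is assumed.
from pathlib import Path

import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Load and embed every document in agent/docs (file extension assumed).
docs = [p.read_text() for p in sorted(Path("agent/docs").glob("*.txt"))]
embeddings = model.encode(docs, normalize_embeddings=True).astype("float32")

# Inner product over normalized vectors == cosine similarity.
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(embeddings)


def retrieve(query: str, k: int = 3) -> list[str]:
    """Return the k document snippets most similar to the query."""
    q = model.encode([query], normalize_embeddings=True).astype("float32")
    _, ids = index.search(q, k)
    return [docs[i] for i in ids[0]]
```

At query time, the retrieved snippets would typically be injected into the LLM prompt (for example, prepended to the system message) before the model generates its reply.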
All metrics from each component are logged for debugging.
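If you want to hook into those metrics yourself, recent livekit-agents releases expose per-turn metrics events plus a usage aggregator. A hedged sketch (the `setup_metrics_logging` name is made up, and the event payload details may differ between versions):

```python
# Illustrative metrics hook; assumes the metrics helpers shipped with
# recent livekit-agents releases. setup_metrics_logging is a made-up name.
from livekit.agents import AgentSession, metrics


def setup_metrics_logging(session: AgentSession) -> metrics.UsageCollector:
    usage = metrics.UsageCollector()

    @session.on("metrics_collected")
    def _on_metrics(ev) -> None:
        metrics.log_metrics(ev.metrics)  # log this turn's STT/LLM/TTS metrics
        usage.collect(ev.metrics)        # accumulate usage for a session summary

    return usage
```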
Example environment files are included in the repo. They provide the keys and internal URLs each service needs; most keys are placeholders for local dev use.
To test or redeploy:
docker-compose down -v --remove-orphans
docker-compose up --build
The services will restart and build fresh containers.
.
├── agent/ # Python voice agent
├── ollama/ # LLM serving
├── whisper/ # Whisper via vox-box
├── livekit/ # Signaling server
├── voice-assistant-frontend/ # Next.js UI client
└── docker-compose.yml # Brings it all together
- Docker + Docker Compose
- No GPU required (uses CPU-based models)
- Recommended RAM: 12GB+
- Built with ❤️ by LiveKit
- Uses LiveKit Agents
- Local LLMs via Ollama
- TTS via Kokoro