If you're worried about privacy and security, you can run your own local AI infrastructure with this repository.
Imagine running ChatGPT, Stable Diffusion, Cursor (or VSCode) chat/autocomplete, and more, locally.
It includes the following basic services:
- Ollama as the LLM provider
- Open WebUI, a browser-based AI platform
- ComfyUI, for image generation, integrated with Open WebUI
- ngrok, to enable internet access and Cursor integration
It's also possible to integrate this with Cursor or VSCode, via continue.dev.
This was built with a focus on NVIDIA GPUs. Using other GPUs is out of scope, but should be possible by tweaking the docker-compose file.
If you want to expose your services to the internet, you can use ngrok. This is not required, so if you don't want to use it, you can just remove the `ngrok` and `nginx` services from the docker-compose file.
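After trimming the compose file, you can double-check which services are left before starting anything. This is just a sanity check; the output depends on your compose file:

```sh
# List the services defined in the docker-compose file
docker compose config --services
```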
Start the services with docker compose. This will take quite a while, mostly due to the `comfyui` image. If you are only interested in testing it, you can skip the `comfyui` service (comment it out).
Create the external volumes for the services that need them:
docker volume create ollama
docker volume create open-webui
docker volume create comfyui-checkpoints
docker volume create comfyui-vae
and then start the services:
docker compose up -d
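If you skipped the `comfyui` service, you can instead name only the services you want when starting; either way, `docker compose ps` shows what is actually running. The service names below are assumptions based on the list above:

```sh
# Start only the LLM-related services (service names assumed)
docker compose up -d ollama open-webui

# Verify that the containers are up
docker compose ps
```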
To get started, you need to download a model from the Ollama models page.
You can do that by running the following command:
docker compose exec -it ollama ollama pull llama3.1:8b
To verify that the model was downloaded correctly, run:
docker compose exec -it ollama ollama list
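You can also check that the Ollama API itself is reachable from the host; this assumes port 11434 is exposed, as in the continue.dev config further down:

```sh
# Should return a JSON list of the models you pulled
curl http://localhost:11434/api/tags
```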
Open WebUI documentation is available here.
The service can be accessed at localhost:3002 and, when you open it for the first time, you'll need to create an account.
There isn't much to set up, so you can create a chat and start using it right away.
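If you prefer to check the service from the command line before opening the browser, Open WebUI exposes a health endpoint; the exact path is taken from upstream docs and may change between versions:

```sh
# Expect a small JSON response if Open WebUI is healthy (path assumed)
curl http://localhost:3002/health
```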
ComfyUI documentation is available here.
The service can be accessed at localhost:8188/; however, you still need to configure it.
A guide on how to configure Open WebUI to use ComfyUI is available here.
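Before wiring ComfyUI into Open WebUI, it's worth confirming that its API answers on the port above. `/system_stats` is a standard ComfyUI endpoint, but treat the exact path as an assumption for your version:

```sh
# Returns system and GPU information as JSON if ComfyUI is up
curl http://localhost:8188/system_stats
```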
To integrate with VSCode, you can use the continue.dev extension.
Once installed, you need to point it to the local ollama instance. Detailed documentation is available here, but in short, you need to add this to the extension's `config.json` file:
{
  "models": [
    {
      "title": "llama3.1:8b",
      "provider": "ollama",
      "model": "llama3.1:8b",
      "apiBase": "http://localhost:11434"
    }
  ]
}
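A quick way to confirm the `apiBase` above is correct is to send a one-off generation request straight to Ollama; if this returns a completion, continue.dev should work with the same settings:

```sh
# One-off, non-streaming test request against the Ollama API
curl http://localhost:11434/api/generate \
  -d '{"model": "llama3.1:8b", "prompt": "Say hello", "stream": false}'
```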
It's possible to point Cursor at the local ollama instance. This will work for the "chat" and "composer" features, but not for autocomplete.
At the time of writing, Cursor's servers need to reach out to your local machine, so you need `ngrok` and `curxy` running. `curxy` is a very simple proxy that fixes some of the issues with the way Cursor makes requests to the API by pretending to be an OpenAI API endpoint.
Create an ngrok auth token and, optionally, a static domain (otherwise, every time ngrok starts it will generate a new domain and you'll need to reconfigure Cursor).
Copy the `ngrok.example.yml` and configure `agent.authtoken` and `endpoints.web.url` to point to your ngrok domain.
cp config/ngrok/ngrok.example.yml config/ngrok/ngrok.yml
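After editing the config, you can restart the `ngrok` service and probe the public endpoint. The `/curxy/v1/models` path is an assumption based on the base URL used in the Cursor settings below; if the routing is correct, it should list the models known to Ollama:

```sh
# Pick up the new ngrok config and test the proxied OpenAI-style endpoint
docker compose restart ngrok
curl https://your-ngrok-domain.ngrok-free.app/curxy/v1/models
```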
Then, configure Cursor by following these steps:
- Cursor Settings > Models > Model Names > Add `llama3.1:8b` as model to Cursor
- Cursor Settings > OpenAI API Key > Enable
- Cursor Settings > OpenAI API Key > Override OpenAI Base URL > Add `https://your-ngrok-domain.ngrok-free.app/curxy/v1` > Save
- Cursor Settings > OpenAI API Key > Add arbitrary string as OpenAI API key > Verify
- Use `llama3.1:8b` as Chat Model
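If Cursor's verification fails, tailing the tunnel and proxy logs usually shows whether the requests are reaching your machine at all (service names assumed from the compose file):

```sh
# Watch incoming requests while Cursor verifies the API key
docker compose logs -f ngrok curxy
```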
Thanks to ryoppippi/curxy for the Cursor/Ollama proxy. I'm currently using a copy of it due to this issue.