containerd-shim-ollama

Deploy a containerd runtime shim for serving AI models with Ollama.

Setup

Create a kind cluster with the ollama shim:

make kind
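
Under the hood, the kind setup has to register the shim with containerd so that pods can request it through a runtime class. A minimal sketch of such a containerd config entry (the runtime name and type here are assumptions, following containerd's convention that a runtime_type of io.containerd.ollama.v1 resolves to a containerd-shim-ollama-v1 binary on the node's PATH):

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.ollama]
  runtime_type = "io.containerd.ollama.v1"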

Deploy your model using the ollama-shim runtime class:

kubectl create namespace ai-models
kubectl apply -n ai-models -f ./manifests/models/qwen2-model.yaml
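
The key detail in the manifest is that the pod opts into the shim via runtimeClassName. A minimal sketch of what such a manifest could look like (the RuntimeClass handler, labels, image, and port are assumptions; the real definition lives in ./manifests/models/qwen2-model.yaml, and a Service named qwen2 exposing port 80 is assumed for the port-forward below):

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: ollama-shim
handler: ollama  # must match the runtime registered in the containerd config
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: qwen2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: qwen2
  template:
    metadata:
      labels:
        app: qwen2
    spec:
      runtimeClassName: ollama-shim  # hands the container to the ollama shim
      containers:
        - name: qwen2
          image: ollama/qwen2:latest  # hypothetical model image reference
          ports:
            - containerPort: 8080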

Verify that the model is running:

kubectl wait --for=condition=available -n ai-models deployment/qwen2 --timeout=1m
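
If the wait times out, inspect the pods directly (standard kubectl, nothing shim-specific):

kubectl get pods -n ai-models
kubectl describe pods -n ai-models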

Port-forward the model service to your local machine:

kubectl port-forward -n ai-models svc/qwen2 8080:80

Ask the model a question from your local machine (in a new terminal):

curl http://localhost:8080/api/generate -d '{
    "model": "qwen2:latest",
    "prompt": "What is the Kubecon?",
    "stream": false
}' | jq -r '.response'
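
With "stream": false the API returns a single JSON object rather than a stream of chunks, which is why jq can pull out .response directly.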

Troubleshooting

Connect to the kind node:

docker exec -it ollama-shim-control-plane bash

Inspect the logs of the containerd runtime to see the ollama shim in action:

journalctl -f -u containerd

Find the model file (a GGUF blob) in the containerd image snapshots:

find /var/lib/containerd/ -name '*.gguf'

Start the model locally on the node:

ollama runner --port 8080 --ctx-size 8192 --model ${model}
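
For example, populating ${model} from the find above (assuming the node holds a single GGUF blob):

model=$(find /var/lib/containerd/ -name '*.gguf' | head -n 1)
ollama runner --port 8080 --ctx-size 8192 --model "${model}"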

Clean-up

Delete the cluster to clean everything up:

make clean
