# KubeSage

KubeSage is an AI-driven Kubernetes troubleshooting assistant that integrates LangChain Agents with Kubernetes APIs. It provides real-time diagnostics, resource monitoring, and troubleshooting recommendations for Kubernetes clusters using OpenAI's GPT-4o.
- Features
- Installation
- Usage
- Available Tools
- Architecture
- WebSocket Integration
- Troubleshooting
- License
## Features

✅ AI-powered Kubernetes troubleshooting
✅ LangChain Agents for intelligent decision-making
✅ Real-time Kubernetes monitoring (Pods, Deployments, Services, Nodes)
✅ Deep dive diagnostics (logs, resource usage, RBAC, etc.)
✅ WebSocket interface for interactive debugging
✅ Secure authentication using OpenAI API keys
## Installation

```bash
git clone https://github.com/your-username/KubeSage.git
cd KubeSage
conda create -n kube-sage python=3.9 -y
conda activate kube-sage
pip install -r requirements.txt
```
Ensure your Kubernetes cluster is accessible:

- Inside Cluster: automatically loads service account credentials.
- Outside Cluster: set up `KUBECONFIG`:

```bash
export KUBECONFIG=$HOME/.kube/config
```
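For reference, here is a minimal sketch of how this dual-mode configuration is commonly handled with the official `kubernetes` Python client (in-cluster credentials first, falling back to `KUBECONFIG`); the exact loading logic in `src/main.py` may differ.

```python
from kubernetes import client, config

def load_k8s_api() -> client.CoreV1Api:
    """Load in-cluster service account credentials when running inside a pod,
    otherwise fall back to the local kubeconfig (KUBECONFIG or ~/.kube/config)."""
    try:
        # Only succeeds inside a cluster with a mounted service account token
        config.load_incluster_config()
    except config.ConfigException:
        # Outside the cluster: honors KUBECONFIG, defaults to ~/.kube/config
        config.load_kube_config()
    return client.CoreV1Api()

if __name__ == "__main__":
    v1 = load_k8s_api()
    print(f"Namespaces visible: {len(v1.list_namespace().items)}")
```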
## Usage

Start the KubeSage server:

```bash
python src/main.py
```
Use `wscat` or any WebSocket client:

```bash
wscat -c ws://localhost:6000/ws
```

You can then start chatting with the AI assistant.
## Available Tools

### Cluster Monitoring

| Tool | Description |
|---|---|
| Get All Pods with Resource Usage | Lists all pods with CPU & memory usage. |
| Get All Services | Lists all services and their types/ports. |
| Get All Deployments | Fetches deployment details. |
| Get All Nodes | Lists nodes with health & capacity. |
| Get Cluster Events | Shows recent warnings & failures. |
| Get Namespace List | Fetches all Kubernetes namespaces. |
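As a rough illustration of what a tool like Get All Pods with Resource Usage does under the hood, the sketch below joins the pod list from the core API with CPU/memory figures from the metrics API. It assumes the official `kubernetes` Python client and a running metrics-server; it is not the project's actual implementation.

```python
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod

def pods_with_resource_usage(namespace: str = "default") -> list[dict]:
    """List pods together with the CPU/memory usage reported by metrics-server."""
    core = client.CoreV1Api()
    metrics = client.CustomObjectsApi()

    pods = core.list_namespaced_pod(namespace).items
    pod_metrics = metrics.list_namespaced_custom_object(
        group="metrics.k8s.io", version="v1beta1",
        namespace=namespace, plural="pods",
    )
    usage_by_pod = {m["metadata"]["name"]: m["containers"] for m in pod_metrics["items"]}

    return [
        {
            "pod": pod.metadata.name,
            "phase": pod.status.phase,
            "containers": usage_by_pod.get(pod.metadata.name, []),
        }
        for pod in pods
    ]

if __name__ == "__main__":
    for row in pods_with_resource_usage():
        print(row)
```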
### Deep Dive Diagnostics

| Tool | Description |
|---|---|
| Describe Pod with Restart Count | Fetches pod details + restart count. |
| Get Pod Logs | Retrieves the last 10 log lines for a pod. |
| Describe Service | Gets details of a Kubernetes service. |
| Describe Deployment | Fetches deployment details (replica count, images). |
| Check RBAC Events & Role Bindings | Analyzes security permissions. |
| Get Ingress Resources | Lists ingress rules, hosts & annotations. |
| Check Pod Affinity & Anti-Affinity | Analyzes scheduling constraints. |
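To give an idea of how such tools can be exposed to the agent, here is a hedged sketch of a Get Pod Logs tool built with LangChain's `@tool` decorator and the Kubernetes client; the function name, input format, and decorator-based registration are illustrative assumptions rather than the project's actual code.

```python
from kubernetes import client, config
from langchain_core.tools import tool

config.load_kube_config()  # or config.load_incluster_config() inside a pod

@tool
def get_pod_logs(pod_and_namespace: str) -> str:
    """Retrieve the last 10 log lines for a pod. Input: '<pod-name> <namespace>'."""
    pod_name, namespace = pod_and_namespace.split()
    v1 = client.CoreV1Api()
    return v1.read_namespaced_pod_log(name=pod_name, namespace=namespace, tail_lines=10)
```

The docstring doubles as the tool description, which is what the agent uses to decide when to call it.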
## Architecture

*(Replace with the actual architecture diagram if available.)*
1️⃣ FastAPI WebSocket Server - Handles real-time interactions.
2️⃣ LangChain Agent - Uses OpenAI GPT-4o to select appropriate tools.
3️⃣ Kubernetes API Client - Fetches cluster insights and diagnostics.
4️⃣ RBAC & Authentication - Secures access to cluster resources.
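The sketch below shows one plausible way these pieces fit together: a FastAPI WebSocket endpoint that forwards each incoming message to a tool-calling LangChain agent backed by GPT-4o. The endpoint path and port match the examples in this README, but the tool list, prompt, and agent setup are illustrative assumptions, not the actual contents of `src/main.py`.

```python
from fastapi import FastAPI, WebSocket
from kubernetes import client, config
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

config.load_kube_config()  # or config.load_incluster_config() inside a pod

@tool
def get_namespace_list(query: str = "") -> str:
    """Fetch all Kubernetes namespaces."""
    v1 = client.CoreV1Api()
    return ", ".join(ns.metadata.name for ns in v1.list_namespace().items)

tools = [get_namespace_list]  # the real agent registers many more tools

llm = ChatOpenAI(model="gpt-4o", temperature=0)  # reads OPENAI_API_KEY from the environment
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a Kubernetes troubleshooting assistant."),
    ("human", "{input}"),
    MessagesPlaceholder("agent_scratchpad"),
])
agent = AgentExecutor(agent=create_tool_calling_agent(llm, tools, prompt), tools=tools)

app = FastAPI()

@app.websocket("/ws")
async def ws_endpoint(websocket: WebSocket):
    # Each received message becomes one agent run; the answer is streamed back as text
    await websocket.accept()
    while True:
        question = await websocket.receive_text()
        result = await agent.ainvoke({"input": question})
        await websocket.send_text(result["output"])

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=6000)
```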
## WebSocket Integration

KubeSage uses WebSockets for real-time AI troubleshooting.
Example connection using Python WebSockets:
```python
import asyncio
import websockets

async def connect():
    uri = "ws://localhost:6000/ws"
    async with websockets.connect(uri) as websocket:
        await websocket.send("Describe Pod with Restart Count")
        response = await websocket.recv()
        print(f"Response: {response}")

asyncio.run(connect())
```
## Troubleshooting

✅ Ensure the WebSocket server is running:

```bash
python src/main.py
```
✅ Verify RBAC permissions:

```bash
kubectl auth can-i get pods --as=system:serviceaccount:default:kubesage-sa
```
If denied, apply:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: metrics-reader-binding
subjects:
- kind: ServiceAccount
  name: kubesage-sa
  namespace: default
roleRef:
  kind: ClusterRole
  name: metrics-reader
  apiGroup: rbac.authorization.k8s.io
```

Note that this binding references a `metrics-reader` ClusterRole, which must already exist in the cluster.
✅ Enable the Metrics Server:

```bash
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
```
## License

MIT License - free to use, modify, and distribute.

Contributions welcome!
## To-Do List

- Add Prometheus/Grafana integration for advanced monitoring
- Support multi-cluster troubleshooting
- Add more AI-powered insights
Want to contribute? Open a PR!
Thanks to Kubernetes, FastAPI, LangChain, and OpenAI for making AI-driven DevOps possible!