This document guides you through deploying the CodeGen application on a Kubernetes cluster using the OPEA GenAI Microservices Connector (GMC).
The goal is to deploy the multi-component CodeGen application on Kubernetes, using GMC to manage the connections and routing between microservices from a declarative configuration file.
- A running Kubernetes cluster.
- `kubectl` installed and configured to interact with your cluster.
- GenAI Microservices Connector (GMC) installed in your cluster. Follow the GMC installation guide if you haven't already.
- Access to the container images specified in the GMC configuration files (`codegen_xeon.yaml` or `codegen_gaudi.yaml`). These may be on Docker Hub or a private registry.
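The client-side prerequisite can be checked before going further. The following is a minimal sketch; `have_cmd` is a hypothetical helper name, and the commented cluster-side checks assume the GMC resource is discoverable under a name containing `gmc`, as the `kubectl get gmc` command later in this guide suggests.

```shell
# Hypothetical helper (not part of kubectl or GMC): succeed if a command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd kubectl; then
  echo "kubectl found"
  # Cluster-side checks, left commented out because they need a live cluster:
  #   kubectl cluster-info
  #   kubectl api-resources | grep -i gmc
else
  echo "kubectl not found on PATH" >&2
fi
```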
Two GMC configuration files are provided based on the target hardware for the LLM serving component:
- `codegen_xeon.yaml`: Deploys CodeGen using CPU-optimized components (suitable for Intel Xeon clusters).
- `codegen_gaudi.yaml`: Deploys CodeGen using Gaudi-optimized LLM serving components (suitable for clusters with Intel Gaudi nodes).
Select the file appropriate for your cluster hardware. The following steps use `codegen_xeon.yaml` as an example.
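One way to keep the later commands independent of the hardware choice is to hold the selected file name in a variable. This is only a sketch: `select_config` and `GMC_CONFIG` are illustrative names introduced here, not something GMC or the CodeGen example defines.

```shell
# Hypothetical helper: map a hardware label to the matching
# configuration file named in this guide.
select_config() {
  case "$1" in
    xeon)  echo "codegen_xeon.yaml" ;;
    gaudi) echo "codegen_gaudi.yaml" ;;
    *)     echo "unknown hardware: $1 (expected xeon or gaudi)" >&2; return 1 ;;
  esac
}

# GMC_CONFIG is an illustrative variable name, not one the guide relies on.
GMC_CONFIG=$(select_config xeon)
echo "Using $GMC_CONFIG"
```

With this in place, commands such as `kubectl apply -f "./$GMC_CONFIG"` would work for either hardware target.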
Choose a namespace for the deployment (e.g., `codegen-app`).
# Set the desired namespace
export APP_NAMESPACE=codegen-app
# Create the namespace if it doesn't exist
kubectl create ns $APP_NAMESPACE || true
# (Optional) Update the namespace within the chosen YAML file if it's not parameterized
# sed -i "s|namespace: codegen|namespace: $APP_NAMESPACE|g" ./codegen_xeon.yaml
# Apply the GMC configuration file to the chosen namespace
kubectl apply -f ./codegen_xeon.yaml -n $APP_NAMESPACE
Note: If the YAML file uses a hardcoded namespace, ensure you either modify the file or deploy to that specific namespace.
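To see whether the file hardcodes a namespace, its `namespace:` keys can be listed before applying it. The sketch below uses a plain text match rather than a YAML parser, so it assumes each key sits on its own line; `list_namespaces` is a hypothetical helper name.

```shell
# Hypothetical helper: print the distinct values of every "namespace:" key
# in a YAML file. Text matching only; it does not parse YAML.
list_namespaces() {
  sed -n 's/^[[:space:]]*namespace:[[:space:]]*//p' "$1" | tr -d "\"'" | sort -u
}

# Only run against the example file if it is present in the current directory.
if [ -f ./codegen_xeon.yaml ]; then
  list_namespaces ./codegen_xeon.yaml
fi
```

If this prints a value other than your `$APP_NAMESPACE`, either edit the file or deploy to the namespace it names.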
Check that all the pods defined in the GMC configuration are successfully created and running.
kubectl get pods -n $APP_NAMESPACE
Wait until all pods are in the `Running` state. This can take some time while images are pulled and services initialize.
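Rather than re-running the command by hand, the wait can be scripted. The sketch below polls the pod list and reports anything not yet `Running`; the helper names, the 10-second interval, and the 60-attempt limit are all assumptions chosen for illustration.

```shell
# Hypothetical helper: read `kubectl get pods --no-headers` output on stdin
# and print the name and status of every pod that is not Running.
# Columns are NAME READY STATUS RESTARTS AGE, so STATUS is field 3.
pods_not_running() {
  awk '$3 != "Running" { print $1, $3 }'
}

# Hypothetical helper: poll a namespace until it has pods and all are Running.
# Gives up after 60 attempts, 10 seconds apart (assumed limits).
wait_for_pods() {
  wfp_ns=$1
  wfp_tries=0
  while [ "$wfp_tries" -lt 60 ]; do
    wfp_pods=$(kubectl get pods -n "$wfp_ns" --no-headers 2>/dev/null) || wfp_pods=""
    if [ -n "$wfp_pods" ] && [ -z "$(printf '%s\n' "$wfp_pods" | pods_not_running)" ]; then
      return 0
    fi
    wfp_tries=$((wfp_tries + 1))
    sleep 10
  done
  return 1
}

# Usage (needs a live cluster, so it is left commented out):
#   wait_for_pods "$APP_NAMESPACE" && echo "All pods are Running"
```

`kubectl wait --for=condition=Ready pod --all -n "$APP_NAMESPACE" --timeout=600s` is a built-in alternative that checks pod readiness rather than the `Running` phase.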
Deploy a simple pod within the same namespace to use as a client for sending requests.
kubectl create deployment client-test -n $APP_NAMESPACE --image=curlimages/curl -- sleep infinity
Verify the client pod is running:
kubectl get pods -n $APP_NAMESPACE -l app=client-test
Retrieve the access URL exposed by the GMC for the CodeGen application.
# Get the client pod name
export CLIENT_POD=$(kubectl get pod -n $APP_NAMESPACE -l app=client-test -o jsonpath='{.items[0].metadata.name}')
# Get the access URL provided by the 'codegen' GMC resource
# Adjust 'codegen' if the metadata.name in your YAML is different
export ACCESS_URL=$(kubectl get gmc codegen -n $APP_NAMESPACE -o jsonpath='{.status.accessUrl}')
# Display the URL (optional)
echo "Access URL: $ACCESS_URL"
Note: The `accessUrl` typically points to the internal Kubernetes service endpoint for the gateway service defined in the GMC configuration.
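If the `jsonpath` query returns nothing, the request in the next step fails with a confusing curl error, so it can help to check the value first. This sketch assumes the URL uses an `http` or `https` scheme; `is_http_url` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if the value looks like an HTTP(S) URL
# with at least one character after the scheme.
is_http_url() {
  case "$1" in
    http://?*|https://?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Assumes ACCESS_URL was exported as shown above; unset is treated as empty.
if is_http_url "${ACCESS_URL:-}"; then
  echo "GMC reported access URL: $ACCESS_URL"
else
  echo "accessUrl is empty or malformed; the GMC resource may not be ready yet" >&2
fi
```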
Use the test client pod to send a `curl` request to the CodeGen service endpoint.
# Define the payload
PAYLOAD='{"messages": "def print_hello_world():"}'
# Execute curl from the client pod
kubectl exec "$CLIENT_POD" -n $APP_NAMESPACE -- curl -s --no-buffer "$ACCESS_URL" \
-X POST \
-d "$PAYLOAD" \
-H 'Content-Type: application/json'
Expected Output: A stream of JSON data containing the generated code, similar to the Docker Compose validation, ending with a `[DONE]` marker if streaming is enabled.
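When streaming is enabled, the raw output can be reduced to its payloads with a small filter. The sketch below assumes the stream uses server-sent-event style lines of the form `data: <chunk>`, which this guide does not confirm; `sse_payloads` is a hypothetical helper name.

```shell
# Hypothetical helper: read a streamed response on stdin, print the payload
# of each "data:" line, and stop reading at the [DONE] marker.
# Lines without a "data:" prefix (such as blank separators) are dropped.
sse_payloads() {
  sed -n -e '/^data:[[:space:]]*\[DONE\]/q' -e 's/^data:[[:space:]]*//p'
}

# Usage (needs the deployed application, so it is left commented out):
#   kubectl exec "$CLIENT_POD" -n "$APP_NAMESPACE" -- \
#     curl -s --no-buffer "$ACCESS_URL" -X POST -d "$PAYLOAD" \
#     -H 'Content-Type: application/json' | sse_payloads
```

If the filter prints nothing, inspect the unfiltered response first, since the service may be returning a single JSON document or an error message instead of a stream.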
To remove the deployed application and the test client:
# Delete the GMC deployment
kubectl delete -f ./codegen_xeon.yaml -n $APP_NAMESPACE
# Delete the test client deployment
kubectl delete deployment client-test -n $APP_NAMESPACE
# Optionally delete the namespace if no longer needed
# kubectl delete ns $APP_NAMESPACE