Commit 8f7a0f3

update
1 parent 827e7c1 commit 8f7a0f3

2 files changed: 17 additions and 72 deletions
Dockerfile

Lines changed: 7 additions & 1 deletion
@@ -11,6 +11,12 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
+FROM golang:1.21-bullseye AS GOBUILD
+ADD . /device-plugin
+ARG GOPROXY=https://goproxy.cn,direct
+RUN apt-get update && apt-get -y install libhwloc-dev libdrm-dev
+RUN cd /device-plugin && go build -o ./k8s-device-plugin cmd/k8s-device-plugin/main.go
+
 FROM ubuntu:20.04
 ENV TZ=Asia/Dubai
 RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
@@ -23,5 +29,5 @@ ENV DTKROOT=/opt/hygondriver
 ENV CPLUS_INCLUDE_PATH=/opt/hygondriver/include:/opt/hyhal/include:/opt/hygondriver/llvm/include:/opt/hygondriver/.hyhal/include:
 ENV HYHAL_PATH=/opt/hyhal
 WORKDIR /root/
-COPY cmd/k8s-device-plugin/k8s-device-plugin .
+COPY --from=GOBUILD /device-plugin/k8s-device-plugin .
 CMD ["./k8s-device-plugin", "-logtostderr=true", "-stderrthreshold=INFO", "-v=5"]
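The Dockerfile change above moves compilation into a `GOBUILD` stage and copies the resulting binary into the runtime image, so the image no longer depends on a prebuilt binary in the working tree. A minimal sketch of invoking that build follows. The image tag and the `IMAGE_TAG`, `GOPROXY_VALUE`, and `RUN_BUILD` variables are hypothetical names chosen for illustration and are not defined by this commit. The only values taken from the diff are the `GOPROXY` build argument and its default.

```shell
#!/bin/sh
# Hypothetical image tag; the commit does not name one.
IMAGE_TAG="${IMAGE_TAG:-dcu-vgpu-device-plugin:dev}"

# Defaults to the value of the Dockerfile's ARG GOPROXY line. Override it
# if that proxy is unreachable, e.g. GOPROXY_VALUE=https://proxy.golang.org,direct
GOPROXY_VALUE="${GOPROXY_VALUE:-https://goproxy.cn,direct}"

# Assemble the command first so it can be inspected before it runs.
BUILD_CMD="docker build --build-arg GOPROXY=${GOPROXY_VALUE} -t ${IMAGE_TAG} ."
echo "${BUILD_CMD}"

# Run the build only when explicitly requested, from the repository root.
if [ "${RUN_BUILD:-0}" = "1" ]; then
  ${BUILD_CMD}
fi
```

Running the script with `RUN_BUILD=1` from the repository root performs the build. Without it, the script only prints the command, which is useful for checking the proxy and tag values first.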

README.md

Lines changed: 10 additions & 71 deletions
@@ -1,89 +1,28 @@
-# AMD GPU device plugin for Kubernetes
-[![Go Report Card](https://goreportcard.com/badge/github.com/RadeonOpenCompute/k8s-device-plugin)](https://goreportcard.com/report/github.com/RadeonOpenCompute/k8s-device-plugin)
+# DCU vGPU device plugin for HAMi
 
 ## Introduction
-This is a [Kubernetes][k8s] [device plugin][dp] implementation that enables the registration of AMD GPU in a container cluster for compute workload. With the approrpriate hardware and this plugin deployed in your Kubernetes cluster, you will be able to run jobs that require AMD GPU.
-
-More information about [RadeonOpenCompute (ROCm)][rocm]
+This is a [Kubernetes][k8s] [device plugin][dp] implementation that enables the registration of hygon DCU in a container cluster for compute workload. With the approrpriate hardware and this plugin deployed in your Kubernetes cluster, you will be able to run jobs that require AMD DCU. It supports DCU-virtualzation by using hy-virtual provided by dtk
 
 
 ## Prerequisites
-* [ROCm capable machines][sysreq]
-* [kubeadm capable machines][kubeadm] (if you are using kubeadm to deploy your k8s cluster)
-* [ROCm kernel][rock] ([Installation guide][rocminstall]) or latest AMD GPU Linux driver ([Installation guide][amdgpuinstall])
-* A [Kubernetes deployment][k8sinstall]
-* `--allow-privileged=true` for both kube-apiserver and kubelet (only needed if the device plugin is deployed via DaemonSet since the device plugin container requires privileged security context to access `/dev/kfd` for device health check)
+* dtk >= 24.04
+* hy=smi == v1.6.0
 
 
 ## Limitations
 * This plugin targets Kubernetes v1.18+.
 
 ## Deployment
-The device plugin needs to be run on all the nodes that are equipped with AMD GPU. The simplist way of doing so is to create a Kubernetes [DaemonSet][ds], which run a copy of a pod on all (or some) Nodes in the cluster. We have a pre-built Docker image on [DockerHub][dhk8samdgpudp] that you can use for with your DaemonSet. This repository also have a pre-defined yaml file named `k8s-ds-amdgpu-dp.yaml`. You can create a DaemonSet in your Kubernetes cluster by running this command:
-```
-$ kubectl create -f k8s-ds-amdgpu-dp.yaml
 ```
-or directly pull from the web using
+$ kubectl apply -f k8s-dcu-rbac.yaml
+$ kubectl apply -f k8s-dcu-plugin.yaml
 ```
-kubectl create -f https://raw.githubusercontent.com/RadeonOpenCompute/k8s-device-plugin/master/k8s-ds-amdgpu-dp.yaml
-```
-
-If you want to enable the experimental device health check, please use `k8s-ds-amdgpu-dp-health.yaml` **after** `--allow-privileged=true` is set for kube-apiserver and kublet.
 
-## Example workload
-You can restrict work to a node with GPU by adding `resources.limits` to the pod definition. An example pod definition is provided in `example/pod/alexnet-gpu.yaml`. This pod runs the timing benchmark for AlexNet on AMD GPU and then go to sleep. You can create the pod by running:
+## Build
 ```
-$ kubectl create -f alexnet-gpu.yaml
-```
-
-or
-
-```
-$ kubectl create -f https://raw.githubusercontent.com/RadeonOpenCompute/k8s-device-plugin/master/example/pod/alexnet-gpu.yaml
-```
-
-and then check the pod status by running
-```
-$ kubectl describe pods
-```
-
-After the pod is created and running, you can see the benchmark result by running:
+docker build .
 ```
-$ kubectl logs alexnet-tf-gpu-pod alexnet-tf-gpu-container
-```
-
-For comparison, an example pod definition of running the same benchmark with CPU is provided in `example/pod/alexnet-cpu.yaml`.
-
-## Labelling node with additional GPU properties
-
-Please see [AMD GPU Kubernetes Node Labeller](cmd/k8s-node-labeller/README.md) for details. An example configuration is in [k8s-ds-amdgpu-labeller.yaml](k8s-ds-amdgpu-labeller.yaml):
-```
-$ kubectl create -f k8s-ds-amdgpu-labeller.yaml
-```
-
-or
-
-```
-$ kubectl create -f https://raw.githubusercontent.com/RadeonOpenCompute/k8s-device-plugin/master/k8s-ds-amdgpu-labeller.yaml
-```
-
-
-## Notes
-* This plugin uses [`go modules`][gm] for dependencies management
-* Please consult the `Dockerfile` on how to build and use this plugin independent of a docker image
 
-## TODOs
-* Add proper GPU health check (health check without `/dev/kfd` access.)
+## Maintainer
 
-[ds]: https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/
-[dp]: https://kubernetes.io/docs/concepts/cluster-administration/device-plugins/
-[rocm]: https://rocm.github.io/
-[rock]: https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver
-[rocminstall]: http://rocm-documentation.readthedocs.io/en/latest/Installation_Guide/ROCk-kernel.html#rock-kernel
-[amdgpuinstall]: https://support.amd.com/en-us/kb-articles/Pages/AMDGPU-PRO-Install.aspx
-[sysreq]: http://rocm-documentation.readthedocs.io/en/latest/Installation_Guide/Installation-Guide.html#system-requirement
-[gm]: https://blog.golang.org/using-go-modules
-[kubeadm]: https://kubernetes.io/docs/setup/independent/install-kubeadm/#before-you-begin
-[k8sinstall]: https://kubernetes.io/docs/setup/independent/install-kubeadm
-[k8s]: https://kubernetes.io
-[dhk8samdgpudp]: https://hub.docker.com/r/rocm/k8s-device-plugin/
+limengxuan@4paradigm.com
