nvidia-k8s-device-plugin
Updated: 2025-03-21 00:22
Official repository:
https://github.com/NVIDIA/k8s-device-plugin
Install with Helm
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update
helm search repo nvdp --devel
# Install, with GFD (GPU Feature Discovery) enabled
helm upgrade -i nvdp nvdp/nvidia-device-plugin \
--namespace nvidia-device-plugin \
--create-namespace \
--version 0.15.0 \
--set gfd.enabled=true
# Uninstall
helm uninstall nvdp --namespace nvidia-device-plugin
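After installing, the result can be checked from the cluster side. The function below is a hypothetical helper (not part of the chart) and assumes `kubectl` already targets the cluster. The plugin pods in `nvidia-device-plugin` should be Running, each GPU node should show a number in the `nvidia.com/gpu` allocatable column, and with GFD enabled the nodes should carry labels such as `nvidia.com/gpu.product`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: post-install checks for the nvdp release.
# Assumes kubectl is already configured for the target cluster.

NVDP_NS="nvidia-device-plugin"   # namespace used by the helm command above

check_nvdp_install() {
  # Device plugin, GFD and NFD pods should all be Running.
  kubectl get pods -n "$NVDP_NS" -o wide

  # Allocatable GPUs per node; "<none>" means no GPUs were registered on that node.
  kubectl get nodes \
    -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'

  # GPU model label added by GFD (empty column if GFD has not labelled the node).
  kubectl get nodes -L nvidia.com/gpu.product
}
```

Call `check_nvdp_install` once the release is up; a `<none>` GPU column on a node that has a GPU usually points to an unhealthy plugin pod on that node.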
Example with additional parameters:
helm upgrade -i nvdp nvdp/nvidia-device-plugin \
--namespace nvidia-device-plugin \
--create-namespace \
--version 0.15.0 \
--set gfd.enabled=true \
--set compatWithCPUManager=true
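The same two `--set` flags can also be kept in a values file, which is easier to keep under version control than a growing list of flags. A minimal sketch, assuming the file name `nvdp-values.yaml` (any name works); Helm's `--set a.b=c` corresponds to the nested YAML key `a:` / `b: c`.

```shell
#!/usr/bin/env bash
# Write values equivalent to:
#   --set gfd.enabled=true --set compatWithCPUManager=true
# The file name nvdp-values.yaml is an arbitrary choice.
cat > nvdp-values.yaml <<'EOF'
gfd:
  enabled: true
compatWithCPUManager: true
EOF

# Then pass the file with -f instead of the --set flags:
#   helm upgrade -i nvdp nvdp/nvidia-device-plugin \
#     --namespace nvidia-device-plugin \
#     --create-namespace \
#     --version 0.15.0 \
#     -f nvdp-values.yaml
```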
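If the NFD image cannot be pulled, pull it through the m.daocloud.io mirror on the node and retag it to the original name: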
ctr -n k8s.io image pull m.daocloud.io/registry.k8s.io/nfd/node-feature-discovery:v0.15.3
ctr -n k8s.io image tag m.daocloud.io/registry.k8s.io/nfd/node-feature-discovery:v0.15.3 registry.k8s.io/nfd/node-feature-discovery:v0.15.3
Adjust the version number to match your deployment.
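When several images need the same workaround, the two `ctr` commands can be wrapped in a small script. This is a sketch assuming each image is reachable by prefixing its original reference with `m.daocloud.io/`, exactly as in the NFD example; the script and function names are hypothetical. `ctr` only acts on the local node's containerd, so it has to run on every node where the pod may be scheduled, and the retagged image is only used if the pod's `imagePullPolicy` is not `Always`. Setting `DRY_RUN=1` prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pull images through the m.daocloud.io mirror and
# retag them with their original names in containerd's k8s.io namespace.
# Usage: ./pull-via-mirror.sh IMAGE [IMAGE...]
set -euo pipefail

MIRROR_PREFIX="m.daocloud.io"   # same mirror as the manual ctr commands

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Prefix an image reference with the mirror host.
mirror_ref() {
  printf '%s/%s\n' "$MIRROR_PREFIX" "$1"
}

# Pull via the mirror, then tag the image back to its original reference.
pull_via_mirror() {
  local image="$1" mirrored
  mirrored="$(mirror_ref "$image")"
  run ctr -n k8s.io image pull "$mirrored"
  run ctr -n k8s.io image tag "$mirrored" "$image"
}

for img in "$@"; do
  pull_via_mirror "$img"
done
```

For example: `./pull-via-mirror.sh registry.k8s.io/nfd/node-feature-discovery:v0.15.3`.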
Test whether the deployment works
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  restartPolicy: Never
  containers:
    - name: cuda-container
      image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2
      resources:
        limits:
          nvidia.com/gpu: 1 # requesting 1 GPU
  tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
EOF
# kubectl apply -f https://nas.liu12.com:8443/k8s/nvdp/sample.yaml
kubectl logs gpu-pod
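The check can also be scripted: wait for the pod to finish, look for the `Test PASSED` line that the vectoradd sample prints on success, and then remove the pod. A sketch with hypothetical function names, assuming a kubectl recent enough to support `kubectl wait --for=jsonpath=...` (older releases lack it). A pod that ends in `Failed` never matches the wait condition, so the function reports failure only after the timeout.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking the gpu-pod test.

# Succeeds if the vectoradd log (read from stdin) reports success.
vectoradd_passed() {
  grep -q 'Test PASSED'
}

verify_gpu_pod() {
  local pod="${1:-gpu-pod}" logs="" status=0
  # restartPolicy is Never, so a healthy run ends in phase Succeeded.
  if kubectl wait --for=jsonpath='{.status.phase}'=Succeeded "pod/$pod" --timeout=180s >/dev/null; then
    logs="$(kubectl logs "$pod")" || status=1
  else
    status=1
  fi
  if [[ $status -eq 0 ]] && vectoradd_passed <<<"$logs"; then
    echo "GPU test passed"
  else
    echo "GPU test failed; inspect with: kubectl describe pod $pod" >&2
    status=1
  fi
  # Clean up the test pod either way.
  kubectl delete pod "$pod" --ignore-not-found >/dev/null
  return "$status"
}
```

Run `verify_gpu_pod` right after applying the manifest; its exit status makes it usable in CI.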