nvidia-k8s-device-plugin

Updated: 2025-03-21 00:22

Official repository:

https://github.com/NVIDIA/k8s-device-plugin

 

Install with Helm

helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update

helm search repo nvdp --devel

# Install with GFD (GPU Feature Discovery) enabled
helm upgrade -i nvdp nvdp/nvidia-device-plugin \
  --namespace nvidia-device-plugin \
  --create-namespace \
  --version 0.15.0 \
  --set gfd.enabled=true

# Uninstall
helm uninstall nvdp --namespace nvidia-device-plugin

 

Example with an additional parameter:

helm upgrade -i nvdp nvdp/nvidia-device-plugin \
  --namespace nvidia-device-plugin \
  --create-namespace \
  --version 0.15.0 \
  --set gfd.enabled=true \
  --set compatWithCPUManager=true
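
After either install command, it helps to confirm that the plugin pods are running and that nodes advertise the `nvidia.com/gpu` resource. The following is a minimal sketch rather than an official procedure: the function name `check_device_plugin` is hypothetical, and the `nvidia.com/gpu.product` label is assumed to be present only when GFD is enabled and has finished labelling the node.

```shell
# Hypothetical helper: quick post-install check for the device plugin.
check_device_plugin() {
  ns="${1:-nvidia-device-plugin}"   # namespace used in the helm commands above

  # Plugin (and GFD/NFD) pods should be Running on each GPU node.
  kubectl get pods -n "$ns" -o wide

  # Allocatable GPU count per node; "<none>" means the node advertises no GPU.
  kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"

  # Assumption: GFD adds this label once it has labelled the node.
  kubectl get nodes -L nvidia.com/gpu.product
}
```

Call `check_device_plugin` with no arguments for the namespace used here, or pass another namespace as the first argument.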

 

 

Workaround when the NFD (Node Feature Discovery) image cannot be pulled

ctr -n k8s.io image pull m.daocloud.io/registry.k8s.io/nfd/node-feature-discovery:v0.15.3
ctr -n k8s.io image tag m.daocloud.io/registry.k8s.io/nfd/node-feature-discovery:v0.15.3 registry.k8s.io/nfd/node-feature-discovery:v0.15.3

Adjust the version tag to match the one your deployment actually needs.
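
The pull-and-retag pair above can be wrapped so the version only has to be written once. This is a sketch under stated assumptions: `mirror_ref`, `pull_via_mirror`, `MIRROR_PREFIX`, and `NFD_VERSION` are hypothetical names introduced here, and the mirror is assumed to keep the same "prefix plus original image name" layout shown in the commands above.

```shell
# Hypothetical helper: pull an image through the mirror prefix used above,
# then tag it back to its original name so the kubelet finds it locally.
MIRROR_PREFIX="m.daocloud.io"

mirror_ref() {
  # registry.k8s.io/nfd/...:tag -> m.daocloud.io/registry.k8s.io/nfd/...:tag
  printf '%s/%s\n' "$MIRROR_PREFIX" "$1"
}

pull_via_mirror() {
  image="$1"
  mirrored="$(mirror_ref "$image")"
  # Tag only if the pull succeeded.
  ctr -n k8s.io image pull "$mirrored" &&
    ctr -n k8s.io image tag "$mirrored" "$image"
}

# Usage (run on every node that needs the image):
#   NFD_VERSION=v0.15.3
#   pull_via_mirror "registry.k8s.io/nfd/node-feature-discovery:${NFD_VERSION}"
```

Because the image is imported into containerd's `k8s.io` namespace on one node only, the helper has to be run on each node where the NFD pod may be scheduled.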

 

Test that the deployment works

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  restartPolicy: Never
  containers:
    - name: cuda-container
      image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2
      resources:
        limits:
          nvidia.com/gpu: 1 # requesting 1 GPU
  tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
EOF

# kubectl apply -f  https://nas.liu12.com:8443/k8s/nvdp/sample.yaml

kubectl logs gpu-pod
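
Reading the log by hand works, but the check can also be scripted. The sketch below waits for the pod to finish and then looks for a success marker in its log. Two assumptions are baked in: that the vectoradd sample prints a line containing `Test PASSED` on success, and that your `kubectl` is recent enough to support `--for=jsonpath=...` in `kubectl wait`. The function name `verify_gpu_pod` is hypothetical.

```shell
# Hypothetical helper: returns 0 only if the test pod completed and
# its log contains the success marker.
verify_gpu_pod() {
  pod="${1:-gpu-pod}"   # pod name from the manifest above

  # Block until the pod reaches phase Succeeded, or give up after 120s.
  kubectl wait --for=jsonpath='{.status.phase}'=Succeeded \
    "pod/${pod}" --timeout=120s || return 1

  # Assumption: the sample prints "Test PASSED" when the GPU run succeeds.
  kubectl logs "$pod" | grep -q "Test PASSED"
}

# Usage:
#   verify_gpu_pod && echo "GPU scheduling works"
#   kubectl delete pod gpu-pod   # clean up the test pod afterwards
```

If the pod stays in `Pending`, `kubectl describe pod gpu-pod` usually shows whether the scheduler found no node with a free `nvidia.com/gpu` resource.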

 

 

 

 
