How to enable NVIDIA GPU workloads on a k3s cluster

K3s is a highly available, certified Kubernetes distribution designed for production workloads. It can also be used for AI workloads.

By default, a k3s node does not expose its GPU to the cluster, so pods cannot request one. This article walks through enabling an NVIDIA GPU on a k3s node.

 Step 1 : Install NVIDIA drivers

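# Ubuntu / Debian
sudo apt-get update
sudo apt-get install -y nvidia-driver-535
# If apt cannot find this package, add NVIDIA's apt repository first
sudo apt-get install -y nvidia-container-toolkit
# Applies only if a standalone containerd service is installed; k3s bundles its own
sudo systemctl restart containerd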

Verify that the container toolkit can see the driver and the GPU:

nvidia-container-cli info
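
Before moving on, it can help to confirm that the tools from this step are actually on the PATH. The following is a minimal sketch, not part of the original procedure; check_tools is a hypothetical helper name.

```shell
# Hypothetical helper: report whether each named tool is on PATH.
# Returns non-zero if at least one tool is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

On the GPU node you would call it as: check_tools nvidia-smi nvidia-container-cli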

 Step 2 : Install k3s

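k3s ships its own containerd, so the NVIDIA runtime has to be available to it. k3s detects alternative container runtimes such as NVIDIA's when it starts, which is why the toolkit is installed first.

Create the k3s config file: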

sudo mkdir -p /etc/rancher/k3s
sudo nano /etc/rancher/k3s/config.yaml

Add the following to the config file:

write-kubeconfig-mode: "0644"
container-runtime-endpoint: ""
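
If you prefer not to edit the file interactively, the same two settings can be written from a script. This is only a sketch; write_k3s_config is a hypothetical helper, and on a real node it would need to run as root with the path /etc/rancher/k3s/config.yaml.

```shell
# Hypothetical helper: write the two settings from this step to the given path.
# It creates the parent directory if needed and overwrites any existing file.
write_k3s_config() {
    target="$1"
    mkdir -p "$(dirname "$target")"
    cat > "$target" <<'EOF'
write-kubeconfig-mode: "0644"
container-runtime-endpoint: ""
EOF
}
```

Example call (as root): write_k3s_config /etc/rancher/k3s/config.yaml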

Now install k3s. The command below accepts further options; see the k3s installation page for details.

curl -sfL https://get.k3s.io | sh -

 Restart k3s:

sudo systemctl restart k3s
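
After the restart you may want to confirm that k3s picked up the NVIDIA runtime. A rough sketch follows, assuming k3s generates its containerd config at /var/lib/rancher/k3s/agent/etc/containerd/config.toml (the usual location, but verify it on your install); has_nvidia_runtime is a hypothetical helper.

```shell
# Hypothetical helper: succeed if the given containerd config file
# contains an "nvidia" entry, fail otherwise.
has_nvidia_runtime() {
    grep -q 'nvidia' "$1"
}
```

Example call: sudo sh -c '. ./helper.sh; has_nvidia_runtime /var/lib/rancher/k3s/agent/etc/containerd/config.toml && echo "nvidia runtime found"' (helper.sh is a placeholder for wherever you saved the function).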

  Step 3 : Install Helm

 Install Helm with the official installer script:

curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

Point Helm explicitly at the k3s kubeconfig:

export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
sudo chmod 644 /etc/rancher/k3s/k3s.yaml 


  Step 4 : Install NVIDIA GPU Operator

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm install gpu-operator nvidia/gpu-operator \
--namespace gpu-operator \
--create-namespace \
--set driver.enabled=false
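
The operator takes a while to start all of its pods. One way to see what is still pending is to filter the pod list by status. This is an illustrative sketch; pods_not_ready is a hypothetical helper that assumes the default column layout of kubectl get pods (NAME, READY, STATUS, ...).

```shell
# Hypothetical helper: read `kubectl get pods --no-headers` output on stdin
# and print the names of pods whose STATUS is neither Running nor Completed.
pods_not_ready() {
    awk '$3 != "Running" && $3 != "Completed" { print $1 }'
}
```

Example call: kubectl get pods -n gpu-operator --no-headers | pods_not_ready
An empty result means every operator pod is either running or has finished.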


Wait until the operator pods are up, then create a test manifest named cuda-test.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: cuda-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.2.0-base-ubuntu22.04
      resources:
        limits:
          nvidia.com/gpu: 1
      command:
        - bash
        - -c
        - |
          echo "=== NVIDIA-SMI OUTPUT ==="
          nvidia-smi
          echo "========================="
          sleep 3600


Use the commands below to create the test pod, check its status, and read its output:

kubectl apply -f cuda-test.yaml
kubectl get pods
kubectl logs cuda-test

If everything worked, the pod's output shows the nvidia-smi table for your GPU:

[Screenshot: nvidia-smi output]
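
The same check can be scripted. Note that the test pod prints its own "NVIDIA-SMI OUTPUT" banner even when nvidia-smi fails, so the sketch below looks for the "Driver Version" field that nvidia-smi prints in its table header instead. smi_reports_driver is a hypothetical helper.

```shell
# Hypothetical helper: read pod logs on stdin and succeed only if they
# contain the "Driver Version" field from the nvidia-smi table header.
smi_reports_driver() {
    grep -q 'Driver Version'
}
```

Example call: kubectl logs cuda-test | smi_reports_driver && echo "GPU visible in pod"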

You can now use the node for GPU workloads.



© 2007 - DMCA.com Protection Status
The content is copyrighted to Sundeep Machado


Note: The author is not responsible for damages related to improper use of software, techniques, tips and copyright claims.