NVIDIA Cloud Native
docs, YouTube, YouTube Developer
Graphics cards
| Name | Specifications | Comments |
|---|---|---|
| T4 | | |
| T400 | Affordable | |
| V100 | | |
| A100 | | |
GPUs on Kubernetes
NVIDIA Multi-Instance GPU (MIG)
Multi-Instance GPU (MIG) expands the performance and value of NVIDIA H100, A100, and A30 Tensor Core GPUs. MIG can partition the GPU into as many as seven instances, each fully isolated with its own high-bandwidth memory, cache, and compute cores.
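MIG partitions are created on the host with `nvidia-smi`. The sketch below assumes an A100-40GB at GPU index 0, where `3g.20gb` is a valid profile; other GPUs expose different profiles, which `nvidia-smi mig -lgip` lists. `mig_partition` and `run` are hypothetical helpers, not part of any NVIDIA tool; setting `DRY_RUN=1` prints the commands instead of executing them.

```shell
# Hypothetical helper: print commands instead of running them when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical helper: enable MIG on one GPU and split it into COUNT identical
# GPU instances, each with a default compute instance (-C).
# Assumed defaults: GPU 0, profile 3g.20gb (A100-40GB), 2 instances.
mig_partition() {
  local gpu="${1:-0}" profile="${2:-3g.20gb}" count="${3:-2}"
  local profiles="$profile" i
  for ((i = 1; i < count; i++)); do
    profiles+=",$profile"   # e.g. "3g.20gb,3g.20gb"
  done
  # Enable MIG mode; on some GPUs this needs an idle GPU or a reset to take effect
  run sudo nvidia-smi -i "$gpu" -mig 1
  # Create the GPU instances and their compute instances in one call
  run sudo nvidia-smi mig -i "$gpu" -cgi "$profiles" -C
  # MIG devices are now listed with their own UUIDs
  run nvidia-smi -L
}
```

Preview with `DRY_RUN=1 mig_partition 0 3g.20gb 2`, then run it again without `DRY_RUN` on the GPU host.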
→ product
NVIDIA GPU Cloud (NGC)
Components
NVIDIA Container Runtime
→ code
NVIDIA Container Toolkit
→ code
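A minimal sketch of wiring the toolkit into a container engine, assuming the NVIDIA driver and a toolkit release that ships the `nvidia-ctk` CLI are already installed. The `nvidia/cuda` image tag is an assumption picked to match the CUDA 11.7.1 used further down; `configure_runtime`, `smoke_test` and `run` are hypothetical helpers.

```shell
# Hypothetical helper: print commands instead of running them when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical helper: let nvidia-ctk add the "nvidia" runtime to the engine's
# config file, then restart the engine so it picks up the change.
configure_runtime() {
  local engine="${1:-docker}"   # docker or containerd
  run sudo nvidia-ctk runtime configure --runtime="$engine"
  run sudo systemctl restart "$engine"
}

# Hypothetical helper: run nvidia-smi inside a CUDA container as a smoke test.
smoke_test() {
  # Image tag is an assumption; use one supported by the installed driver
  local image="${1:-nvidia/cuda:11.7.1-base-ubuntu20.04}"
  run docker run --rm --gpus all "$image" nvidia-smi
}
```

If `smoke_test` prints the usual `nvidia-smi` table, the runtime hook is working.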
NVIDIA DCGM-Exporter
DCGM-Exporter is a GPU metrics exporter for Prometheus that leverages NVIDIA Data Center GPU Manager (DCGM).
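A sketch assuming Helm access to the cluster and the exporter's public Helm chart repository. DCGM-Exporter serves Prometheus text-format metrics on port 9400 by default, and `DCGM_FI_DEV_GPU_UTIL` is its GPU-utilization gauge. `install_dcgm_exporter`, `gpu_util` and `run` are hypothetical helpers; `gpu_util` assumes samples carry no trailing timestamp.

```shell
# Hypothetical helper: print commands instead of running them when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical helper: install the exporter from its Helm chart.
install_dcgm_exporter() {
  run helm repo add gpu-helm-charts https://nvidia.github.io/dcgm-exporter/helm-charts
  run helm repo update
  run helm install --generate-name gpu-helm-charts/dcgm-exporter
}

# Hypothetical helper: read Prometheus text-format metrics on stdin and print
# "<gpu index>=<utilization %>" for every DCGM_FI_DEV_GPU_UTIL sample.
gpu_util() {
  awk '/^DCGM_FI_DEV_GPU_UTIL[{]/ {
    match($0, /gpu="[^"]*"/)                  # locate the gpu="N" label
    gpu = substr($0, RSTART + 5, RLENGTH - 6) # strip gpu=" and the closing quote
    print gpu "=" $NF                         # last field is the sample value
  }'
}
```

Usage, with the service name left as a placeholder because it depends on the generated release name: `kubectl port-forward svc/<dcgm-exporter-service> 9400:9400 &`, then `curl -s localhost:9400/metrics | gpu_util`.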
NVIDIA GPU feature discovery
→ code
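GPU feature discovery publishes node labels under the `nvidia.com/` prefix, for example `nvidia.com/gpu.product` or `nvidia.com/gpu.count`; the exact label set depends on the GFD version. A sketch for inspecting them, assuming `kubectl` access to the cluster; `node_gpu_labels` and `filter_nvidia_labels` are hypothetical helpers.

```shell
# Hypothetical helper: keep only nvidia.com/* labels from "key=value" lines, sorted.
filter_nvidia_labels() {
  { grep -E '^nvidia\.com/' || true; } | sort
}

# Hypothetical helper: dump one node's labels as "key=value" lines and filter them.
node_gpu_labels() {
  local node="$1"
  kubectl get node "$node" \
    -o go-template='{{range $k, $v := .metadata.labels}}{{$k}}={{$v}}{{"\n"}}{{end}}' \
    | filter_nvidia_labels
}
```

Usage: `node_gpu_labels my-gpu-node`, where the node name is a placeholder.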
NVIDIA GPU Operator
NVIDIA Device Plugin
→ code
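The device plugin advertises GPUs to the scheduler as the extended resource `nvidia.com/gpu`, which pods request through their resource limits. A sketch that emits such a Pod manifest; the pod name, the `nvidia/cuda` image tag and the `gpu_pod_manifest` helper are assumptions for illustration.

```shell
# Hypothetical helper: print a Pod manifest requesting GPUs via nvidia.com/gpu.
# Arguments (all optional): pod name, image, number of GPUs.
gpu_pod_manifest() {
  local name="${1:-gpu-smoke-test}"
  local image="${2:-nvidia/cuda:11.7.1-base-ubuntu20.04}"
  local gpus="${3:-1}"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${name}
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: ${image}
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: ${gpus}
EOF
}
```

Usage: `gpu_pod_manifest | kubectl apply -f -`, then `kubectl logs gpu-smoke-test`. GPUs are set under `limits` only: for extended resources Kubernetes uses the limit as the request, and GPUs cannot be overcommitted.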
Tutorials
NVIDIA GPU Operator in K3s
- Installing & Using the NVIDIA GPU Operator in K3s with Rancher by Virtual Thoughts - November 21, 2022
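K3s runs its own embedded containerd, so the operator's container toolkit has to be pointed at K3s paths. A sketch based on the K3s settings documented for the GPU Operator Helm chart at the time of writing; verify them against current documentation, as the tutorial above may use slightly different values. `install_gpu_operator_k3s` and `run` are hypothetical helpers.

```shell
# Hypothetical helper: print commands instead of running them when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical helper: install the GPU Operator with the toolkit configured
# for K3s's containerd config file and socket.
install_gpu_operator_k3s() {
  run helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
  run helm repo update
  run helm install gpu-operator nvidia/gpu-operator \
    --namespace gpu-operator --create-namespace \
    --set 'toolkit.env[0].name=CONTAINERD_CONFIG' \
    --set 'toolkit.env[0].value=/var/lib/rancher/k3s/agent/etc/containerd/config.toml' \
    --set 'toolkit.env[1].name=CONTAINERD_SOCKET' \
    --set 'toolkit.env[1].value=/run/k3s/containerd/containerd.sock' \
    --set 'toolkit.env[2].name=CONTAINERD_RUNTIME_CLASS' \
    --set 'toolkit.env[2].value=nvidia' \
    --set 'toolkit.env[3].name=CONTAINERD_SET_AS_DEFAULT' \
    --set-string 'toolkit.env[3].value=true'
}
```

The `--set` arguments are single-quoted so the shell does not treat `[0]` and similar as glob patterns.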
NVIDIA GPUs with SLES
ℹ Full official support should come in early 2023
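- Build and push a driver image for SLES (from the GitLab project):

```shell
# Clone NVIDIA's driver container sources and enter the SLES 15 build context
git clone https://gitlab.com/nvidia/container-images/driver.git && cd driver/sle15
# Tag as <driver version>-sles<SLES version>, the naming the GPU Operator looks up
docker build . -t path/to/your/repo/driver:515.65.01-sles15.3 \
  --build-arg DRIVER_VERSION=515.65.01 \
  --build-arg CUDA_VERSION=11.7.1 \
  --build-arg SLES_VERSION=15.3
docker push path/to/your/repo/driver:515.65.01-sles15.3
```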
- Deploy the GPU Operator and specify the custom driver image
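A sketch of the deploy step, assuming the GPU Operator Helm chart from the `nvidia` repository. As we understand the chart, the operator pulls `<driver.repository>/<driver.image>:<driver.version>-<node OS>`, with `driver.image` defaulting to `driver` and the OS suffix (e.g. `sles15.3`) detected from the node, which is why the image is tagged `515.65.01-sles15.3`. `install_gpu_operator_custom_driver` and `run` are hypothetical helpers.

```shell
# Hypothetical helper: print commands instead of running them when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical helper: deploy the GPU Operator with a self-built driver image.
# The operator is expected to append the node OS (e.g. "-sles15.3") to
# driver.version when it resolves the image tag.
install_gpu_operator_custom_driver() {
  local repo="${1:?usage: install_gpu_operator_custom_driver <registry path> [driver version]}"
  local version="${2:-515.65.01}"
  run helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
  run helm repo update
  run helm install gpu-operator nvidia/gpu-operator \
    --namespace gpu-operator --create-namespace \
    --set driver.repository="$repo" \
    --set driver.version="$version"
}
```

Usage: `install_gpu_operator_custom_driver path/to/your/repo 515.65.01`. A private registry additionally needs pull credentials, which the chart takes through its `driver.imagePullSecrets` value.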