GPU Scheduling: Taints, Extended Resources & MIG
GPU pods request nvidia.com/gpu (or MIG-specific resources). Isolate GPU nodes with taints and matching tolerations. Use node selectors or GFD labels for hardware profiles. Enable MIG only when the operator partitions cards and the device plugin exposes slice resources—otherwise schedule whole GPUs only.
Extended resources
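After the device plugin registers GPUs, nodes advertise nvidia.com/gpu in their allocatable capacity, and containers claim it in their resources block:
resources:
  limits:
    nvidia.com/gpu: 1 # Whole GPU; the scheduler counts integer GPUs
  requests:
    nvidia.com/gpu: 1
Requests and limits for nvidia.com/gpu must match and must be integers. Time-slicing configs that explicitly allow sharing advertise extra replicas of each physical GPU, but each pod still requests a whole number of them.
Taints & tolerations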
Prevent generic workloads from landing on expensive GPU nodes—pair with scheduling & taints patterns.
# Node (often applied by Karpenter NodePool or MNG launch template)
spec:
  taints:
    - key: sku
      value: gpu
      effect: NoSchedule
---
# Pod
spec:
  tolerations:
    - key: sku
      operator: Equal
      value: gpu
      effect: NoSchedule
  nodeSelector:
    nvidia.com/gpu.present: "true" # From GPU Feature Discovery
Scheduling flow
Figure 1 — Scheduler must pass taint/toleration gates before considering nvidia.com/gpu free capacity.
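To show how the pieces fit together, here is a minimal sketch of a complete Pod that passes both gates: it tolerates the sku=gpu taint, selects nodes labeled by GPU Feature Discovery, and requests one whole GPU. The pod name, container name, and image tag are placeholders chosen for illustration, not values from this guide.

```yaml
# Sketch only: metadata.name, container name, and image are hypothetical placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  tolerations:
    - key: sku                 # must match the taint key on the GPU node
      operator: Equal
      value: gpu
      effect: NoSchedule
  nodeSelector:
    nvidia.com/gpu.present: "true"   # label from GPU Feature Discovery
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04   # assumed tag; substitute an image you use
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1    # whole GPU; requests default to limits when omitted
```

A toleration only permits scheduling onto the tainted node. The nodeSelector is what steers the pod there, so both are usually needed.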
Multi-Instance GPU (MIG)
MIG splits one physical GPU into isolated instances. It requires MIG Manager and configured profiles on the node. Pods request MIG-specific resources whose names depend on the profile, for example:
resources:
  limits:
    nvidia.com/mig-1g.5gb: 1 # Example; verify allocatable keys on your node
  requests:
    nvidia.com/mig-1g.5gb: 1

| Mode | Pros | Cons |
|---|---|---|
| Whole GPU | Simple; max performance | Low utilization for small models |
| MIG | Hard isolation between tenants | Ops overhead; not all SKUs support MIG |
| Time-slicing | Share one GPU among many pods | No memory isolation—noisy neighbor risk |
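For the time-slicing row, a sharing config might look like the sketch below. It assumes the NVIDIA device plugin's config file format. How the file reaches the plugin (for example, a ConfigMap referenced by the GPU Operator) depends on your installation, and the replica count is an illustrative value.

```yaml
# Sketch of a device plugin sharing config; replicas: 4 is an example value.
version: v1
sharing:
  timeSlicing:
    resources:
      - name: nvidia.com/gpu
        replicas: 4   # each physical GPU is advertised as 4 schedulable nvidia.com/gpu units
```

Pods still request nvidia.com/gpu: 1, but up to four of them can land on the same physical card, with no memory isolation between them.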
Scheduling failures
| Event | Fix |
|---|---|
| Insufficient nvidia.com/gpu | Scale GPU pool via Karpenter or reduce requests |
| Did not tolerate taint sku=gpu | Add toleration or remove stray taint |
| 0 allocatable GPU on node | Fix device plugin / driver |
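When diagnosing the last row, it helps to know what a healthy node reports. The excerpt below sketches the relevant part of a Node object's status, with illustrative values. The same section is where MIG resource keys appear once profiles are applied.

```yaml
# Excerpt of a Node object's status (values are illustrative)
status:
  capacity:
    nvidia.com/gpu: "1"
  allocatable:
    nvidia.com/gpu: "1"   # "0" or a missing key suggests the device plugin has not registered GPUs
```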