TL;DR

A PersistentVolumeClaim (PVC) asks for storage, a StorageClass defines how storage is dynamically provisioned, and a PersistentVolume (PV) is the cluster object that represents the backend volume and binds to the claim. Most storage incidents trace back to one of six causes: a wrong StorageClass, an unavailable CSI driver, an access mode mismatch, a zone binding conflict, an attach or mount failure, or a misunderstood reclaim policy.

Mental Model

Kubernetes separates what the app asks for from how the platform provides it. The app uses a PVC. The platform defines StorageClasses. A CSI (Container Storage Interface) driver talks to the cloud, SAN, NAS, or software-defined storage backend. Once the PVC is bound to a PV, the Pod mounts the claim and sees a filesystem or a raw block device.

Storage is one of the places where careless deletion can become data loss. Before deleting PVCs, PVs, StatefulSets, or StorageClasses, understand reclaim policy, snapshots/backups, and whether the backend volume is still needed.

Pod (mounts claim) -> PVC (request) -> PV (bound volume) -> CSI Driver -> Backend Disk

PVC is the app request; PV is the bound asset; CSI handles provisioning, attach, mount, resize, and delete.

Figure: Kubernetes storage binding model.

Core Objects

| Object | Scope | Purpose | SRE Question |
| --- | --- | --- | --- |
| StorageClass | Cluster | Defines provisioner, parameters, reclaim policy, binding mode, expansion. | Which backend and behavior will be used? |
| PersistentVolumeClaim | Namespace | App request for capacity, access mode, and class. | Is the request satisfiable? |
| PersistentVolume | Cluster | Concrete volume object bound to a PVC. | What backend asset is bound and what is its reclaim policy? |
| VolumeAttachment | Cluster | Tracks CSI attach state for a node. | Is the disk stuck attached to another node? |
| CSI controller/node Pods | Cluster add-on | Provision, attach, mount, resize, snapshot, delete. | Is the storage driver healthy? |

Dynamic Vs Static Provisioning

Dynamic provisioning is the common cloud-native path: a PVC names a StorageClass and the CSI driver creates the backend volume. Static provisioning is when an admin creates a PV for an existing disk, NFS export, or special backend and lets a PVC bind to it.

| Pattern | How It Works | Use When |
| --- | --- | --- |
| Dynamic | PVC + StorageClass triggers CSI provisioning. | Most app volumes, cloud block storage, managed CSI. |
| Static | Admin creates PV that points to existing storage. | Existing NFS share, retained disk recovery, special enterprise storage. |
| Pre-bound | The PVC sets spec.volumeName and/or the PV sets spec.claimRef to force binding. | Controlled recovery or migration cases. |
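
To illustrate the pre-bound pattern, the sketch below pairs a statically defined PV with a PVC so that neither can bind to anything else. It assumes a retained EBS volume that is being recovered. The object names, volume ID, and zone are placeholders, and the zone label key a given driver version expects may differ, so check the driver documentation before reusing it.

```yaml
# Pre-bound PV and PVC (sketch). Names, volume ID, and zone are placeholders.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: recovered-data-pv
spec:
  capacity:
    storage: 20Gi
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""                     # empty class: no dynamic provisioning involved
  claimRef:                                # reserve this PV for exactly one claim
    namespace: app
    name: recovered-data
  csi:
    driver: ebs.csi.aws.com
    volumeHandle: vol-0123456789abcdef0    # placeholder backend volume ID
    fsType: ext4
  nodeAffinity:                            # zonal disk: keep Pods in the disk's zone
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values: ["us-east-1a"]       # placeholder zone
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: recovered-data
  namespace: app
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: ""                     # matches the PV and skips any default class
  volumeName: recovered-data-pv            # bind to this PV only
  resources:
    requests:
      storage: 20Gi
```

Setting both claimRef and volumeName is deliberate: claimRef stops another claim from taking the PV, and volumeName stops the claim from binding to a different PV.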

StorageClass

# storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-retain
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer

| Field | Meaning | Operational Impact |
| --- | --- | --- |
| provisioner | CSI driver name. | Must match the installed CSI driver. |
| reclaimPolicy | What happens to the backend asset after the PV is released. Defaults to Delete. | Delete is convenient; Retain is safer for critical data. |
| allowVolumeExpansion | Whether PVC size can grow. | Shrinking a PVC is not supported. |
| volumeBindingMode | When the PV is provisioned and bound. | WaitForFirstConsumer helps zone-aware scheduling. |
| parameters | Driver-specific backend options. | Class, performance, encryption, filesystem, type, IOPS. |
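
A PVC that omits storageClassName falls back to the cluster's default class, which is selected by an annotation rather than by a spec field. The sketch below shows one way a default class might look. The class name standard-delete is hypothetical, and the parameters assume the same EBS CSI driver as the example above.

```yaml
# Default StorageClass (sketch). The name is a placeholder.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard-delete
  annotations:
    # Marks this class as the default for PVCs without storageClassName.
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"          # parameter values are strings
reclaimPolicy: Delete        # suits non-critical data; backend disk goes away with the PVC
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
```

Keeping exactly one class annotated as default avoids ambiguity about which class an unqualified PVC receives.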

PVC And Deployment From Scratch

# pvc-deployment.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: app
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
  namespace: app
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-retain
  resources:
    requests:
      storage: 20Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-with-volume
  namespace: app
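spec:
  replicas: 1
  # Recreate avoids a Multi-Attach error during rollouts: with the default
  # RollingUpdate, the new Pod can start on another node while the old Pod
  # still holds the ReadWriteOnce volume.
  strategy:
    type: Recreate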
  selector:
    matchLabels:
      app: app-with-volume
  template:
    metadata:
      labels:
        app: app-with-volume
    spec:
      containers:
        - name: app
          image: busybox:1.36
          command: ["sh", "-c", "date >> /data/heartbeat.txt; sleep 3600"]
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: app-data

Access Modes

| Mode | Meaning | Common Backend | Gotcha |
| --- | --- | --- | --- |
| ReadWriteOnce / RWO | Mounted read-write by one node. | EBS, Azure Disk, GCE PD. | Multiple Pods can use it only if scheduled on the same node and the backend permits. |
| ReadWriteMany / RWX | Mounted read-write by many nodes. | NFS, EFS, CephFS, Azure Files. | Not all storage supports it; performance semantics differ. |
| ReadOnlyMany / ROX | Mounted read-only by many nodes. | Shared content volumes. | App must not need writes. |
| ReadWriteOncePod / RWOP | Mounted read-write by only one Pod. | CSI-supported block storage. | Useful for stronger single-writer guarantees. |
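
As a small illustration of the single-writer mode, the claim below requests ReadWriteOncePod instead of ReadWriteOnce. It is a sketch that assumes a CSI-backed class (it reuses fast-retain from earlier) and a Kubernetes release recent enough to support this access mode. The claim name is hypothetical.

```yaml
# PVC with a single-Pod access mode (sketch). The claim name is a placeholder.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: single-writer-data
  namespace: app
spec:
  accessModes:
    - ReadWriteOncePod       # a second Pod using this claim will not be allowed to run
  storageClassName: fast-retain
  resources:
    requests:
      storage: 20Gi
```

With ReadWriteOnce, two Pods on the same node can both mount the volume; ReadWriteOncePod closes that gap for workloads that must never have two writers.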

Binding Modes And Reclaim Policy

| Setting | What It Means | Risk/Benefit |
| --- | --- | --- |
| Immediate | Provision/bind PVC as soon as it is created. | Can choose a zone before the scheduler knows where the Pod should run. |
| WaitForFirstConsumer | Wait until a Pod uses the PVC, then provision in the scheduled topology. | Better for zonal disks and topology-aware scheduling. |
| Delete | Backend volume is deleted when the PV is released. | Good for ephemeral/non-critical data; dangerous if misunderstood. |
| Retain | Backend volume remains after PVC deletion. | Safer for important data; requires manual cleanup/rebind. |
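
The StorageClass reclaimPolicy is copied onto each PV when it is provisioned, so editing the class later does not change volumes that already exist. A minimal sketch of a patch that switches one existing PV to Retain follows. The filename retain-patch.yaml is hypothetical, and the command in the comment assumes a kubectl version that supports the --patch-file flag.

```yaml
# retain-patch.yaml (hypothetical filename)
# One way to apply it: kubectl patch pv <pv> --patch-file retain-patch.yaml
spec:
  persistentVolumeReclaimPolicy: Retain   # keep the backend volume when the PVC is deleted
```

After patching, kubectl get pv should show Retain in the RECLAIM POLICY column for that volume.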

StatefulSet Storage Pattern

StatefulSets usually use volumeClaimTemplates. Each replica gets its own PVC, named <template>-<statefulset>-<ordinal>, such as data-postgres-0 and data-postgres-1. By default, deleting the StatefulSet does not delete these PVCs, which is usually what you want for data safety.

# statefulset-storage.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: data
spec:
  serviceName: postgres
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
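      containers:
        - name: postgres
          image: postgres:16
          env:
            # The official image will not initialize without a superuser password.
            # "postgres-auth" is a placeholder Secret that must already exist.
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-auth
                  key: password
            # Use a subdirectory so initdb is not blocked by lost+found
            # on a freshly formatted volume.
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data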
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: fast-retain
        resources:
          requests:
            storage: 100Gi
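
The default keep-everything behavior can be changed per StatefulSet. The fragment below is a sketch of the persistentVolumeClaimRetentionPolicy field, assuming a Kubernetes version where that field is available. The chosen values are only an example, not a recommendation for databases.

```yaml
# Fragment of a StatefulSet manifest; merge into the top-level spec.
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain   # keep PVCs when the StatefulSet itself is deleted
    whenScaled: Delete    # delete the PVC of a replica removed by scale-down
```

Even when a PVC is deleted this way, what happens to the backend disk is still decided by the PV's reclaim policy, so a Retain class keeps the data.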

Static PersistentVolume

# static-nfs-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-shared-pv
spec:
  capacity:
    storage: 100Gi
  accessModes: ["ReadWriteMany"]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: nfs-static
  nfs:
    server: 10.0.10.25
    path: /exports/app
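
A static PV does nothing until a claim binds to it. The sketch below shows a PVC that could bind to nfs-shared-pv. The claim name and namespace are assumptions. Note that nfs-static does not need to exist as a StorageClass object here; the PV and PVC simply have to carry the same class name.

```yaml
# PVC for the static NFS PV above (sketch). Claim name and namespace are placeholders.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-shared
  namespace: app
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: nfs-static     # must equal the PV's storageClassName
  volumeName: nfs-shared-pv        # optional: pin the claim to this specific PV
  resources:
    requests:
      storage: 100Gi               # must not exceed the PV's capacity
```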

Daily Commands

# storage-ops.sh
kubectl get storageclass
kubectl describe storageclass <storageclass>
kubectl get pvc -n <namespace> -o wide
kubectl describe pvc <pvc> -n <namespace>
kubectl get pv
kubectl describe pv <pv>
kubectl get volumeattachment
kubectl get events -n <namespace> --sort-by=.lastTimestamp | grep -i -E 'volume|mount|attach|pvc|provision'

Expansion

Expansion is one-way: a PVC can grow but cannot shrink. Confirm that the StorageClass sets allowVolumeExpansion: true and that the CSI driver supports resize before patching. Depending on the driver and filesystem, the filesystem resize happens online or on the next mount.

# expand-pvc.sh
kubectl get storageclass <storageclass> -o yaml | grep allowVolumeExpansion
kubectl patch pvc app-data -n app -p '{"spec":{"resources":{"requests":{"storage":"40Gi"}}}}'
kubectl describe pvc app-data -n app
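
When manifests live in Git, the same expansion can be expressed declaratively instead of with a patch. The sketch below is the earlier app-data claim with only the requested size changed, assuming the manifest is re-applied through whatever deployment process owns it.

```yaml
# app-data PVC after expansion (sketch): only the storage request differs.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
  namespace: app
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-retain
  resources:
    requests:
      storage: 40Gi   # was 20Gi; a smaller value would be rejected
```

Updating the source manifest also keeps later applies from trying to set the request back to the old, smaller size.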

Troubleshooting Workflow

  1. Start with the PVC: status, events, StorageClass, requested size, and access mode.
  2. Check the bound PV: reclaim policy, node affinity, capacity, backend handle.
  3. Check Pod events for FailedAttachVolume, FailedMount, or permissions errors.
  4. Inspect CSI controller and node plugin Pods in the storage namespace or kube-system.
  5. For zonal disks, check node zone, PV node affinity, and Pod scheduling constraints.

Symptom To Cause

| Symptom | Likely Cause | Check First |
| --- | --- | --- |
| PVC Pending | Wrong StorageClass, no default class, CSI provisioner down, quota, WaitForFirstConsumer waiting for Pod. | describe pvc, StorageClass, CSI controller logs. |
| Pod stuck ContainerCreating | Attach or mount failure. | Pod events and VolumeAttachment. |
| Multi-Attach error | RWO disk still attached to another node. | Old Pod/node, VolumeAttachment, cloud disk attachment. |
| Permission denied inside container | Filesystem ownership, securityContext, NFS export permissions. | Mount path ownership, fsGroup, backend export. |
| Data disappeared after PVC delete | Reclaim policy was Delete. | PV reclaim policy and backup/snapshot availability. |
| Expansion stuck | Driver does not support resize, filesystem resize pending, quota. | PVC conditions and CSI logs. |
| Pod unschedulable with PVC | Zone/topology conflict. | PV node affinity, node labels, volume binding mode. |
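
For the permission-denied row, a throwaway Pod that mounts the claim as a non-root user can show whether ownership is the problem. The sketch below is one way to do that. The Pod name and the UID/GID 1000 are assumptions, and fsGroup only takes effect on volume types and drivers that support ownership changes, so NFS exports may still need fixing on the server side.

```yaml
# Diagnostic Pod for volume permissions (sketch). Name and IDs are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: volume-perms-check
  namespace: app
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 1000
    fsGroup: 1000                         # group applied to the volume where supported
    fsGroupChangePolicy: OnRootMismatch   # skip the recursive chown when the root already matches
  containers:
    - name: check
      image: busybox:1.36
      # Print identity and mount ownership, try a write, then stay up for inspection.
      command: ["sh", "-c", "id; ls -ld /data; touch /data/.write-test && echo writable; sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: app-data
```

Because app-data is ReadWriteOnce, run this only while the application Pod is scaled down or on the same node; otherwise the check itself can hit a Multi-Attach error.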

Safe Change Pattern

  1. Do not delete first: inspect PVC, PV, reclaim policy, snapshots, and owner before deleting anything.
  2. Use Retain for important data: especially databases and client-owned persistent volumes.
  3. Snapshot before risky changes: expansion, migration, reclaim-policy changes, or StatefulSet storage work.
  4. Know the driver: CSI behavior differs by cloud, NAS, SAN, and on-prem platform.
  5. Change through source of truth: Helm, GitOps, Terraform, or storage platform process.
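
The snapshot step can be sketched as a VolumeSnapshot object. This assumes the snapshot CRDs and snapshot controller are installed and that the CSI driver supports snapshots. The snapshot name and the class name csi-snapclass are hypothetical and must be replaced with a VolumeSnapshotClass that exists in the cluster.

```yaml
# Snapshot of the app-data claim before a risky change (sketch).
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: app-data-pre-change               # placeholder name
  namespace: app                          # must be the PVC's namespace
spec:
  volumeSnapshotClassName: csi-snapclass  # placeholder; must exist in the cluster
  source:
    persistentVolumeClaimName: app-data
```

Wait until the snapshot reports readyToUse: true in its status before starting the change, and keep in mind that a snapshot stored on the same backend is not a substitute for an off-platform backup.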