TL;DR

Use a StatefulSet when each replica needs a stable identity, ordered rollout, a stable DNS name, and usually its own persistent volume. Pods get predictable names such as mysql-0 and mysql-1, and each Pod keeps its own PVC across restarts and rescheduling.

Mental Model

A StatefulSet is not just a Deployment with storage. It gives every replica a stable ordinal, hostname, and volume claim. That matters for databases, queues, consensus systems, and apps where each member has identity or data ownership.

STATEFULSET: mysql
  mysql-0   DNS: mysql-0.mysql   ordinal 0   PVC: data-mysql-0
  mysql-1   DNS: mysql-1.mysql   ordinal 1   PVC: data-mysql-1
  mysql-2   DNS: mysql-2.mysql   ordinal 2   PVC: data-mysql-2

Stable Pod names and PVC names survive rescheduling; deletion of the StatefulSet does not automatically delete PVC data.

StatefulSet identity: predictable Pod DNS names and one PVC per replica.

Deployment vs StatefulSet

Need | Deployment | StatefulSet
Replica identity | Anonymous Pods; names change freely. | Stable names: app-0, app-1.
Storage | Shared or external storage pattern; Pods are replaceable. | Per-replica PVCs from volumeClaimTemplates.
Rollout order | Flexible parallel rollout. | Ordered by ordinal by default.
DNS | Service DNS points to interchangeable endpoints. | Headless Service gives per-Pod DNS records.
Best fit | Stateless APIs, web apps, workers. | Databases, brokers, quorum systems, identity-aware apps.

Baseline StatefulSet YAML

This example shows the required relationship between a headless Service, StatefulSet serviceName, labels, and volumeClaimTemplates.

yaml: statefulset-baseline.yaml
apiVersion: v1
kind: Service
metadata:
  name: mysql # Must match StatefulSet spec.serviceName below.
  namespace: data
  labels:
    app: mysql
spec:
  clusterIP: None # Headless Service: creates DNS records for individual Pods.
  selector:
    app: mysql # Must match Pod template labels.
  ports:
    - name: mysql
      port: 3306
      targetPort: mysql
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
  namespace: data
spec:
  serviceName: mysql # Required for stable network identity.
  replicas: 3
  podManagementPolicy: OrderedReady # Default. Creates Pods in ordinal order and removes them in reverse; applies to scaling, not to rolling updates.
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 0 # 0 means update all ordinals. Higher values hold lower ordinals back.
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      terminationGracePeriodSeconds: 60 # Databases often need longer graceful shutdown.
      containers:
        - name: mysql
          image: mysql:8.4
          ports:
            - name: mysql
              containerPort: 3306
          env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-root-password # Secret must exist in the same namespace.
                  key: password
          volumeMounts:
            - name: data
              mountPath: /var/lib/mysql # Mount point inside the container.
          readinessProbe:
            tcpSocket:
              port: mysql
            periodSeconds: 10
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
            limits:
              cpu: "2"
              memory: 2Gi
  volumeClaimTemplates:
    - metadata:
        name: data # Creates PVC names like data-mysql-0, data-mysql-1.
      spec:
        accessModes: ["ReadWriteOnce"] # Common for one volume attached to one node.
        storageClassName: gp3 # Replace with the client's StorageClass.
        resources:
          requests:
            storage: 20Gi

Identity And DNS

Each Pod gets a stable hostname. With a headless Service, clients can address a specific replica using DNS such as mysql-0.mysql.data.svc.cluster.local.

bash: stateful-dns-checks.sh
# Replace values with your StatefulSet and namespace.
kubectl get statefulset mysql -n data
kubectl get pods -n data -l app=mysql -o wide
kubectl get pvc -n data -l app=mysql

# Check the headless Service.
kubectl get service mysql -n data -o wide
kubectl get endpointslice -n data -l kubernetes.io/service-name=mysql -o wide

# Test per-Pod DNS from a temporary debug Pod.
kubectl run dns-test -n data --rm -it --restart=Never --image=busybox:1.36 -- nslookup mysql-0.mysql.data.svc.cluster.local
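
For clients that need an explicit member list (for example a replication or quorum config), the per-Pod DNS pattern can be generated rather than typed by hand. The following is a minimal sketch, assuming the default cluster domain cluster.local. The helper name sts_pod_fqdns and its argument order are illustrative and not part of kubectl.

```shell
# Print the per-Pod DNS name for every ordinal of a StatefulSet.
# Arguments: <statefulset> <headless-service> <namespace> <replicas> [cluster-domain]
# sts_pod_fqdns is a hypothetical helper; it makes no cluster calls.
sts_pod_fqdns() (
  sts=$1; svc=$2; ns=$3; replicas=$4
  domain=${5:-cluster.local} # Assumption: the cluster uses the default domain.
  i=0
  while [ "$i" -lt "$replicas" ]; do
    # Pattern: <pod-name>.<service>.<namespace>.svc.<cluster-domain>
    printf '%s-%s.%s.%s.svc.%s\n' "$sts" "$i" "$svc" "$ns" "$domain"
    i=$((i + 1))
  done
)
```

For example, `sts_pod_fqdns mysql mysql data 3` prints one name per line, from mysql-0.mysql.data.svc.cluster.local through mysql-2.mysql.data.svc.cluster.local.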

Common Operations

bash: statefulset-ops.sh
# Inspect StatefulSet status, rollout, and events.
kubectl describe statefulset mysql -n data
kubectl rollout status statefulset/mysql -n data
kubectl rollout history statefulset/mysql -n data

# Scale up or down. Scaling down removes highest ordinal Pods first.
kubectl scale statefulset mysql -n data --replicas=5
kubectl scale statefulset mysql -n data --replicas=3

# Restart Pods in StatefulSet order using a template annotation change.
kubectl rollout restart statefulset/mysql -n data

# Delete one Pod. StatefulSet recreates it with the same name and PVC.
kubectl delete pod mysql-1 -n data

# StatefulSets do not support `kubectl rollout pause`; a partition holds back Pods with ordinals below it.
kubectl patch statefulset mysql -n data --type merge -p '{"spec":{"updateStrategy":{"rollingUpdate":{"partition":2}}}}'

Rollouts And Partitions

StatefulSet rolling updates proceed from the highest ordinal down to the lowest. Partitioned rollouts let you update only replicas with ordinal greater than or equal to the partition. This is useful for canarying one replica before updating the rest.

yaml: partitioned-rollout.yaml
spec:
  replicas: 3
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 2 # Only mysql-2 updates. mysql-0 and mysql-1 stay on old template.
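
The partition rule can be checked on paper before patching anything. The sketch below prints which Pods a rolling update would touch, in update order (highest ordinal first), for a given replica count and partition. sts_update_order is a hypothetical helper; it makes no cluster calls and only encodes the rule described above.

```shell
# Print the Pods a rolling update would replace, highest ordinal first.
# Arguments: <statefulset> <replicas> <partition>
# sts_update_order is a hypothetical helper, not a kubectl command.
sts_update_order() (
  sts=$1; replicas=$2; partition=$3
  i=$((replicas - 1))
  # Only ordinals >= partition are updated; lower ordinals keep the old template.
  while [ "$i" -ge "$partition" ] && [ "$i" -ge 0 ]; do
    printf '%s-%s\n' "$sts" "$i"
    i=$((i - 1))
  done
)
```

With three replicas, `sts_update_order mysql 3 2` prints only mysql-2, while `sts_update_order mysql 3 0` prints mysql-2, mysql-1, and mysql-0 in that order. A partition at or above the replica count prints nothing, which matches a rollout that is fully held back.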

Storage And PVC Behavior

StatefulSet PVCs are intentionally conservative. Deleting a StatefulSet or scaling it down does not delete PVCs by default, because Kubernetes avoids deleting data automatically.

Action | Pod Result | PVC Result | SRE Note
Delete Pod | Recreated with same name | Same PVC reattached | Common safe recovery action if app tolerates restart.
Scale down | Highest ordinal Pods removed first | PVCs remain | Data remains for future scale-up unless manually deleted.
Delete StatefulSet | Pods deleted unless orphaned (--cascade=orphan) | PVCs remain | Confirm PVC cleanup separately.
Delete PVC | Data may be lost | PV reclaim policy decides backend behavior | Do only with explicit backup/restore plan.

bash: pvc-checks.sh
# List PVCs and their bound PVs.
kubectl get pvc -n data -o wide
kubectl describe pvc data-mysql-0 -n data

# Check reclaim policy before deleting any PVC.
kubectl get pv
kubectl describe pv <pv-name> | grep -E 'Reclaim Policy|StorageClass|Status|Claim'

# If the StorageClass allows volume expansion, raise the PVC storage request.
# Never try to shrink a PVC; Kubernetes does not support reducing a claim's size.
kubectl patch pvc data-mysql-0 -n data -p '{"spec":{"resources":{"requests":{"storage":"40Gi"}}}}'
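
Because PVCs outlive the Pods removed by a scale-down, it helps to list which claims will be left without a Pod before deciding on cleanup. This is a small sketch, assuming the default retention behavior described above. sts_retained_pvcs is a hypothetical helper name, and it only derives names; it does not query the cluster.

```shell
# Print the PVCs left without a Pod after scaling a StatefulSet down.
# Arguments: <statefulset> <volumeClaimTemplate-name> <old-replicas> <new-replicas>
# sts_retained_pvcs is a hypothetical helper; it makes no cluster calls.
sts_retained_pvcs() (
  sts=$1; claim=$2; old=$3; new=$4
  i=$new
  # Ordinals new..old-1 lose their Pod, but their PVCs remain by default.
  while [ "$i" -lt "$old" ]; do
    # PVC name pattern: <claim-template>-<statefulset>-<ordinal>
    printf '%s-%s-%s\n' "$claim" "$sts" "$i"
    i=$((i + 1))
  done
)
```

For the 5 to 3 scale-down shown earlier, `sts_retained_pvcs mysql data 5 3` prints data-mysql-3 and data-mysql-4. Those are the claims to review, and back up if needed, before any manual deletion.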

Pod Management Policy

Policy | Behavior | Use When
OrderedReady | Creates Pods in ordinal order, waiting for each to be Running and Ready, and removes them in reverse ordinal order. Applies to scaling, not to rolling updates. | Default for databases and quorum systems that need ordered membership.
Parallel | Creates/deletes Pods in parallel, but keeps identity. | App can tolerate parallel start/stop and you want faster operations.

Troubleshooting

  • Pod stuck Pending: check PVC binding, StorageClass, node volume attach limits, zone topology, and scheduling constraints.
  • Rollout stuck: with OrderedReady, creation waits for each lower ordinal to become Ready; a rolling update waits for the Pod it just replaced (a higher ordinal) to become Ready before moving to the next lower one.
  • DNS missing: verify headless Service exists, clusterIP: None, selector matches Pod labels, and CoreDNS is healthy.
  • Volume attach failure: check whether the old node still holds the volume attachment and whether the storage backend supports the requested access mode.
  • Do not casually delete PVCs: PVC deletion can delete backend storage depending on PV reclaim policy.

bash: statefulset-debug.sh
# Start with status and events.
kubectl get sts,pod,pvc,svc -n data -l app=mysql -o wide
kubectl describe statefulset mysql -n data

# Inspect the specific ordinal that is stuck.
kubectl describe pod mysql-1 -n data
kubectl logs mysql-1 -n data --tail=100
kubectl logs mysql-1 -n data --previous --tail=100

# Storage-related checks.
kubectl describe pvc data-mysql-1 -n data
kubectl get events -n data --sort-by=.lastTimestamp | grep -i -E 'mount|attach|volume|pvc|provision'

# Node and volume placement.
kubectl get pod mysql-1 -n data -o wide
kubectl describe node <node-name>
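
When a rollout or scale-up appears stuck, the first question is usually which ordinal is not Ready. The sketch below loops over the expected ordinals and prints the Pods whose Ready condition is not True. sts_not_ready is a hypothetical helper name; it only wraps `kubectl get pod` with a JSONPath filter, and it assumes the replica count is passed in by the caller.

```shell
# List StatefulSet Pods whose Ready condition is not "True".
# Arguments: <statefulset> <namespace> <replicas>
# sts_not_ready is a hypothetical helper built on `kubectl get pod`.
sts_not_ready() (
  sts=$1; ns=$2; replicas=$3
  i=0
  while [ "$i" -lt "$replicas" ]; do
    pod="${sts}-${i}"
    # A missing Pod or a Pod without a Ready condition yields empty output,
    # which is treated as not ready.
    ready=$(kubectl get pod "$pod" -n "$ns" \
      -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}' 2>/dev/null) || ready=""
    [ "$ready" = "True" ] || printf '%s\n' "$pod"
    i=$((i + 1))
  done
)
```

Running `sts_not_ready mysql data 3` prints nothing when all three Pods are Ready. Any name it prints is a good starting point for the `kubectl describe pod` and `kubectl logs` checks above.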