StatefulSets
Use a StatefulSet when each replica needs a stable identity, ordered rollout, a stable DNS name, and usually its own persistent volume. Pods get predictable names such as mysql-0 and mysql-1, and each Pod keeps its own PVC across restarts and rescheduling.
Mental Model
A StatefulSet is not just a Deployment with storage. It gives every replica a stable ordinal, hostname, and volume claim. That matters for databases, queues, consensus systems, and apps where each member has identity or data ownership.
StatefulSet identity: predictable Pod DNS names and one PVC per replica.
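The naming rules are mechanical enough to compute without a cluster. The following shell sketch prints the Pod and PVC names a StatefulSet would use, assuming the `<statefulset>-<ordinal>` Pod pattern and the `<claim>-<statefulset>-<ordinal>` PVC pattern used throughout this guide. The function name `sts_names` is made up for illustration.

```shell
# Print the Pod and PVC names for each ordinal of a StatefulSet.
# $1 = StatefulSet name, $2 = volumeClaimTemplate name, $3 = replica count
# (sts_names is a hypothetical helper; it does not contact a cluster.)
sts_names() {
  i=0
  while [ "$i" -lt "$3" ]; do
    printf 'pod=%s-%s pvc=%s-%s-%s\n' "$1" "$i" "$2" "$1" "$i"
    i=$((i + 1))
  done
}

# Example: a 3-replica StatefulSet named mysql with a claim template named data.
sts_names mysql data 3
```

Ordinals start at 0 by default, so a 3-replica set uses ordinals 0 through 2.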
Deployment vs StatefulSet
| Need | Deployment | StatefulSet |
|---|---|---|
| Replica identity | Anonymous Pods; names change freely. | Stable names: app-0, app-1. |
| Storage | Shared or external storage pattern; Pods are replaceable. | Per-replica PVCs from volumeClaimTemplates. |
| Rollout order | Flexible parallel rollout. | Ordered by ordinal by default. |
| DNS | Service DNS points to interchangeable endpoints. | Headless Service gives per-Pod DNS records. |
| Best fit | Stateless APIs, web apps, workers. | Databases, brokers, quorum systems, identity-aware apps. |
Baseline StatefulSet YAML
This example shows the required relationship between a headless Service, StatefulSet serviceName, labels, and volumeClaimTemplates.
apiVersion: v1
kind: Service
metadata:
  name: mysql # Must match StatefulSet spec.serviceName below.
  namespace: data
  labels:
    app: mysql
spec:
  clusterIP: None # Headless Service: creates DNS records for individual Pods.
  selector:
    app: mysql # Must match Pod template labels.
  ports:
    - name: mysql
      port: 3306
      targetPort: mysql
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
  namespace: data
spec:
  serviceName: mysql # Required for stable network identity.
  replicas: 3
  podManagementPolicy: OrderedReady # Default. Creates Pods in ordinal order and removes them in reverse order.
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 0 # 0 means update all ordinals. Higher values hold lower ordinals back.
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      terminationGracePeriodSeconds: 60 # Databases often need longer graceful shutdown.
      containers:
        - name: mysql
          image: mysql:8.4
          ports:
            - name: mysql
              containerPort: 3306
          env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-root-password # Secret must exist in the same namespace.
                  key: password
          volumeMounts:
            - name: data
              mountPath: /var/lib/mysql # Mount point inside the container.
          readinessProbe:
            tcpSocket:
              port: mysql
            periodSeconds: 10
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
            limits:
              cpu: "2"
              memory: 2Gi
  volumeClaimTemplates:
    - metadata:
        name: data # Creates PVC names like data-mysql-0, data-mysql-1.
      spec:
        accessModes: ["ReadWriteOnce"] # Common for one volume attached to one node.
        storageClassName: gp3 # Replace with the client's StorageClass.
        resources:
          requests:
            storage: 20Gi
Identity And DNS
Each Pod gets a stable hostname. With a headless Service, clients can address a specific replica using DNS such as mysql-0.mysql.data.svc.cluster.local.
# Replace values with your StatefulSet and namespace.
kubectl get statefulset mysql -n data
kubectl get pods -n data -l app=mysql -o wide
kubectl get pvc -n data -l app=mysql
# Check the headless Service.
kubectl get service mysql -n data -o wide
kubectl get endpointslice -n data -l kubernetes.io/service-name=mysql -o wide
# Test per-Pod DNS from a temporary debug Pod.
kubectl run dns-test -n data --rm -it --image=busybox:1.36 -- nslookup mysql-0.mysql.data.svc.cluster.local
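Because per-Pod DNS names follow a fixed pattern, they can be generated for every replica, for example to build a connection list for a client. This is a minimal sketch that assumes the default cluster domain `cluster.local` (some clusters are configured with a different one). The helper name `pod_fqdn` is hypothetical.

```shell
# Build a per-Pod DNS name: <sts>-<ordinal>.<service>.<namespace>.svc.<domain>
# $1 = StatefulSet name, $2 = ordinal, $3 = headless Service name,
# $4 = namespace, $5 = cluster domain (assumed cluster.local when omitted)
pod_fqdn() {
  printf '%s-%s.%s.%s.svc.%s\n' "$1" "$2" "$3" "$4" "${5:-cluster.local}"
}

# Print the DNS name of each replica in a 3-replica mysql StatefulSet.
for ordinal in 0 1 2; do
  pod_fqdn mysql "$ordinal" mysql data
done
```

Each printed name can be passed to the nslookup test above to confirm that the record actually resolves.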
Common Operations
# Inspect StatefulSet status, rollout, and events.
kubectl describe statefulset mysql -n data
kubectl rollout status statefulset/mysql -n data
kubectl rollout history statefulset/mysql -n data
# Scale up or down. Scaling down removes highest ordinal Pods first.
kubectl scale statefulset mysql -n data --replicas=5
kubectl scale statefulset mysql -n data --replicas=3
# Restart Pods in StatefulSet order using a template annotation change.
kubectl rollout restart statefulset/mysql -n data
# Delete one Pod. StatefulSet recreates it with the same name and PVC.
kubectl delete pod mysql-1 -n data
# Pause-like behavior for StatefulSets usually uses partitioned rollout.
kubectl patch statefulset mysql -n data --type merge -p '{"spec":{"updateStrategy":{"rollingUpdate":{"partition":2}}}}'
Rollouts And Partitions
StatefulSet rolling updates proceed from the highest ordinal down to the lowest. Partitioned rollouts let you update only replicas with ordinal greater than or equal to the partition. This is useful for canarying one replica before updating the rest.
spec:
  replicas: 3
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 2 # Only mysql-2 updates. mysql-0 and mysql-1 stay on old template.
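The partition rule can be checked on paper before patching a live StatefulSet. The sketch below lists which ordinals a rolling update would touch, in update order, for a given replica count and partition. It only models the rule stated above (ordinal greater than or equal to the partition, highest ordinal first). The function name `ordinals_to_update` is made up for illustration.

```shell
# List the ordinals a RollingUpdate would touch, highest ordinal first.
# $1 = replicas, $2 = partition
# (ordinals_to_update is a hypothetical helper; it does not contact a cluster.)
ordinals_to_update() {
  i=$(( $1 - 1 ))
  while [ "$i" -ge "$2" ]; do
    printf '%s\n' "$i"
    i=$((i - 1))
  done
}

ordinals_to_update 3 2   # canary: prints 2
ordinals_to_update 3 0   # full rollout: prints 2, 1, 0 on separate lines
```

A partition equal to or larger than the replica count prints nothing, which matches the case where a template change is held back from every Pod.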
Storage And PVC Behavior
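StatefulSet PVCs are intentionally conservative. By default, deleting a StatefulSet or scaling it down does not delete its PVCs, so data is never removed as a side effect. Clusters that support spec.persistentVolumeClaimRetentionPolicy can opt in to automatic deletion; the default for both whenDeleted and whenScaled is Retain.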
| Action | Pod Result | PVC Result | SRE Note |
|---|---|---|---|
| Delete Pod | Recreated with same name | Same PVC reattached | Common safe recovery action if app tolerates restart. |
| Scale down | Highest ordinal Pods removed first | PVCs remain | Data remains for future scale-up unless manually deleted. |
| Delete StatefulSet | Pods deleted unless orphaned | PVCs remain | Confirm PVC cleanup separately. |
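| Delete PVC | Running Pod keeps its volume; the PVC stays Terminating until the Pod is gone | PV reclaim policy decides whether backend storage is deleted or retained | Data may be lost. Do only with an explicit backup/restore plan. |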
# List PVCs and their bound PVs.
kubectl get pvc -n data -o wide
kubectl describe pvc data-mysql-0 -n data
# Check reclaim policy before deleting any PVC.
kubectl get pv
kubectl describe pv <pv-name> | grep -E 'Reclaim Policy|StorageClass|Status|Claim'
# If StorageClass supports expansion, edit PVC storage request upward.
# Never shrink PVCs; Kubernetes volume shrinking is not generally supported.
kubectl patch pvc data-mysql-0 -n data -p '{"spec":{"resources":{"requests":{"storage":"40Gi"}}}}'
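Since PVCs survive a scale-down by default, it helps to know which claims are left without a Pod afterwards. The sketch below prints those PVC names for a scale-down from one replica count to a lower one, assuming the default Retain behavior and the `<claim>-<statefulset>-<ordinal>` naming pattern. The helper name `leftover_pvcs` is hypothetical.

```shell
# Print the PVCs that remain without a Pod after a scale-down.
# $1 = volumeClaimTemplate name, $2 = StatefulSet name,
# $3 = old replica count, $4 = new replica count
# (leftover_pvcs is a hypothetical helper; it does not contact a cluster.)
leftover_pvcs() {
  i="$4"
  while [ "$i" -lt "$3" ]; do
    printf '%s-%s-%s\n' "$1" "$2" "$i"
    i=$((i + 1))
  done
}

# Scaling mysql from 5 to 3 replicas leaves the claims of ordinals 3 and 4.
leftover_pvcs data mysql 5 3
```

These claims are reattached if the StatefulSet is scaled back up, so delete them only after confirming the data is no longer needed.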
Pod Management Policy
| Policy | Behavior | Use When |
|---|---|---|
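| OrderedReady | Creates Pods one at a time in ascending ordinal order and deletes them in descending order, waiting for each Pod before moving on. | Default for databases and quorum systems that need ordered membership. |
| Parallel | Creates and deletes Pods in parallel while keeping stable identity. Affects scaling only; rolling updates still follow updateStrategy. | App can tolerate parallel start/stop and you want faster operations. |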
Troubleshooting
- Pod stuck Pending: check PVC binding, StorageClass, node volume attach limits, zone topology, and scheduling constraints.
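- Rollout stuck: during a rolling update the controller waits for each updated Pod to become Running and Ready before moving to the next ordinal, so one failing Pod blocks the rest. After fixing the template, delete the stuck Pod if it does not recover on its own.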
- DNS missing: verify headless Service exists, clusterIP: None, selector matches Pod labels, and CoreDNS is healthy.
- Volume attach failure: check whether the old node still holds the volume attachment and whether the storage backend supports the requested access mode.
- Do not casually delete PVCs: PVC deletion can delete backend storage depending on PV reclaim policy.
# Start with status and events.
kubectl get sts,pod,pvc,svc -n data -l app=mysql -o wide
kubectl describe statefulset mysql -n data
# Inspect the specific ordinal that is stuck.
kubectl describe pod mysql-1 -n data
kubectl logs mysql-1 -n data --tail=100
kubectl logs mysql-1 -n data --previous --tail=100
# Storage-related checks.
kubectl describe pvc data-mysql-1 -n data
kubectl get events -n data --sort-by=.lastTimestamp | grep -i -E 'mount|attach|volume|pvc|provision'
# Node and volume placement.
kubectl get pod mysql-1 -n data -o wide
kubectl describe node <node-name>
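When a rollout or scale-up stalls, the first question is usually which ordinal is not Ready. The sketch below filters `kubectl get pods` output read from stdin and prints the Pods whose READY column shows fewer ready containers than expected. The helper name `not_ready_pods` is hypothetical, and the sample input is made up for illustration, not real cluster output.

```shell
# Read `kubectl get pods` output on stdin and print the names of Pods whose
# READY column (for example 0/1) shows not all containers ready.
# Lines without an n/m READY column, such as the header, are skipped.
not_ready_pods() {
  awk '$2 ~ /^[0-9]+\/[0-9]+$/ { split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}

# Illustrative sample input (not real cluster output); prints mysql-1.
printf '%s\n' 'mysql-0 1/1 Running 0 5m' 'mysql-1 0/1 Running 3 2m' | not_ready_pods

# Against a cluster, the equivalent would be:
#   kubectl get pods -n data -l app=mysql | not_ready_pods
```

The lowest ordinal in the output is usually the one to inspect first with the describe and logs commands above.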