Linux Performance — K8s SRE Reference

TL;DR

Use this page when a node or container is slow, OOM-killed, or hitting resource limits. Work top-down: load average → CPU saturation → memory pressure → disk I/O. Read-only tools first (top, free, vmstat), then deeper investigation.
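The top-down pass can be captured as a single read-only snapshot. This is a minimal sketch that assumes bash plus the procps and sysstat tools. The function names `run_if_present` and `triage` are hypothetical. A missing or failing tool is reported and skipped rather than aborting the run.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command if it is installed; never abort the snapshot.
run_if_present() {
  local cmd=$1
  shift
  if ! command -v "$cmd" >/dev/null 2>&1; then
    echo "($cmd not installed, skipped)"
    return 0
  fi
  "$cmd" "$@" || echo "($cmd exited with an error)"
}

# Top-down, read-only snapshot: load -> CPU -> memory -> disk I/O.
triage() {
  echo "== load average =="
  run_if_present uptime
  echo "== cpu saturation (r, b, us, sy, id, wa) =="
  run_if_present vmstat 1 3
  echo "== memory (focus on 'available') =="
  run_if_present free -h
  echo "== disk i/o (sysstat) =="
  run_if_present iostat -xz 1 2
}
```

Run `triage` on the node and read the sections in the same order as this page.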

Load Average

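Load average is an exponentially damped moving average of the number of tasks that are runnable or in uninterruptible sleep (D state, usually waiting on disk I/O). It is reported over 1, 5, and 15 minutes. A sustained value above the logical CPU count indicates saturation. Because D-state tasks are included, high load on Linux can come from I/O waits rather than CPU demand.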

load.sh

uptime                          # 1/5/15 minute load averages
# Example: load average: 3.50, 2.10, 1.80 on a 4-core node = ~0.88 load per CPU, trending up (1-min > 5-min > 15-min)

nproc                           # logical CPUs available to this process, incl. hyperthreads (nproc --all: all installed)
lscpu | grep -E "^CPU\(s\):|Thread|Core|Socket"

# System-wide run queue, blocked tasks, and CPU time split
vmstat 1 5                      # 5 samples, 1 second apart; the first line reports averages since boot
# columns: r=run queue, b=blocked (uninterruptible, usually I/O), id=idle%, wa=iowait%, sy=kernel time%
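The per-CPU arithmetic in the example above can be scripted. This is a minimal sketch that assumes the standard /proc/loadavg layout ("1min 5min 15min running/total last_pid"). `load_report` is a hypothetical helper name. It labels the trend by comparing only the 1-minute and 15-minute values.

```bash
#!/usr/bin/env bash
# Hypothetical helper: divide each load average by the CPU count and label the trend.
# Args: loadavg file (default /proc/loadavg), CPU count (default: nproc).
load_report() {
  local loadavg_file=${1:-/proc/loadavg}
  local cpus=${2:-$(nproc)}
  awk -v cpus="$cpus" '{
    # 1-minute above 15-minute means load is rising, and vice versa.
    if ($1 > $3) trend = "rising"
    else if ($1 < $3) trend = "falling"
    else trend = "flat"
    printf "load per CPU: %.2f %.2f %.2f (%s)\n", $1 / cpus, $2 / cpus, $3 / cpus, trend
  }' "$loadavg_file"
}
```

With no arguments, `load_report` reads /proc/loadavg and uses `nproc`. `nproc` counts only the CPUs available to the calling process, so pass the count explicitly (for example from `nproc --all`) if the script runs under a restricted CPU affinity.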

CPU Usage

Use top to identify which processes are consuming CPU. Press 1 to toggle the per-CPU view, P to sort by CPU, and M to sort by memory.

cpu.sh

# Interactive top
top -c              # show full command line
top -H -p <pid>    # show threads for a specific process

# Non-interactive snapshot (useful in scripts)
top -b -n 1 | head -30

# Per-CPU statistics
mpstat -P ALL 1 3   # all CPUs, 3 samples (requires sysstat)

# CPU time breakdown for a process
pidstat -u -p <pid> 1 5   # %user, %system, %CPU for PID

# Find top CPU consumers without top (%cpu here is the average over the process lifetime, not current usage)
ps -eo pid,ppid,%cpu,cmd --sort=-%cpu | head -15

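# cgroup CPU usage (Kubernetes containers); the layout depends on cgroup version and cgroup driver
stat -fc %T /sys/fs/cgroup/     # "cgroup2fs" = cgroup v2, "tmpfs" = cgroup v1
# cgroup v1 + cgroupfs driver: Guaranteed pods sit directly under kubepods/,
# Burstable and BestEffort pods under kubepods/burstable/ and kubepods/besteffort/
cat /sys/fs/cgroup/cpu/kubepods/pod<uid>/<container-id>/cpuacct.usage   # cumulative CPU time in ns
# cgroup v2: read cpu.stat in the container's cgroup (usage_usec, nr_throttled, throttled_usec)
# Or use kubectl top (requires metrics-server):
kubectl top pod <pod> -n <ns> --containers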
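Raw cgroup counters can be turned into a "CPUs used" figure comparable to kubectl top, which reports millicores (1 CPU = 1000m). This is a minimal sketch that assumes cgroup v2 and a cpu.stat path you have already located. The function names are hypothetical. On cgroup v1 the same arithmetic applies to cpuacct.usage, which counts nanoseconds instead of microseconds.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; assumes cgroup v2, where cpu.stat contains
# "usage_usec <n>" (cumulative CPU time in microseconds).

# Print the cumulative usage_usec counter from a cpu.stat file.
read_usage_usec() {
  awk '$1 == "usage_usec" { print $2 }' "$1"
}

# CPUs used = delta CPU microseconds / (interval seconds * 1,000,000)
cores_used() {
  local before=$1 after=$2 interval=$3
  awk -v b="$before" -v a="$after" -v s="$interval" \
    'BEGIN { printf "%.2f\n", (a - b) / (s * 1000000) }'
}

# Sample a cgroup's cpu.stat twice, <interval> seconds apart.
sample_cgroup_cpu() {
  local stat_file=$1 interval=${2:-5}
  local before after
  before=$(read_usage_usec "$stat_file")
  sleep "$interval"
  after=$(read_usage_usec "$stat_file")
  cores_used "$before" "$after" "$interval"
}
```

`sample_cgroup_cpu <container-cgroup>/cpu.stat 5` prints the average number of CPUs the container used over those 5 seconds. Multiply by 1000 to compare with kubectl top.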

Memory

Linux memory can look "full" while plenty is still usable, because the kernel fills idle RAM with page cache and reclaims it on demand. Focus on available memory (not free), and watch for swap activity and OOM events.

memory.sh

# Overview — focus on the "available" column, not "free"
free -h

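# One-shot summary of memory totals and event counters (not the per-interval columns)
vmstat -s

# Swap activity: swpd (swap used) > 0 only means pages were swapped out at some point;
# sustained non-zero si/so means active memory pressure
vmstat 1 5          # si=swapped in, so=swapped out (KiB/s); buff=buffers, cache=page cache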

# Per-process memory
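ps -eo pid,rss,vsz,comm --sort=-rss | head -15   # rss = resident set size (KiB)
# RSS = pages resident in RAM (shared pages count toward every process that maps them);
# VSZ = total virtual address space, including pages never touched or not resident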

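# cgroup memory usage and limit (Kubernetes pods); usage includes page cache
# cgroup v1 + cgroupfs driver (Burstable/BestEffort pods sit under kubepods/burstable/ or kubepods/besteffort/):
cat /sys/fs/cgroup/memory/kubepods/pod<uid>/<container-id>/memory.usage_in_bytes
cat /sys/fs/cgroup/memory/kubepods/pod<uid>/<container-id>/memory.limit_in_bytes
# cgroup v2: memory.current and memory.max ("max" = no limit) in the container's cgroup
# kubectl top and the kubelet's memory.available signal use working set = usage minus inactive file cache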

# Kubernetes view
kubectl top pods -n <ns>
kubectl describe pod <pod> -n <ns> | grep -A5 "Limits\|Requests"
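The "available, not free" rule and the working-set definition can both be checked with a few lines of shell. This is a minimal sketch that assumes a kernel of 3.14 or later (for MemAvailable) and cgroup v2 file names (memory.current, memory.stat, memory.max). The working set is an approximation of what the kubelet and cAdvisor compute. Function names and the cgroup directory argument are hypothetical placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; file paths are parameters so they also work on fixtures.

# Percent of RAM the kernel considers available (MemAvailable / MemTotal).
mem_available_pct() {
  local meminfo=${1:-/proc/meminfo}
  awk '/^MemTotal:/     { total = $2 }
       /^MemAvailable:/ { avail = $2 }
       END { printf "%.1f\n", avail * 100 / total }' "$meminfo"
}

# Approximate working set of a cgroup v2 directory:
# memory.current minus inactive_file (reclaimable page cache) from memory.stat.
cgroup_working_set() {
  local cg=$1
  local usage inactive limit ws
  usage=$(cat "$cg/memory.current")
  inactive=$(awk '$1 == "inactive_file" { print $2 }' "$cg/memory.stat")
  limit=$(cat "$cg/memory.max")
  ws=$(( usage - ${inactive:-0} ))
  if (( ws < 0 )); then ws=0; fi
  if [[ $limit == max ]]; then
    echo "working_set=${ws} limit=none"
  else
    echo "working_set=${ws} limit=${limit} pct=$(( ws * 100 / limit ))"
  fi
}
```

A container whose working set approaches 100% of its limit is the one at risk of an OOM kill. Page cache alone (usage high, working set low) usually is not.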

OOM Investigation

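When a container or process is killed by the OOM killer, the kernel logs the event to its ring buffer (readable with dmesg or journalctl -k). The log shows which process was killed and its RSS at the time. Kills triggered by a container's memory limit are logged as "Memory cgroup out of memory".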

oom.sh

# Node-level OOM events
dmesg -T | grep -i "oom\|killed"
journalctl -k | grep -i "oom\|killed" | tail -30

# Kubernetes OOM: look for reason OOMKilled (the container exits with code 137 = 128 + SIGKILL)
kubectl get pod <pod> -n <ns> -o jsonpath='{.status.containerStatuses[*].lastState}'
kubectl describe pod <pod> -n <ns> | grep -A5 "Last State\|OOMKilled\|Exit Code"

# Check current OOM score for a process (higher = more likely to be killed)
cat /proc/<pid>/oom_score
cat /proc/<pid>/oom_score_adj

# Kubernetes sets oom_score_adj based on QoS class:
# BestEffort → 1000 (killed first), Burstable → scaled by memory request (2 to 999), Guaranteed → -997
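The Burstable value is proportional to the container's memory request. This is a sketch of the formula given in the Kubernetes "node out-of-memory behavior" documentation, `min(max(2, 1000 - (1000 * memoryRequestBytes) / machineMemoryCapacityBytes), 999)`. Exact clamping may differ between kubelet versions. `expected_oom_score_adj` is a hypothetical helper for comparing against `/proc/<pid>/oom_score_adj`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: expected oom_score_adj per the Kubernetes docs' QoS table.
# Args: QoS class, container memory request (bytes), node memory capacity (bytes).
expected_oom_score_adj() {
  local qos=$1 request=${2:-0} capacity=${3:-1}
  local adj
  case $qos in
    Guaranteed) echo -997 ;;
    BestEffort) echo 1000 ;;
    Burstable)
      # Integer arithmetic, clamped to [2, 999] as in the documented formula.
      adj=$(( 1000 - (1000 * request) / capacity ))
      if (( adj < 2 )); then adj=2; fi
      if (( adj > 999 )); then adj=999; fi
      echo "$adj"
      ;;
    *)
      echo "unknown QoS class: $qos" >&2
      return 1
      ;;
  esac
}
```

For example, a Burstable container requesting 4 GiB on a 16 GiB node should get 750. The larger the request relative to node memory, the lower the score and the later it is killed.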

Disk I/O

High I/O wait (wa in vmstat) means CPUs were idle while at least one task was waiting on disk I/O; use iostat to identify which device is the bottleneck.

disk-io.sh

# Per-device I/O stats (requires sysstat)
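iostat -xz 1 5     # -x extended stats, -z hide idle devices
# r_await/w_await (await in older sysstat) = avg ms per read/write request; %util = % of time the device was busy
# %util near 100% means saturation on a disk that serves one request at a time; SSD, NVMe, and EBS
# volumes process requests in parallel, so also check await and the queue size (aqu-sz)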

# Which processes are doing the I/O
iotop -ao           # -a accumulated totals, -o only processes actually doing I/O (usually needs root)

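# Block device scheduler, queue depth, and raw counters (replace sda with your device, e.g. nvme0n1)
cat /sys/block/sda/queue/scheduler      # active scheduler is shown in [brackets]
cat /sys/block/sda/queue/nr_requests    # max requests the block layer will queue
cat /sys/block/sda/stat                 # raw counters that iostat reads; field 9 = I/Os in flight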

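# Kubernetes: ephemeral storage capacity, allocatable, and requests on the node
kubectl describe node <node> | grep -i "ephemeral-storage"
# Per-pod ephemeral storage usage from the kubelet Summary API (pods[].ephemeral-storage.usedBytes)
kubectl get --raw "/api/v1/nodes/<node>/proxy/stats/summary"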

# Find which processes hold files open under a path
lsof +D /var/lib/docker/         # +D recurses (slow on large trees); containerd nodes use /var/lib/containerd/
inotifywait -m -r /path/to/dir   # stream filesystem events in real time (inotify-tools; does not show the PID)
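iostat derives %util and w_await from the counters in /sys/block/<dev>/stat. This is a minimal sketch of that arithmetic, assuming the field layout in the kernel's block-layer stat documentation (1-based: 5 = writes completed, 8 = ms spent writing, 10 = ms with I/O in flight). The function names and device name are hypothetical placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: %util and write latency from two samples of /sys/block/<dev>/stat.

# Args: first sample line, second sample line, interval in milliseconds.
disk_delta() {
  awk -v ms="$3" 'NR == 1 { split($0, a) } NR == 2 { split($0, b) }
    END {
      writes = b[5] - a[5]     # completed write I/Os
      wticks = b[8] - a[8]     # ms spent on writes
      busy   = b[10] - a[10]   # ms with at least one I/O in flight
      printf "util=%.1f%% w_await=%.2fms writes=%d\n", busy * 100 / ms, (writes > 0 ? wticks / writes : 0), writes
    }' <<< "$1"$'\n'"$2"
}

# Sample a device twice, <seconds> apart (device name is a placeholder, e.g. sda or nvme0n1).
sample_disk() {
  local dev=$1 secs=${2:-1}
  local first second
  first=$(cat "/sys/block/$dev/stat")
  sleep "$secs"
  second=$(cat "/sys/block/$dev/stat")
  disk_delta "$first" "$second" $(( secs * 1000 ))
}
```

`sample_disk nvme0n1 5` gives a rough 5-second view on hosts where sysstat is not installed. iostat remains the better tool when it is available.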

File Descriptors

A process hitting its FD limit will fail to open new files or connections; this manifests as "too many open files" errors even when memory/CPU look fine.

fds.sh

# System-wide FD usage
cat /proc/sys/fs/file-nr        # allocated handles / allocated but unused (always 0 on 2.6+) / system max (fs.file-max)

# Per-process FD count
ls /proc/<pid>/fd | wc -l                     # current open FDs (ls -l would add a "total" line)
grep "open files" /proc/<pid>/limits          # soft and hard limit

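# List open files for a process (also lists cwd, txt, and memory-mapped entries, so it overcounts FDs)
lsof -p <pid> | wc -l
lsof -p <pid> 2>/dev/null | grep -c "can't identify protocol"   # unidentified sockets; a growing count often means a socket leak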

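# Raise the soft limit for the current shell and processes it starts (up to the hard limit unless root)
ulimit -n 65536
# Change a running process's limits in place (util-linux; raising the hard limit needs root)
prlimit --pid <pid> --nofile=65536:65536
# Persistent: /etc/security/limits.conf (PAM login sessions) or LimitNOFILE= in the systemd unit (services)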

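# Kubernetes: the pod spec has no ulimit field; containers inherit RLIMIT_NOFILE from the
# container runtime (e.g. LimitNOFILE= in the containerd systemd unit, default_ulimits in CRI-O).
# An entrypoint script can still raise its soft limit with ulimit -n, up to the inherited hard limit.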
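A process gets "too many open files" when its open FD count reaches its soft RLIMIT_NOFILE. This is a minimal sketch of that comparison, assuming the standard /proc/<pid>/limits layout. `fd_usage` is hypothetical, and its `proc_root` parameter exists only so it can be pointed at a fixture. Reading another user's /proc/<pid>/fd needs root.

```bash
#!/usr/bin/env bash
# Hypothetical helper: open FDs vs soft "Max open files" limit for one PID.
# Args: pid, proc root (default /proc).
fd_usage() {
  local pid=$1 proc_root=${2:-/proc}
  local open soft
  open=$(ls -A "$proc_root/$pid/fd" | wc -l)
  open=$(( open ))   # strip any whitespace padding from wc
  # limits row: "Max open files   <soft>   <hard>   files"
  soft=$(awk '/^Max open files/ { print $4 }' "$proc_root/$pid/limits")
  if [[ $soft == unlimited ]]; then
    echo "pid $pid: $open open FDs (no soft limit)"
  else
    echo "pid $pid: $open/$soft open FDs ($(( open * 100 / soft ))%)"
  fi
}
```

Running it a few times a minute apart separates a leak (count keeps climbing) from a limit that is simply too low (count is high but stable).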

Performance Troubleshooting Map

Symptom	First check	Likely cause
High load average	`vmstat 1`: check r and b columns	CPU saturation (r > nproc) or I/O wait (b persistently > 0, wa% > 10)
Container OOMKilled	`kubectl describe pod` + `dmesg | grep -i oom`	Memory limit too low, memory leak, or no limit set (node-level OOM)
Slow disk writes	`iostat -xz 1`: %util and w_await columns	EBS volume throttled, filesystem full, noisy neighbour
"Too many open files"	`cat /proc/<pid>/limits`	ulimit too low, FD leak in app
Swap in use	`vmstat 1` — si/so columns	RAM exhausted; increase limits or add memory
CPU throttled (K8s)	`kubectl top pod --containers` (usage near the limit), then `cpu.stat` in the container cgroup	CPU limit too low; check nr_throttled and throttled_time (v1) or throttled_usec (v2)
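Throttling is easier to judge as a ratio than as a raw counter. This is a minimal sketch that assumes a cpu.stat file with `nr_periods` and `nr_throttled` lines (present on cgroup v1 and v2 when a CPU limit is set). `throttle_ratio` is a hypothetical helper. The counters are cumulative since the cgroup was created, so compare two runs to see the current rate.

```bash
#!/usr/bin/env bash
# Hypothetical helper: share of CFS enforcement periods in which the cgroup was throttled.
# Arg: path to the container cgroup's cpu.stat.
throttle_ratio() {
  awk '$1 == "nr_periods"   { periods = $2 }
       $1 == "nr_throttled" { throttled = $2 }
       END {
         if (periods == 0) { print "no CFS quota periods recorded (is a CPU limit set?)"; exit }
         printf "throttled in %d of %d periods (%.1f%%)\n", throttled, periods, throttled * 100 / periods
       }' "$1"
}
```

A container can show modest average usage in kubectl top and still be throttled in a large share of periods if its load is bursty. That pattern points at the CPU limit rather than at overall node capacity.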