Linux Performance
Use this page when a node or container is slow, OOM-killed, or hitting resource limits. Work top-down: load average → CPU saturation → memory pressure → disk I/O. Read-only tools first (top, free, vmstat), then deeper investigation.
Load Average
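Load average is the exponentially damped average number of runnable plus uninterruptible (usually I/O-blocked) tasks over the last 1, 5, and 15 minutes. A sustained value above the logical CPU count suggests saturation, although on Linux the cause can be blocked I/O rather than CPU.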
uptime # 1/5/15 minute load averages
# Example: load average: 3.50, 2.10, 1.80 on a 4-core node = about 0.88 per core, trending up (1-min is above 15-min)
nproc # number of logical CPUs (includes hyperthreads)
lscpu | grep -E "^CPU\(s\):|Thread|Core|Socket"
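To compare load against core count without doing the division by hand, a small helper can normalise it. This is a minimal sketch; the function name load_per_core is an illustrative choice and not part of any tool.

```shell
# load_per_core LOAD NCPU
# Print the load average divided by the logical CPU count (two decimals).
# Values near or above 1.00 suggest the node is saturated.
load_per_core() {
    awk -v load="$1" -v ncpu="$2" 'BEGIN {
        if (ncpu <= 0) exit 1          # refuse to divide by zero
        printf "%.2f\n", load / ncpu
    }'
}
```

On a Linux node it can be fed from live data with `load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"`.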
# Load broken down by CPU, runnable queue, I/O wait
vmstat 1 5 # 5 samples every 1 second
# columns: r=run queue, b=blocked on I/O, id=idle%, wa=iowait%, sy=kernel time
CPU Usage
Use top to identify which processes are consuming CPU; press 1 to expand per-core view, P to sort by CPU, M by memory.
# Interactive top
top -c # show full command line
top -H -p <pid> # show threads for a specific process
# Non-interactive snapshot (useful in scripts)
top -b -n 1 | head -30
# Per-CPU statistics
mpstat -P ALL 1 3 # all CPUs, 3 samples (requires sysstat)
# CPU time breakdown for a process
pidstat -u -p <pid> 1 5 # %user, %system, %CPU for PID
# Find top CPU consumers without top
ps -eo pid,ppid,cmd,%cpu --sort=-%cpu | head -15
# cgroups CPU usage (Kubernetes containers)
# Each pod is under /sys/fs/cgroup/cpu/kubepods/... (cgroup v1 layout; cpuacct.usage is cumulative CPU time in nanoseconds)
cat /sys/fs/cgroup/cpu/kubepods/pod<uid>/<container-id>/cpuacct.usage
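Because cpuacct.usage is a cumulative nanosecond counter, a single read says little; the rate between two reads is what matters. The sketch below turns two samples into "cores used". The function name cpu_cores_used is hypothetical, and the cgroup v1 file layout shown above is assumed.

```shell
# cpu_cores_used START_NS END_NS INTERVAL_S
# Convert two cumulative CPU-time samples (nanoseconds) taken INTERVAL_S
# seconds apart into the average number of cores used in that window.
cpu_cores_used() {
    delta_ns=$(( $2 - $1 ))            # integer math keeps full 64-bit precision
    awk -v d="$delta_ns" -v s="$3" 'BEGIN { printf "%.2f\n", d / (s * 1e9) }'
}
```

Typical use: read cpuacct.usage into a variable, `sleep 5`, read it again, then call `cpu_cores_used "$first" "$second" 5`. A result close to the container's CPU limit hints at throttling.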
# Or use kubectl top:
kubectl top pod <pod> -n <ns> --containers
Memory
Linux memory can look "full" but still have available capacity due to page cache; focus on available memory (not free) and watch for swap usage and OOM events.
# Overview — focus on the "available" column, not "free"
free -h
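The "available" figure comes from the MemAvailable field in /proc/meminfo. For scripted checks, a sketch like the following expresses it as a share of total memory. The function name mem_available_pct is an assumption made for illustration.

```shell
# mem_available_pct
# Read /proc/meminfo-formatted text on stdin and print MemAvailable
# as a percentage of MemTotal (one decimal).
mem_available_pct() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END {
             if (total <= 0) exit 1    # input did not look like meminfo
             printf "%.1f\n", avail * 100 / total
         }'
}
```

Run it as `mem_available_pct < /proc/meminfo`; reading from stdin keeps the function easy to try against saved output from another node.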
# Memory and swap counters as a one-shot list
vmstat -s
# Swap usage: non-zero swpd with active si/so columns means memory pressure
vmstat 1 5 # swpd=swap used, buff=buffers, cache=page cache, si=swapped in per second, so=swapped out per second
# Per-process memory
ps -eo pid,rss,vsz,comm --sort=-rss | head -15 # rss = resident set size (KB)
# RSS is physical memory; VSZ includes mapped but not allocated pages
# cgroup memory limits (Kubernetes pods)
cat /sys/fs/cgroup/memory/kubepods/pod<uid>/<container-id>/memory.usage_in_bytes
cat /sys/fs/cgroup/memory/kubepods/pod<uid>/<container-id>/memory.limit_in_bytes
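The two files above are most useful as a ratio. A minimal sketch follows; the function name mem_limit_pct is hypothetical, and it assumes that a very large limit value (cgroup v1 reports a number near 2^63 when no limit is set) means "unlimited".

```shell
# mem_limit_pct USAGE_BYTES LIMIT_BYTES
# Print usage as a percentage of the cgroup memory limit, or "unlimited"
# when the limit is absent or implausibly large.
mem_limit_pct() {
    awk -v used="$1" -v limit="$2" 'BEGIN {
        if (limit <= 0 || limit >= 2^62) { print "unlimited"; exit }
        printf "%.1f\n", used * 100 / limit
    }'
}
```

A container that sits above roughly 90% for long periods is a likely OOM-kill candidate, so this value is worth checking before looking for OOM events.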
# Kubernetes view
kubectl top pods -n <ns>
kubectl describe pod <pod> -n <ns> | grep -A5 "Limits\|Requests"
OOM Investigation
When a container or process is killed by the OOM killer, the kernel logs the event in dmesg and journalctl, showing which process was killed and its RSS at the time.
# Node-level OOM events
dmesg -T | grep -i "oom\|killed"
journalctl -k | grep -i "oom\|killed" | tail -30
# Kubernetes OOM: look for OOMKilled reason
kubectl get pod <pod> -n <ns> -o jsonpath='{.status.containerStatuses[*].lastState}'
kubectl describe pod <pod> -n <ns> | grep -A5 "Last State\|OOMKilled\|Exit Code"
# Check current OOM score for a process (higher = more likely to be killed)
cat /proc/<pid>/oom_score
cat /proc/<pid>/oom_score_adj
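For Kubernetes containers the oom_score_adj value depends on the pod's QoS class, and for Burstable pods it scales with the memory request. The sketch below follows the formula given in the Kubernetes documentation, min(max(2, 1000 - 1000 * request / capacity), 999); the kubelet's actual implementation may differ at the edges, and the function name is made up for illustration.

```shell
# burstable_oom_score_adj REQUEST_BYTES NODE_CAPACITY_BYTES
# Approximate the oom_score_adj a Burstable container receives,
# using integer arithmetic and the documented clamp to [2, 999].
burstable_oom_score_adj() {
    adj=$(( 1000 - (1000 * $1) / $2 ))
    if [ "$adj" -lt 2 ]; then adj=2; fi
    if [ "$adj" -gt 999 ]; then adj=999; fi
    echo "$adj"
}
```

Comparing the result with the value in /proc/<pid>/oom_score_adj is a quick way to confirm which QoS class a container actually landed in.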
# Kubernetes sets oom_score_adj based on QoS class:
# BestEffort → 1000 (killed first), Burstable → in between, lower for larger memory requests, Guaranteed → -997
Disk I/O
High I/O wait (wa in vmstat) means the CPU is idle while waiting for disk; use iostat to identify which device is the bottleneck.
# Per-device I/O stats (requires sysstat)
iostat -xz 1 5 # -x extended, -z hide idle; columns: await=avg wait ms, %util=saturation
# Which processes are doing the I/O
iotop -ao # accumulated I/O; -o only show active processes (may need root)
# Block device queue depth and scheduler
cat /sys/block/sda/queue/scheduler
cat /sys/block/sda/stat # raw block device stats
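When iostat is not installed, the raw stat file can approximate %util. The 10th field (io_ticks) is the cumulative milliseconds the device had I/O in flight, so the delta over a window gives busy time. This is a sketch under that assumption; the function names io_ticks and disk_util_pct are illustrative.

```shell
# io_ticks
# Read one /sys/block/<dev>/stat line on stdin and print field 10
# (milliseconds the device spent with I/O in flight).
io_ticks() {
    awk '{ print $10 }'
}

# disk_util_pct TICKS_START TICKS_END INTERVAL_MS
# Percentage of the interval during which the device was busy.
disk_util_pct() {
    awk -v a="$1" -v b="$2" -v ms="$3" 'BEGIN { printf "%.1f\n", (b - a) * 100 / ms }'
}
```

Usage: `t1=$(io_ticks < /sys/block/sda/stat); sleep 1; t2=$(io_ticks < /sys/block/sda/stat); disk_util_pct "$t1" "$t2" 1000`. Values near 100 mean the device was busy for the whole window, though devices that serve requests in parallel can still have headroom.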
# Kubernetes: ephemeral storage usage
kubectl describe node <node> | grep -i "ephemeral-storage" # capacity, allocatable, and requested
kubectl get --raw "/api/v1/nodes/<node>/proxy/stats/summary" # kubelet Summary API: per-pod "ephemeral-storage" usedBytes
# Find what is writing to a file or directory
lsof +D /var/lib/docker/ # processes with open files under a path
inotifywait -m /path/to/dir # watch filesystem events in real time
File Descriptors
A process hitting its FD limit will fail to open new files or connections; this manifests as "too many open files" errors even when memory/CPU look fine.
# System-wide FD usage
cat /proc/sys/fs/file-nr # allocated handles / allocated but unused / system max
# Per-process FD count
ls /proc/<pid>/fd | wc -l # current open FDs for a process
grep "open files" /proc/<pid>/limits # soft and hard limit
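The count and the limit are easier to act on side by side. A minimal sketch, assuming a Linux /proc layout; both function names are hypothetical.

```shell
# nofile_soft_limit
# Read /proc/<pid>/limits-formatted text on stdin and print the soft
# "Max open files" limit (4th whitespace-separated field of that row).
nofile_soft_limit() {
    awk '/^Max open files/ { print $4 }'
}

# fd_report PID
# Print "open/soft-limit" for a process. Linux only; needs read access
# to /proc/PID/fd, so run as the process owner or root.
fd_report() {
    open=$(( $(ls "/proc/$1/fd" | wc -l) ))
    limit=$(nofile_soft_limit < "/proc/$1/limits")
    echo "$open/$limit"
}
```

`fd_report <pid>` run a few minutes apart shows whether the count is climbing towards the limit, which is the usual signature of an FD leak.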
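# List open files for a process (the count includes the header line and cwd/txt/mem entries)
lsof -p <pid> | wc -l
lsof -p <pid> 2>/dev/null | grep "can't" # entries lsof could not resolve, e.g. "can't identify protocol"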
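# Raise the limit for the current shell and processes started from it
ulimit -n 65536
# Raise the limit of an already-running process (util-linux)
prlimit --pid <pid> --nofile=65536:65536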
# Persistent: edit /etc/security/limits.conf or systemd unit LimitNOFILE=
# Kubernetes: the pod spec has no field for the FD limit; containers inherit it
# from the container runtime (for example its systemd unit's LimitNOFILE=)
Performance Troubleshooting Map
| Symptom | First check | Likely cause |
|---|---|---|
| High load average | vmstat 1 — check r and b columns | CPU saturation (r>nproc) or I/O wait (b>0, wa%>10) |
| Container OOMKilled | kubectl describe pod + dmesg \| grep oom | Memory limit too low, memory leak, no limit set |
| Slow disk writes | iostat -xz 1 — %util and await columns | EBS volume throttled, filesystem full, noisy neighbour |
| "Too many open files" | cat /proc/<pid>/limits | ulimit too low, FD leak in app |
| Swap in use | vmstat 1 — si/so columns | RAM exhausted; increase limits or add memory |
| CPU throttled (K8s) | kubectl top pod --containers | CPU limit too low; check nr_throttled and throttled_time in the cgroup's cpu.stat |
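For the CPU-throttled row, the counters in the container cgroup's cpu.stat file can be reduced to a single ratio. The sketch below assumes the file contains nr_periods and nr_throttled lines, as it does for the CFS bandwidth controller; the function name throttled_pct is illustrative.

```shell
# throttled_pct
# Read cpu.stat text on stdin and print the percentage of CFS
# enforcement periods in which the cgroup was throttled.
throttled_pct() {
    awk '$1 == "nr_periods"   { periods = $2 }
         $1 == "nr_throttled" { throttled = $2 }
         END {
             if (periods > 0) printf "%.1f\n", throttled * 100 / periods
             else print "0.0"          # no periods recorded: no CPU limit in effect
         }'
}
```

Run it as `throttled_pct < <path-to-container-cgroup>/cpu.stat`. The counters are cumulative since the cgroup was created, so compare two readings when checking whether throttling is happening right now.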