DaemonSets & Jobs
DaemonSets run one Pod per matching node, usually for node agents such as log collectors, CNI plugins, storage plugins, and monitoring agents. Jobs run work to completion. CronJobs create Jobs on a schedule.
When To Use Each
| Workload | Purpose | Examples | Key Debug Signal |
|---|---|---|---|
| DaemonSet | Run a Pod on every matching node. | Fluent Bit, node-exporter, CNI, CSI node plugin. | Desired vs. current vs. ready Pod counts, and which nodes lack a Pod. |
| Job | Run finite work until successful completion. | Migration, batch import, one-time maintenance. | Completions, failed Pods, backoffLimit. |
| CronJob | Create Jobs on a schedule. | Backups, reports, periodic cleanup. | lastScheduleTime, missed schedules, Job history. |
DaemonSets
DaemonSets are ideal for node-local agents. When a new node joins the cluster, the DaemonSet controller creates a Pod on it, provided the node satisfies the Pod template's nodeSelector and node affinity and the Pod tolerates the node's taints.
A DaemonSet creates one Pod on each matching node.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-log-agent
  namespace: observability
spec:
  selector:
    matchLabels:
      app: node-log-agent
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1 # Update one node agent at a time to preserve coverage.
  template:
    metadata:
      labels:
        app: node-log-agent
    spec:
      serviceAccountName: node-log-agent
      tolerations:
        - operator: Exists # Allows agent Pods on tainted nodes, including control-plane nodes if policy allows.
      nodeSelector:
        kubernetes.io/os: linux # Avoid scheduling Linux agent on Windows nodes.
      containers:
        - name: agent
          image: fluent/fluent-bit:3.1
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
          volumeMounts:
            - name: varlog
              mountPath: /var/log
              readOnly: true # Log agents usually read host logs.
      volumes:
        - name: varlog
          hostPath:
            path: /var/log # Host path on every node.
            type: Directory
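When a nodeSelector is too coarse, node affinity can narrow the set of matching nodes, and a targeted toleration can replace a blanket one. The fragment below is a minimal sketch of what could go under spec.template.spec of a DaemonSet. It assumes nodes carry a hypothetical example.com/node-pool label; that key and its values are illustrative and not part of the manifest above.

```yaml
# Sketch only: belongs under spec.template.spec of a DaemonSet.
# The label key example.com/node-pool and the value "gpu" are hypothetical.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/os
              operator: In
              values: ["linux"]
            - key: example.com/node-pool
              operator: NotIn
              values: ["gpu"]
tolerations:
  # Narrower than a bare "operator: Exists": tolerate only the control-plane taint.
  - key: node-role.kubernetes.io/control-plane
    operator: Exists
    effect: NoSchedule
```

With a narrow toleration like this, nodes carrying any other NoSchedule taint will not receive the agent, which is worth remembering when the desired count looks lower than expected.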
DaemonSet Operations
# Inspect desired/current/ready counts.
kubectl get daemonset -n <namespace> -o wide
kubectl describe daemonset <daemonset> -n <namespace>
# Show where DaemonSet Pods are running.
kubectl get pods -n <namespace> -l app=<app-label> -o wide
# Watch a DaemonSet rollout.
kubectl rollout status daemonset/<daemonset> -n <namespace>
# Restart all DaemonSet Pods through a template annotation update.
kubectl rollout restart daemonset/<daemonset> -n <namespace>
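For reference, the restart works by stamping the Pod template with a timestamp annotation; because the template changes, the controller performs a normal rolling update. A sketch of the resulting fragment, with an illustrative timestamp:

```yaml
# Fragment of the DaemonSet after `kubectl rollout restart`.
# The timestamp value is illustrative; kubectl writes the time of the restart.
spec:
  template:
    metadata:
      annotations:
        kubectl.kubernetes.io/restartedAt: "2026-05-01T02:00:00Z"
```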
Jobs
A Job creates Pods and tracks successful completions. Use Jobs for tasks that should finish, not for long-running services. When a Pod fails, the Job controller creates a replacement Pod, and it keeps retrying until backoffLimit is reached, at which point the Job itself is marked failed.
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration-202605
  namespace: app
spec:
  completions: 1 # Number of successful Pods required.
  parallelism: 1 # Number of Pods allowed to run at the same time.
  backoffLimit: 2 # Retry failed Pods twice before marking Job failed.
  activeDeadlineSeconds: 1800 # Hard timeout for the whole Job.
  ttlSecondsAfterFinished: 86400 # Delete the finished Job (succeeded or failed) and its Pods after 1 day.
  template:
    metadata:
      labels:
        job-type: migration
    spec:
      restartPolicy: Never # Jobs accept only Never or OnFailure.
      containers:
        - name: migrate
          image: registry.example.com/platform/app-migrations:2026.05.0
          command: ["./migrate"]
          args: ["--safe"] # App-specific migration flag.
          envFrom:
            - secretRef:
                name: app-database-credentials
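The completions and parallelism fields matter most for batch work. The sketch below assumes a hypothetical batch import split into ten work items with at most three running at once; the Job name, image, and command are placeholders rather than part of the example above.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-import-example # Hypothetical name.
  namespace: app
spec:
  completions: 10 # Job succeeds after ten Pods exit successfully.
  parallelism: 3 # At most three Pods run at the same time.
  completionMode: Indexed # Each Pod receives its index in JOB_COMPLETION_INDEX.
  backoffLimit: 4
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: import
          image: registry.example.com/platform/batch-import:1.0.0 # Placeholder image.
          command: ["./import"] # Placeholder; would read JOB_COMPLETION_INDEX to pick its work item.
```

Indexed mode is one option among several; a work queue with plain completions is equally valid when items are not known in advance.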
Job Operations
kubectl get jobs -n <namespace>
kubectl describe job <job-name> -n <namespace>
# Find Pods created by a Job.
kubectl get pods -n <namespace> -l job-name=<job-name> -o wide
# Read logs from a Job's Pod (kubectl picks a single Pod when the Job has several).
kubectl logs -n <namespace> job/<job-name> --all-containers=true
# Delete and recreate a Job when rerun is safe.
kubectl delete job <job-name> -n <namespace>
kubectl apply -f job.yaml
CronJobs
A CronJob creates Jobs based on a cron schedule. Pay close attention to concurrency policy, missed schedules, timezone expectations, and history cleanup.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-backup
  namespace: app
spec:
  schedule: "0 2 * * *" # Runs daily at 02:00 in the kube-controller-manager's local time zone unless spec.timeZone is set.
  concurrencyPolicy: Forbid # Do not start a new backup if the previous one is still running.
  startingDeadlineSeconds: 900 # A missed run may still start up to 15 minutes late; after that it is skipped.
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      backoffLimit: 1
      activeDeadlineSeconds: 3600
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: backup
              image: registry.example.com/platform/backup:1.0.0
              command: ["/bin/sh", "-c"]
              args:
                - ./backup.sh # Script should be idempotent and safe to retry.
              envFrom:
                - secretRef:
                    name: backup-credentials
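Time zone expectations can be stated explicitly instead of being left to the controller's configuration. A minimal sketch, assuming a cluster version where the CronJob spec.timeZone field is available (stable since Kubernetes 1.27) and assuming UTC is the intended zone:

```yaml
# Fragment of a CronJob spec; the choice of Etc/UTC is an assumption for illustration.
spec:
  schedule: "0 2 * * *"
  timeZone: "Etc/UTC" # IANA time zone name; the schedule is interpreted in this zone.
```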
CronJob Operations
kubectl get cronjobs -n <namespace>
kubectl describe cronjob <cronjob-name> -n <namespace>
# Create a one-off Job from a CronJob template for manual testing.
kubectl create job <manual-job-name> -n <namespace> --from=cronjob/<cronjob-name>
# Suspend and resume a CronJob.
kubectl patch cronjob <cronjob-name> -n <namespace> -p '{"spec":{"suspend":true}}'
kubectl patch cronjob <cronjob-name> -n <namespace> -p '{"spec":{"suspend":false}}'
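If the CronJob is managed from a manifest in version control, a hand-applied patch may be reverted by the next apply. Under that assumption, the same pause can be expressed declaratively:

```yaml
# Fragment of a CronJob spec.
spec:
  suspend: true # No new Jobs are created while true; Jobs already running are not affected.
```

Setting the field back to false (or removing it) resumes scheduling.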
Troubleshooting
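- DaemonSet missing nodes: check node selectors, taints/tolerations, affinity, and OS labels. Cordoning alone does not block DaemonSet Pods, because the controller adds a toleration for the unschedulable taint automatically.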
- DaemonSet unavailable: inspect Pod events for image pull errors, hostPath mount failures, security policy rejections of privileged or hostPath Pods, CNI errors, or resource pressure.
- Job keeps retrying: read failed Pod logs, check exit code, validate command args, and review backoffLimit.
- CronJob did not run: check schedule, suspend flag, controller health, missed starting deadline, and concurrency policy.
- Duplicate work risk: Jobs and CronJobs should be idempotent because retries and manual reruns happen during incidents.