CoreDNS — K8s SRE Reference

TL;DR

CoreDNS is the in-cluster DNS server. Pods send DNS queries to the kube-dns Service IP, which load-balances them across the CoreDNS Pods. CoreDNS answers Kubernetes names such as web-api.app.svc.cluster.local from data it watches in the Kubernetes API, and it forwards external names to upstream DNS. DNS incidents usually involve CoreDNS health, the kube-dns Service, search domain confusion, NetworkPolicy blocking UDP/TCP 53, upstream forwarding, or a target Service or EndpointSlice that does not exist.

Mental Model

Kubernetes DNS is service discovery. Instead of hardcoding Pod IPs, apps call stable DNS names for Services. CoreDNS watches Kubernetes objects and answers records for Services (including the backend Pods of headless Services) and, depending on the Corefile, for Pods. For anything outside the cluster, CoreDNS usually forwards the query to the upstream DNS configured in the Corefile, typically the node's resolvers or a corporate DNS server.

When debugging, separate two questions: can the Pod reach DNS at all, and is the name it asks for actually valid?

Figure: Pod DNS query path through the kube-dns Service to CoreDNS.

Service DNS Names

Inside a Pod, Kubernetes configures search domains so short names can work. That convenience is useful, but it also causes confusion when two namespaces have Services with the same name.

Name Used By App	Meaning	When To Use
`web-api`	Searches current namespace first.	Only when caller and Service are in the same namespace.
`web-api.app`	Service `web-api` in namespace `app`.	Good default for cross-namespace app calls.
`web-api.app.svc`	Service in namespace `app` under the cluster service zone.	Useful when search behavior is unclear.
`web-api.app.svc.cluster.local`	Fully qualified Service DNS name.	Best for debugging and unambiguous config.
`mysql-0.mysql.data.svc.cluster.local`	StatefulSet Pod DNS through a headless Service.	Stateful workloads needing stable Pod identity.
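To see why `web-api` and `web-api.app` both resolve from inside namespace `app`, it helps to list the names the resolver actually queries. The function below, `resolver_candidates` (a hypothetical name), is a minimal sketch assuming glibc-style search rules, the usual ClusterFirst search list for the caller's namespace, the cluster domain `cluster.local`, and `ndots:5`. musl-based images and extra node search domains can change the order.

```shell
# Hypothetical helper: print, in order, the names a glibc-style stub resolver
# would query for NAME from a Pod in NAMESPACE. Assumes the typical ClusterFirst
# search list, cluster domain cluster.local, and ndots:5 (node domains ignored).
resolver_candidates() {
  name=$1
  ns=$2
  ndots=${3:-5}
  search="$ns.svc.cluster.local svc.cluster.local cluster.local"

  # A trailing dot means the name is already fully qualified: no search list.
  case $name in
    *.) printf '%s\n' "$name"; return 0 ;;
  esac

  # Count the dots in the name.
  dots=$(printf '%s' "$name" | tr -cd '.' | wc -c | tr -d ' ')

  # At least ndots dots: the name is tried as-is first.
  if [ "$dots" -ge "$ndots" ]; then
    printf '%s.\n' "$name"
  fi
  # Then each search domain is appended in order.
  for domain in $search; do
    printf '%s.%s.\n' "$name" "$domain"
  done
  # Fewer than ndots dots: the as-is query comes only after the search list.
  if [ "$dots" -lt "$ndots" ]; then
    printf '%s.\n' "$name"
  fi
}
```

For example, `resolver_candidates web-api.app app` prints `web-api.app.app.svc.cluster.local.` first and `web-api.app.svc.cluster.local.` second; the second one is the Service. The same rule means an external name such as `example.com` is tried against all three cluster suffixes before it is sent as-is.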

Pod resolv.conf

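Most Pods use dnsPolicy: ClusterFirst. Their /etc/resolv.conf points to the cluster DNS Service IP, sets options ndots:5, and includes search domains like app.svc.cluster.local (for a Pod in namespace app), svc.cluster.local, and cluster.local. With ndots:5, any name with fewer than five dots is tried against the search domains before it is queried as-is.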

inspect-pod-dns.sh

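# Start a throwaway debug Pod in the app namespace.
kubectl run dns-shell -n app --rm -it --image=nicolaka/netshoot -- /bin/bash

# Then, inside the Pod. dig ignores the resolv.conf search list unless +search is given.
cat /etc/resolv.conf
dig +search web-api
dig +search web-api.app
dig web-api.app.svc.cluster.local
dig kubernetes.default.svc.cluster.local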
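A quick way to confirm a Pod really got ClusterFirst settings is to check its resolv.conf for the namespace search domain and, optionally, the kube-dns Service IP. `check_cluster_first_resolv` is a hypothetical helper written for a POSIX shell; it assumes the cluster domain `cluster.local` and can be pasted into the debug shell and run against `/etc/resolv.conf`.

```shell
# Hypothetical pre-check: does this resolv.conf look like a ClusterFirst Pod's?
# Usage: check_cluster_first_resolv FILE NAMESPACE [DNS_SERVICE_IP]
# Prints what it finds; returns 1 if the namespace search domain is missing or,
# when DNS_SERVICE_IP is given, the first nameserver does not match it.
check_cluster_first_resolv() {
  file=$1
  ns=$2
  want_ip=${3:-}

  nameserver=$(awk '$1 == "nameserver" { print $2; exit }' "$file")
  search=$(awk '$1 == "search" { $1 = ""; sub(/^ /, ""); print; exit }' "$file")
  ndots=$(awk '$1 == "options" { for (i = 2; i <= NF; i++) if ($i ~ /^ndots:/) { sub(/^ndots:/, "", $i); print $i } }' "$file")

  # The resolver default is ndots:1 when no option is set.
  printf 'nameserver=%s\nsearch=%s\nndots=%s\n' "$nameserver" "$search" "${ndots:-1}"

  rc=0
  case " $search " in
    *" $ns.svc.cluster.local "*) ;;
    *) echo "missing search domain $ns.svc.cluster.local" >&2; rc=1 ;;
  esac
  if [ -n "$want_ip" ] && [ "$nameserver" != "$want_ip" ]; then
    echo "nameserver $nameserver is not $want_ip" >&2
    rc=1
  fi
  return $rc
}
```

Inside the `dns-shell` Pod, run `check_cluster_first_resolv /etc/resolv.conf app 10.96.0.10`, where `10.96.0.10` only stands in for the ClusterIP that `kubectl get svc kube-dns -n kube-system` reports in your cluster.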

Pod DNS Setting	Meaning	SRE Note
`ClusterFirst`	Default for normal Pods. Use cluster DNS first.	Most app workloads should use this.
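`Default`	Inherit the node's DNS settings instead of using cluster DNS. Despite the name, this is not the default; an unset dnsPolicy means ClusterFirst.	Cluster Service names usually stop resolving.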
`ClusterFirstWithHostNet`	Use cluster DNS for hostNetwork Pods.	Often needed for agents running with host networking.
`None`	Use explicit `dnsConfig`.	Powerful but easy to misconfigure.
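The table's rules, plus the hostNetwork fallback documented by Kubernetes (a hostNetwork Pod left on `ClusterFirst` behaves like `Default`), can be written down as a small lookup. `effective_dns` is a hypothetical helper for reasoning about a Pod spec, not a kubectl feature.

```shell
# Hypothetical helper encoding the dnsPolicy rules from the table above.
# Usage: effective_dns DNS_POLICY HOST_NETWORK(true|false)
effective_dns() {
  policy=${1:-ClusterFirst}   # an unset dnsPolicy means ClusterFirst
  host_network=${2:-false}    # an unset hostNetwork means false

  case $policy in
    ClusterFirst)
      # hostNetwork Pods on ClusterFirst fall back to Default behavior.
      if [ "$host_network" = true ]; then
        echo "node DNS (hostNetwork + ClusterFirst falls back to Default)"
      else
        echo "cluster DNS"
      fi ;;
    ClusterFirstWithHostNet) echo "cluster DNS" ;;
    Default) echo "node DNS" ;;
    None) echo "dnsConfig only" ;;
    *) echo "unknown dnsPolicy: $policy" >&2; return 1 ;;
  esac
}
```

A possible usage is `effective_dns $(kubectl get pod <pod> -n <namespace> -o jsonpath='{.spec.dnsPolicy} {.spec.hostNetwork}')`; an unset `hostNetwork` prints nothing and is treated as `false`.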

ClusterIP And Headless Records

A normal ClusterIP Service resolves to the Service virtual IP. A headless Service (clusterIP: None) resolves directly to the IPs of its Ready backend Pods. That difference matters when debugging databases, StatefulSets, and clients that do their own load balancing.

record-checks.sh

# Run kubectl from your workstation and dig from a debug Pod inside the cluster.

# Normal Service: expect exactly one answer, the ClusterIP.
kubectl get svc web-api -n app -o wide
dig +short web-api.app.svc.cluster.local

# Headless Service: clusterIP is None; expect backend Pod IPs or StatefulSet Pod records.
kubectl get svc mysql -n data -o jsonpath='{.spec.clusterIP}{"\n"}'
dig +short mysql.data.svc.cluster.local
dig +short mysql-0.mysql.data.svc.cluster.local
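The expectations in those comments can be checked mechanically: a ClusterIP Service should return exactly its ClusterIP, and a headless Service should return one or more Pod IPs. `check_svc_answers` is a hypothetical helper that takes the `.spec.clusterIP` value and the `dig +short` output as arguments, since the two commands usually run in different places.

```shell
# Hypothetical checker for ClusterIP vs headless DNS answers.
# Usage: check_svc_answers CLUSTER_IP ANSWERS
#   CLUSTER_IP: value of .spec.clusterIP ("None" for a headless Service)
#   ANSWERS:    newline-separated output of `dig +short <fqdn>`
check_svc_answers() {
  cluster_ip=$1
  answers=$2
  # Count non-empty answer lines (grep -c prints 0 when nothing matches).
  count=$(printf '%s\n' "$answers" | grep -c .)

  if [ "$cluster_ip" = "None" ]; then
    # Headless: one A record per Ready backend Pod, no virtual IP.
    if [ "$count" -ge 1 ]; then
      echo "headless: $count Pod IP(s)"
    else
      echo "headless: no answers (check Pod readiness and selector)" >&2
      return 1
    fi
  else
    # ClusterIP: exactly one answer, equal to the Service virtual IP.
    if [ "$count" -eq 1 ] && [ "$answers" = "$cluster_ip" ]; then
      echo "clusterIP: ok"
    else
      echo "clusterIP: expected only $cluster_ip, got $count answer(s)" >&2
      return 1
    fi
  fi
}
```

In practice, copy the `dig +short` output from the debug Pod and pass it together with the result of `kubectl get svc web-api -n app -o jsonpath='{.spec.clusterIP}'` from your workstation.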

From-Scratch DNS Lab

This lab creates a Service and checks short names, namespace-qualified names, and the full FQDN from a diagnostic Pod.

dns-lab.yaml

apiVersion: v1
kind: Namespace
metadata:
  name: app
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-api
  namespace: app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web-api
  template:
    metadata:
      labels:
        app: web-api
    spec:
      containers:
        - name: web
          image: nginxdemos/hello:plain-text
          ports:
            - name: http
              containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: web-api
  namespace: app
spec:
  selector:
    app: web-api
  ports:
    - name: http
      port: 80
      targetPort: http

run-dns-lab.sh

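kubectl apply -f dns-lab.yaml
kubectl rollout status deploy/web-api -n app

kubectl run dns-shell -n app --rm -it --image=nicolaka/netshoot -- /bin/bash

# Inside the Pod: all three names should return the same ClusterIP.
# dig needs +search to apply the resolv.conf search list to short names.
cat /etc/resolv.conf
dig +short +search web-api
dig +short +search web-api.app
dig +short web-api.app.svc.cluster.local
curl -sS http://web-api.app.svc.cluster.local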

Corefile And Plugins

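The coredns ConfigMap in kube-system stores the Corefile. In managed clusters, follow the provider's recommendations before editing it: some providers reconcile the CoreDNS add-on and can overwrite manual edits, or expect customizations in a separate ConfigMap. A small syntax mistake can break DNS for the whole cluster.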

Corefile

.:53 {
    errors
    health
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
       pods insecure
       fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}

Plugin	Purpose	Operational Note
`kubernetes`	Answers cluster Service and Pod DNS records.	Needs API connectivity.
`forward`	Sends non-cluster queries to upstream DNS.	Common source of external DNS failures.
`cache`	Caches DNS responses.	Improves latency; stale behavior depends on TTLs.
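`loop`	Detects forwarding loops.	On detecting a loop, CoreDNS logs a fatal loop error and exits, so a CrashLoopBackOff often points here.
`reload`	Reloads the Corefile without restarting Pods.	Edits apply only after the ConfigMap volume syncs and the next reload check; watch logs to confirm.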
`prometheus`	Exposes CoreDNS metrics.	Useful for latency, errors, and request volume.
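Because one stray brace can take down cluster DNS, a cheap local check before applying a Corefile edit is worthwhile. `check_corefile_braces` is a hypothetical sketch that only verifies `{`/`}` balance (ignoring `#` comments). It does not validate plugin names or arguments, so watching CoreDNS logs after the change is still required.

```shell
# Hypothetical pre-flight check for a Corefile saved to a local file.
# Reports unbalanced braces only; plugin syntax is NOT validated.
# Usage: check_corefile_braces FILE   (exit status 1 on imbalance)
check_corefile_braces() {
  awk '
    {
      line = $0
      sub(/#.*/, "", line)                    # drop comments
      for (i = 1; i <= length(line); i++) {
        c = substr(line, i, 1)
        if (c == "{") depth++
        if (c == "}") {
          depth--
          if (depth < 0) { printf "line %d: unexpected }\n", NR; bad = 1; depth = 0 }
        }
      }
    }
    END {
      if (depth > 0) { printf "%d unclosed {\n", depth; bad = 1 }
      exit (bad ? 1 : 0)
    }' "$1"
}
```

One way to use it: export the live file with `kubectl get configmap coredns -n kube-system -o jsonpath='{.data.Corefile}' > Corefile.current`, edit a copy, and run `check_corefile_braces` on the copy before applying it through your usual change path.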

Daily Checks

coredns-checks.sh

kubectl get deploy,svc,cm -n kube-system | grep -E 'coredns|kube-dns'
kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide
kubectl get svc kube-dns -n kube-system -o wide
kubectl get endpointslice -n kube-system -l kubernetes.io/service-name=kube-dns -o wide
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=100 --prefix
kubectl describe configmap coredns -n kube-system

# Quick known-good lookup.
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- \
  nslookup kubernetes.default.svc.cluster.local
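When the daily checks cover many clusters, it helps to flag unhealthy CoreDNS Pods automatically. `flag_unhealthy_dns_pods` is a hypothetical awk filter for `kubectl get pods --no-headers` output. It assumes the default columns (NAME, READY, STATUS, RESTARTS, AGE) and treats any restart as worth a look, which may be noisy after planned node maintenance.

```shell
# Hypothetical filter for `kubectl get pods ... --no-headers` output.
# Prints CoreDNS Pods that are not fully Ready, not Running, or have restarted;
# exits 1 if any were printed, 0 otherwise.
flag_unhealthy_dns_pods() {
  awk '
    {
      split($2, r, "/")                       # READY column, e.g. 1/1
      if (r[1] != r[2] || $3 != "Running" || $4 + 0 > 0) {
        print $1, "ready=" $2, "status=" $3, "restarts=" $4
        found = 1
      }
    }
    END { exit (found ? 1 : 0) }'
}
```

For example, `kubectl get pods -n kube-system -l k8s-app=kube-dns --no-headers | flag_unhealthy_dns_pods` prints nothing and exits 0 when every CoreDNS Pod is Running, fully Ready, and has no restarts.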

Debugging Workflow

1. Check whether only one app is affected or every Pod in the cluster is affected.
2. From an affected namespace, inspect /etc/resolv.conf and run dig against the exact FQDN.
3. If the name is a Kubernetes Service, confirm the Service exists and has EndpointSlices.
4. Confirm the kube-dns Service has endpoints and the CoreDNS Pods are Ready.
5. For external names, inspect the Corefile forwarder and, if possible, test the upstream DNS servers from the nodes running CoreDNS.
6. Check NetworkPolicies that might block egress to UDP/TCP 53.

dns-incident.sh

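# Replace the placeholders first, e.g. NS=app NAME=web-api.
NS=<namespace>
NAME=<service-or-domain>

# Pass NS and NAME into the Pod; variables in your local shell are not visible inside it.
kubectl run dns-debug -n "$NS" --rm -it --env="NS=$NS" --env="NAME=$NAME" \
  --image=nicolaka/netshoot -- /bin/bash

# Inside the Pod. +search makes dig apply the resolv.conf search list to short names.
cat /etc/resolv.conf
dig +search "$NAME"
dig "$NAME.$NS.svc.cluster.local"
dig kubernetes.default.svc.cluster.local
dig example.com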

# In another terminal.
kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=200
kubectl get networkpolicy -A
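Comparing the response codes of several lookups answers the two questions from the mental model: can the Pod reach DNS at all, and is the name valid? `dig_status` is a hypothetical helper that reads full `dig` output on stdin and prints the status from its header line, or `NO-REPLY` when dig printed no header (for example after a timeout).

```shell
# Hypothetical helper: extract the response code from full dig output.
# Reads dig output on stdin; prints e.g. NOERROR, NXDOMAIN, SERVFAIL,
# or NO-REPLY when no ";; ->>HEADER<<- ... status: X," line is present.
dig_status() {
  st=$(sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1)
  echo "${st:-NO-REPLY}"
}
```

In the debug shell, `dig +search "$NAME" | dig_status` prints a single word. As a rough rule, `NO-REPLY` for every name points at reachability (NetworkPolicy, kube-dns endpoints), `NXDOMAIN` at the name itself, and `SERVFAIL` on external names at the forwarder or upstream.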

Symptom To Cause

Symptom	Likely Cause	Check First
`NXDOMAIN` for Service	Wrong name, wrong namespace, or the Service does not exist.	`kubectl get svc -A | grep <name>`, then retry with the full FQDN.
Service DNS resolves but traffic fails	DNS is fine; Service endpoints, targetPort, NetworkPolicy, or app is broken.	EndpointSlice and direct Service curl.
Short name fails, FQDN works	Search domain or namespace assumption wrong.	`/etc/resolv.conf`, caller namespace.
External DNS fails, cluster DNS works	Corefile forwarder, upstream DNS, node DNS, corporate DNS issue.	`forward` config and CoreDNS logs.
DNS intermittent or slow	CoreDNS overloaded, CPU throttling, too few replicas, upstream latency.	CoreDNS metrics, requests, throttling, logs.
Only one namespace fails	NetworkPolicy blocking egress to kube-dns or sidecar/proxy DNS behavior.	NetworkPolicies and pod DNS config.
CoreDNS CrashLoopBackOff	Corefile syntax, forwarding loop, plugin issue, bad config rollout.	CoreDNS logs, ConfigMap diff, events.
Headless Service records missing	Pods not Ready, wrong selector, StatefulSet/serviceName mismatch.	Service YAML, EndpointSlice, Pod readiness.
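The table can also be read as a decision procedure over four lookups. `dns_triage` is a hypothetical sketch: each argument is `ok` or `fail` for the short name, the Service FQDN, `kubernetes.default.svc.cluster.local`, and an external name, and it prints the first matching suspicion. It deliberately ignores intermittent failures and headless-record issues, which need metrics and EndpointSlices rather than a single lookup.

```shell
# Hypothetical triage helper mapping four lookup results to the table above.
# Arguments ("ok" or "fail"), in order: short name, Service FQDN,
# kubernetes.default.svc.cluster.local, external name.
dns_triage() {
  short=$1 fqdn=$2 canary=$3 external=$4

  if [ "$canary" = fail ] && [ "$external" = fail ]; then
    echo "DNS unreachable or down: check CoreDNS Pods, kube-dns endpoints, NetworkPolicy on port 53"
  elif [ "$canary" = fail ]; then
    echo "cluster zone failing: check the kubernetes plugin and CoreDNS API connectivity"
  elif [ "$external" = fail ]; then
    echo "external only: check the forward plugin, upstream and node DNS"
  elif [ "$fqdn" = fail ]; then
    echo "Service name: check the name, namespace, and that the Service exists"
  elif [ "$short" = fail ]; then
    echo "search path: check /etc/resolv.conf and the caller namespace"
  else
    echo "DNS looks fine: check EndpointSlices, targetPort, NetworkPolicy, the app"
  fi
}
```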

Safe Change Pattern

1. Export before editing: save the current CoreDNS ConfigMap content in your change record or GitOps repo.
2. Prefer the source of truth: update Helm, Terraform, add-on config, or GitOps manifests rather than editing live config in production.
3. Roll out carefully: watch the CoreDNS rollout, logs, and kubernetes.default.svc.cluster.local lookups immediately after changes.
4. Have a rollback ready: a broken Corefile can break DNS for the whole cluster quickly.
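The export and review steps can be scripted. Both functions are hypothetical sketches: `backup_corefile` needs kubectl access and saves only the `Corefile` key of the `coredns` ConfigMap (any other keys are not saved), and `show_corefile_change` is a thin wrapper around `diff -u` for reviewing an edit before it goes through your source of truth.

```shell
# Hypothetical: save the live Corefile to a timestamped local file.
# Requires kubectl access; saves only the Corefile key of the ConfigMap.
backup_corefile() {
  out=${1:-corefile-$(date +%Y%m%d-%H%M%S).bak}
  kubectl get configmap coredns -n kube-system -o jsonpath='{.data.Corefile}' > "$out" &&
    echo "saved $out"
}

# Hypothetical: show a unified diff between two local Corefiles.
# Returns 0 when unchanged and 1 when the files differ (diff's own status).
show_corefile_change() {
  diff -u "$1" "$2"
}
```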