TL;DR

CoreDNS is the in-cluster DNS server. Pods send DNS queries to the kube-dns Service, which forwards to CoreDNS Pods. CoreDNS answers Kubernetes names such as web-api.app.svc.cluster.local from the Kubernetes API and forwards external names to upstream DNS. DNS incidents usually involve CoreDNS health, the kube-dns Service, search domain confusion, NetworkPolicy blocking UDP/TCP 53, upstream forwarding, or the Service/EndpointSlice not existing.

Mental Model

Kubernetes DNS is service discovery. Instead of hardcoding Pod IPs, apps call stable DNS names for Services. CoreDNS watches Kubernetes objects and answers records for Services, namespaces, and sometimes Pods. For anything outside the cluster, CoreDNS usually forwards the query to the node or corporate upstream DNS configured in the Corefile.

When debugging, separate two questions: can the Pod reach DNS at all, and is the name it asks for actually valid?

[Diagram: App Pod (/etc/resolv.conf) -> kube-dns Service (:53) -> CoreDNS (kubernetes plugin) -> Kubernetes API (Services, EndpointSlices) or Upstream DNS (external names). Cluster names are answered from Kubernetes state; external names are forwarded upstream.]

Pod DNS query path through the kube-dns Service to CoreDNS.

Service DNS Names

Inside a Pod, Kubernetes configures search domains so short names can work. That convenience is useful, but it also causes confusion when two namespaces have Services with the same name.

| Name Used By App | Meaning | When To Use |
| --- | --- | --- |
| web-api | Searches current namespace first. | Only when caller and Service are in the same namespace. |
| web-api.app | Service web-api in namespace app. | Good default for cross-namespace app calls. |
| web-api.app.svc | Service in namespace app under the cluster service zone. | Useful when search behavior is unclear. |
| web-api.app.svc.cluster.local | Fully qualified Service DNS name. | Best for debugging and unambiguous config. |
| mysql-0.mysql.data.svc.cluster.local | StatefulSet Pod DNS through a headless Service. | Stateful workloads needing stable Pod identity. |
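
The search behavior behind these names can be sketched in plain shell. The function below is a hypothetical helper (expand_name is not a real tool) that lists, in order, the candidate names a glibc-style resolver would try. It assumes the ndots:5 option and the three search domains Kubernetes normally writes for a ClusterFirst Pod in namespace app; other resolvers, such as musl, differ in detail.

```shell
#!/bin/sh
# expand_name NAME NDOTS SEARCH_DOMAIN...
# Prints, in order, the candidate FQDNs a glibc-style resolver would try.
# Illustrative only: it does no DNS lookups.
expand_name() {
  name=$1
  ndots=$2
  shift 2

  # A trailing dot marks the name as absolute: no search expansion at all.
  case $name in
    *.) printf '%s\n' "$name"; return 0 ;;
  esac

  # Count the dots in the name.
  dots_only=$(printf '%s' "$name" | tr -cd '.')
  dots=${#dots_only}

  # Names with at least ndots dots are tried as-is first.
  if [ "$dots" -ge "$ndots" ]; then
    printf '%s.\n' "$name"
  fi

  # Then each search domain is appended in turn.
  for domain in "$@"; do
    printf '%s.%s.\n' "$name" "$domain"
  done

  # Names with fewer dots are tried as-is only after the search list.
  if [ "$dots" -lt "$ndots" ]; then
    printf '%s.\n' "$name"
  fi
}

# Candidates for web-api.app as seen from a Pod in namespace app.
expand_name web-api.app 5 app.svc.cluster.local svc.cluster.local cluster.local
```

For web-api.app the first candidate (web-api.app.app.svc.cluster.local.) does not exist and the second (web-api.app.svc.cluster.local.) is the real Service name, which is why namespace-qualified names work but cost an extra query. A name ending in a dot skips the search list entirely.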

Pod resolv.conf

Most Pods use dnsPolicy: ClusterFirst. Their /etc/resolv.conf points to the cluster DNS Service IP and includes search domains like app.svc.cluster.local, svc.cluster.local, and cluster.local.

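inspect-pod-dns.sh (bash):
# Start a throwaway diagnostic Pod in the caller's namespace.
kubectl run dns-shell -n app --rm -it --image=nicolaka/netshoot -- /bin/bash

# Run the rest inside the Pod shell.
cat /etc/resolv.conf
# dig ignores the resolv.conf search list unless +search is given.
dig +search web-api
dig +search web-api.app
dig web-api.app.svc.cluster.local
dig kubernetes.default.svc.cluster.local

| Pod DNS Setting | Meaning | SRE Note |
| --- | --- | --- |
| ClusterFirst | Default for normal Pods. Use cluster DNS first. | Most app workloads should use this. |
| Default | Inherit the node's DNS settings instead of using cluster DNS. Despite the name, this is not the default policy. | Can break Service discovery. |
| ClusterFirstWithHostNet | Use cluster DNS for hostNetwork Pods. | Often needed for agents running with host networking. |
| None | Ignore inherited settings and use an explicit dnsConfig. | Powerful but easy to misconfigure. |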
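
As an illustration of the ClusterFirstWithHostNet row, the sketch below prints a minimal Pod manifest for a host-networked agent that still resolves cluster names. The function name, the Pod name node-agent, and the sleep command are hypothetical; only the hostNetwork and dnsPolicy fields matter here.

```shell
#!/bin/sh
# hostnet_pod_manifest: print a minimal hostNetwork Pod that keeps cluster DNS.
# Pod name, namespace, and command are placeholders chosen for illustration.
hostnet_pod_manifest() {
  cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: node-agent
  namespace: app
spec:
  hostNetwork: true
  # Without this line a hostNetwork Pod falls back to the node's DNS settings.
  dnsPolicy: ClusterFirstWithHostNet
  containers:
    - name: agent
      image: nicolaka/netshoot
      command: ["sleep", "3600"]
EOF
}

# Example wiring (needs cluster access, so it is left commented out):
# hostnet_pod_manifest | kubectl apply -f -
# kubectl exec -n app node-agent -- cat /etc/resolv.conf
```

After applying it, the Pod's /etc/resolv.conf should point at the cluster DNS Service IP rather than the node's resolvers.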

ClusterIP And Headless Records

A normal ClusterIP Service resolves to the Service virtual IP. A headless Service resolves directly to backend Pod IPs. That difference matters when debugging databases, StatefulSets, and clients that do their own load balancing.

record-checks.sh (bash):
# Normal Service: expect one ClusterIP answer.
kubectl get svc web-api -n app -o wide
dig +short web-api.app.svc.cluster.local

# Headless Service: expect backend Pod IPs or StatefulSet Pod records.
kubectl get svc mysql -n data -o yaml | grep -i clusterIP
dig +short mysql.data.svc.cluster.local
dig +short mysql-0.mysql.data.svc.cluster.local
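
To make the ClusterIP versus headless distinction scriptable, a small classifier over the Service's spec.clusterIP value may help. The function name service_kind is hypothetical, and the commented wiring reuses the mysql Service in namespace data from the checks above.

```shell
#!/bin/sh
# service_kind CLUSTER_IP
# Classifies a Service from the value of its spec.clusterIP field.
service_kind() {
  case $1 in
    None) echo headless ;;   # DNS returns backend Pod IPs
    "")   echo unknown ;;    # empty value, e.g. ExternalName or a failed lookup
    *)    echo clusterip ;;  # DNS returns the single virtual IP
  esac
}

# Example wiring (needs cluster access, so it is left commented out):
# ip=$(kubectl get svc mysql -n data -o jsonpath='{.spec.clusterIP}')
# service_kind "$ip"
```

If the result is headless, expect dig to return one A record per Ready backend Pod instead of a single address.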

From-Scratch DNS Lab

This lab creates a Service and checks short names, namespace-qualified names, and the full FQDN from a diagnostic Pod.

dns-lab.yaml (yaml):
apiVersion: v1
kind: Namespace
metadata:
  name: app
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-api
  namespace: app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web-api
  template:
    metadata:
      labels:
        app: web-api
    spec:
      containers:
        - name: web
          image: nginxdemos/hello:plain-text
          ports:
            - name: http
              containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: web-api
  namespace: app
spec:
  selector:
    app: web-api
  ports:
    - name: http
      port: 80
      targetPort: http

run-dns-lab.sh (bash):
kubectl apply -f dns-lab.yaml
kubectl rollout status deploy/web-api -n app

kubectl run dns-shell -n app --rm -it --image=nicolaka/netshoot -- /bin/bash

# Run the rest inside the Pod shell. dig needs +search to expand short names.
cat /etc/resolv.conf
dig +short +search web-api
dig +short +search web-api.app
dig +short web-api.app.svc.cluster.local
curl -sS http://web-api.app.svc.cluster.local

Corefile And Plugins

The CoreDNS ConfigMap stores the Corefile. In managed clusters, edit it carefully and follow the provider's recommendations. A small syntax mistake can break DNS for the whole cluster.

Corefile (text):
.:53 {
    errors
    health
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
       pods insecure
       fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}

| Plugin | Purpose | Operational Note |
| --- | --- | --- |
| kubernetes | Answers cluster Service and Pod DNS records. | Needs API connectivity. |
| forward | Sends non-cluster queries to upstream DNS. | Common source of external DNS failures. |
| cache | Caches DNS responses. | Improves latency; stale behavior depends on TTLs. |
| loop | Detects forwarding loops. | CrashLooping CoreDNS can indicate loop problems. |
| reload | Reloads Corefile changes. | Still validate by watching logs after edits. |
| prometheus | Exposes CoreDNS metrics. | Useful for latency, errors, and request volume. |
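
When external names fail, the first thing to pin down is where the forward plugin actually sends queries. The sketch below is a hypothetical helper (forward_targets) that reads a Corefile on stdin and prints the upstream targets of each forward line. It assumes the common one-line form, forward FROM TO..., optionally followed by an opening brace.

```shell
#!/bin/sh
# forward_targets: read a Corefile on stdin, print each forward upstream target.
# Field 1 is the plugin name, field 2 the zone, fields 3+ the upstreams.
forward_targets() {
  awk '$1 == "forward" { for (i = 3; i <= NF; i++) if ($i != "{") print $i }'
}

# The forward line from the Corefile shown above.
printf '    forward . /etc/resolv.conf\n' | forward_targets

# Against a live cluster (commented out; assumes the ConfigMap key is Corefile):
# kubectl get configmap coredns -n kube-system -o jsonpath='{.data.Corefile}' \
#   | forward_targets
```

A target of /etc/resolv.conf means the resolv.conf inside the CoreDNS Pod, which typically comes from the node, so node DNS problems surface as external lookup failures for every Pod.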

Daily Checks

coredns-checks.sh (bash):
kubectl get deploy,svc,cm -n kube-system | grep -E 'coredns|kube-dns'
kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide
kubectl get svc kube-dns -n kube-system -o wide
kubectl get endpointslice -n kube-system -l kubernetes.io/service-name=kube-dns -o wide
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=100
kubectl describe configmap coredns -n kube-system

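# Quick known-good lookup. --restart=Never makes this a one-shot Pod
# that is not restarted after nslookup exits.
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- \
  nslookup kubernetes.default.svc.cluster.local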

Debugging Workflow

  1. Check whether only one app is affected or every Pod in the cluster is affected.
  2. From an affected namespace, inspect /etc/resolv.conf and run dig against the exact FQDN.
  3. Confirm the Service exists and has EndpointSlices if it is a Kubernetes Service name.
  4. Confirm the kube-dns Service has endpoints and CoreDNS Pods are Ready.
  5. For external names, inspect the Corefile forwarder and test upstream DNS from CoreDNS nodes if possible.
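  6. Check NetworkPolicies that might block egress to UDP/TCP 53.

dns-incident.sh (bash):
NS=<namespace>
NAME=<service-or-domain>

# Pass NS and NAME into the Pod; local shell variables are not visible there.
kubectl run dns-debug -n "$NS" --rm -it --image=nicolaka/netshoot \
  --env="NS=$NS" --env="NAME=$NAME" -- /bin/bash

# Run the rest inside the Pod shell.
cat /etc/resolv.conf
dig +search "$NAME"                        # uses the search list, like the app
dig "$NAME.$NS.svc.cluster.local"          # only meaningful for Service names
dig kubernetes.default.svc.cluster.local   # known-good cluster name
dig example.com                            # known-good external name

# In another terminal.
kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=200
kubectl get networkpolicy -A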
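
If step 6 of the workflow turns up a default-deny egress policy, an explicit allow rule for DNS is the usual fix. The sketch below prints one such NetworkPolicy. The function name and the policy name allow-dns-egress are hypothetical, and it assumes CoreDNS Pods carry the k8s-app=kube-dns label and that the kube-system namespace has the automatic kubernetes.io/metadata.name label. Clusters running a node-local DNS cache may need a different rule.

```shell
#!/bin/sh
# allow_dns_egress_manifest NAMESPACE
# Prints a NetworkPolicy that lets every Pod in NAMESPACE reach cluster DNS.
allow_dns_egress_manifest() {
  ns=$1
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: ${ns}
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        # Both selectors in one entry: kube-dns Pods AND in kube-system.
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
EOF
}

# Example wiring (needs cluster access, so it is left commented out):
# allow_dns_egress_manifest app | kubectl apply -f -
```

Both UDP and TCP are listed because large responses and retries fall back to TCP; allowing only UDP tends to produce intermittent failures that are hard to reproduce.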

Symptom To Cause

| Symptom | Likely Cause | Check First |
| --- | --- | --- |
| NXDOMAIN for Service | Wrong name, wrong namespace, or the Service does not exist. | kubectl get svc -A, filtered with grep on the Service name; retry with the full FQDN. |
| Service DNS resolves but traffic fails | DNS is fine; Service endpoints, targetPort, NetworkPolicy, or app is broken. | EndpointSlice and direct Service curl. |
| Short name fails, FQDN works | Search domain or namespace assumption wrong. | /etc/resolv.conf, caller namespace. |
| External DNS fails, cluster DNS works | Corefile forwarder, upstream DNS, node DNS, corporate DNS issue. | forward config and CoreDNS logs. |
| DNS intermittent or slow | CoreDNS overloaded, CPU throttling, too few replicas, upstream latency. | CoreDNS metrics, requests, throttling, logs. |
| Only one namespace fails | NetworkPolicy blocking egress to kube-dns or sidecar/proxy DNS behavior. | NetworkPolicies and pod DNS config. |
| CoreDNS CrashLoopBackOff | Corefile syntax, forwarding loop, plugin issue, bad config rollout. | CoreDNS logs, ConfigMap diff, events. |
| Headless Service records missing | Pods not Ready, wrong selector, StatefulSet/serviceName mismatch. | Service YAML, EndpointSlice, Pod readiness. |
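
For the intermittent or slow row, a single dig rarely catches the problem; repeating the lookup and counting failures gives a rough failure rate. The helper below is a hypothetical sketch (count_failures) that runs any command N times and prints how many runs exited non-zero.

```shell
#!/bin/sh
# count_failures N CMD [ARG...]
# Runs CMD N times and prints how many runs exited with a non-zero status.
count_failures() {
  n=$1
  shift
  failures=0
  i=0
  while [ "$i" -lt "$n" ]; do
    if ! "$@" >/dev/null 2>&1; then
      failures=$((failures + 1))
    fi
    i=$((i + 1))
  done
  printf '%s\n' "$failures"
}

# Inside a diagnostic Pod (commented out because it needs dig and a cluster):
# count_failures 50 dig +time=1 +tries=1 kubernetes.default.svc.cluster.local
```

Note that dig exits 0 for any DNS reply, including NXDOMAIN, so this counts timeouts and unreachable servers rather than wrong names. The +time=1 +tries=1 options keep each attempt to a single one-second try so that dropped packets show up instead of being hidden by retries.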

Safe Change Pattern

  1. Export before editing: save the current CoreDNS ConfigMap content in your change record or GitOps repo.
  2. Prefer source of truth: update Helm, Terraform, add-on config, or GitOps manifests rather than editing live config in production.
  3. Roll carefully: watch CoreDNS rollout, logs, and kubernetes.default.svc.cluster.local lookups immediately after changes.
  4. Have rollback ready: a broken Corefile can affect the whole cluster quickly.
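
The export and verify steps can be sketched as two small shell helpers. Both function names are hypothetical, and the sketch assumes the ConfigMap is named coredns in kube-system with the config stored under the Corefile key, as in the checks earlier; managed add-ons may use different names.

```shell
#!/bin/sh
# save_corefile [FILE]
# Writes the live Corefile to FILE (default: a timestamped name) and prints the path.
save_corefile() {
  out=${1:-Corefile.$(date -u +%Y%m%dT%H%M%SZ).bak}
  kubectl get configmap coredns -n kube-system \
    -o jsonpath='{.data.Corefile}' > "$out" || return 1
  printf '%s\n' "$out"
}

# corefile_changed FILE
# Exit 0 if the live Corefile differs from FILE, 1 if identical, 2 on kubectl error.
corefile_changed() {
  live=$(kubectl get configmap coredns -n kube-system \
    -o jsonpath='{.data.Corefile}') || return 2
  [ "$live" != "$(cat "$1")" ]
}

# Example wiring (needs cluster access, so it is left commented out):
# backup=$(save_corefile)
# ...apply the change through Helm, Terraform, or GitOps...
# corefile_changed "$backup" && echo "Corefile changed; watch the rollout"
# kubectl rollout status deploy/coredns -n kube-system   # Deployment name assumed
```

Keeping only the Corefile text, rather than the full ConfigMap YAML, makes the backup easy to diff and to paste back into the source of truth during a rollback.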