CoreDNS
CoreDNS is the in-cluster DNS server. Pods send DNS queries to the kube-dns Service, which routes them to the CoreDNS Pods. CoreDNS answers Kubernetes names such as web-api.app.svc.cluster.local from data it watches in the Kubernetes API and forwards external names to upstream DNS. DNS incidents usually involve CoreDNS health, the kube-dns Service, search domain confusion, a NetworkPolicy blocking UDP/TCP 53, upstream forwarding, or a missing Service or EndpointSlice.
Mental Model
Kubernetes DNS is service discovery. Instead of hardcoding Pod IPs, apps call stable DNS names for Services. CoreDNS watches Kubernetes objects such as Services, EndpointSlices, and Namespaces, and answers records for Services and, depending on configuration, Pods. For anything outside the cluster, CoreDNS usually forwards the query to the node or corporate upstream DNS configured in the Corefile.
When debugging, separate two questions: can the Pod reach DNS at all, and is the name it asks for actually valid?
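One way to keep the two questions apart is to look at the response code that dig reports. The sketch below uses two hypothetical helpers (they are not part of any tool): `dig_status` pulls the status field out of dig's header line, and `classify_dns_status` maps it to the question it answers. It assumes dig is available in the diagnostic Pod and treats a missing status as "no response".

```shell
# Hypothetical helpers: extract the rcode from dig output and map it to
# "can the Pod reach DNS?" versus "is the name valid?".

# Reads dig output on stdin and prints the status (e.g. NOERROR, NXDOMAIN).
dig_status() {
  sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# Maps a status string to a rough diagnosis.
classify_dns_status() {
  case "$1" in
    NOERROR)          echo "DNS reachable; query answered (check the ANSWER section)" ;;
    NXDOMAIN)         echo "DNS reachable; name does not exist" ;;
    SERVFAIL|REFUSED) echo "DNS reachable; server could not answer" ;;
    "")               echo "no response; suspect reachability (Service, CoreDNS, NetworkPolicy)" ;;
    *)                echo "unexpected status: $1" ;;
  esac
}
```

Inside a diagnostic Pod this could be used as `classify_dns_status "$(dig +time=2 +tries=1 web-api.app.svc.cluster.local | dig_status)"`. A timeout produces no header line, so it falls into the "no response" branch.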
Figure: Pod DNS query path through the kube-dns Service to CoreDNS.
Service DNS Names
Inside a Pod, Kubernetes configures search domains so short names can work. That convenience is useful, but it also causes confusion when two namespaces have Services with the same name.
| Name Used By App | Meaning | When To Use |
|---|---|---|
| web-api | Searches current namespace first. | Only when caller and Service are in the same namespace. |
| web-api.app | Service web-api in namespace app. | Good default for cross-namespace app calls. |
| web-api.app.svc | Service in namespace app under the cluster service zone. | Useful when search behavior is unclear. |
| web-api.app.svc.cluster.local | Fully qualified Service DNS name. | Best for debugging and unambiguous config. |
| mysql-0.mysql.data.svc.cluster.local | StatefulSet Pod DNS through a headless Service. | Stateful workloads needing stable Pod identity. |
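To see why the shorter forms in the table resolve, it can help to list the names a resolver would try. The sketch below is a simplified model of glibc-style search-list expansion, not a real resolver; musl and other implementations differ in details. `expand_candidates` is a hypothetical helper, and the `ndots` value of 5 and the search list are assumptions matching what Kubernetes typically writes for a ClusterFirst Pod in namespace app.

```shell
# Hypothetical helper: print the names a resolver would try, in order,
# for a given name, ndots value, and search list.
expand_candidates() {
  name=$1
  ndots=$2
  shift 2
  # A trailing dot marks the name as fully qualified: no search list is used.
  case "$name" in
    *.) printf '%s\n' "$name"; return 0 ;;
  esac
  # Count the dots in the name.
  dots=$(printf '%s' "$name" | tr -cd '.' | wc -c)
  dots=$((dots))
  # At or above the ndots threshold, the name is tried as-is first.
  if [ "$dots" -ge "$ndots" ]; then
    printf '%s.\n' "$name"
  fi
  # Each search domain is appended in order.
  for domain in "$@"; do
    printf '%s.%s.\n' "$name" "$domain"
  done
  # Below the threshold, the as-is lookup comes last.
  if [ "$dots" -lt "$ndots" ]; then
    printf '%s.\n' "$name"
  fi
}

# ASSUMPTION: ndots 5 and the search list of a Pod in namespace "app".
expand_candidates web-api.app 5 app.svc.cluster.local svc.cluster.local cluster.local
```

In this model, web-api.app first misses as web-api.app.app.svc.cluster.local and then matches on the second candidate, web-api.app.svc.cluster.local. A name with a trailing dot skips the search list entirely.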
Pod resolv.conf
Most Pods use dnsPolicy: ClusterFirst. Their /etc/resolv.conf points to the cluster DNS Service IP and includes search domains like app.svc.cluster.local, svc.cluster.local, and cluster.local.
kubectl run dns-shell -n app --rm -it --image=nicolaka/netshoot -- /bin/bash
cat /etc/resolv.conf
dig web-api
dig web-api.app
dig web-api.app.svc.cluster.local
dig kubernetes.default.svc.cluster.local

| Pod DNS Setting | Meaning | SRE Note |
|---|---|---|
| ClusterFirst | Default for normal Pods. Use cluster DNS first. | Most app workloads should use this. |
| Default | Inherit the node's DNS settings instead of using cluster DNS. | Can break Service discovery. |
| ClusterFirstWithHostNet | Use cluster DNS for hostNetwork Pods. | Often needed for agents running with host networking. |
| None | Use explicit dnsConfig. | Powerful but easy to misconfigure. |
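Because dnsPolicy: None is the easiest setting to get wrong, a concrete manifest may help. The sketch below writes a hypothetical Pod spec that reproduces cluster-style DNS by hand. The Pod name, the file name, and the nameserver IP 10.96.0.10 are assumptions; replace the IP with the ClusterIP reported by `kubectl get svc kube-dns -n kube-system`.

```shell
# Write a hypothetical Pod manifest that uses dnsPolicy: None with explicit dnsConfig.
# ASSUMPTION: 10.96.0.10 stands in for your cluster DNS Service IP.
cat > dns-none-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: dns-none-demo
  namespace: app
spec:
  dnsPolicy: None
  dnsConfig:
    nameservers:
      - 10.96.0.10
    searches:
      - app.svc.cluster.local
      - svc.cluster.local
      - cluster.local
    options:
      - name: ndots
        value: "5"
  containers:
    - name: shell
      image: nicolaka/netshoot
      command: ["sleep", "3600"]
EOF
```

After `kubectl apply -f dns-none-pod.yaml`, running `kubectl exec -n app dns-none-demo -- cat /etc/resolv.conf` should show exactly the nameserver, search list, and options given here. Leaving out the searches list is a common way to break short Service names under this policy.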
ClusterIP And Headless Records
A normal ClusterIP Service resolves to the Service virtual IP. A headless Service resolves directly to backend Pod IPs. That difference matters when debugging databases, StatefulSets, and clients that do their own load balancing.
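Before comparing dig answers, it can be useful to confirm which kind of Service you are looking at. The sketch below is a hypothetical helper that classifies a Service from its spec.clusterIP value; a headless Service carries the literal value None.

```shell
# Hypothetical helper: classify a Service by its spec.clusterIP value.
service_dns_kind() {
  case "$1" in
    None) echo "headless: DNS returns backend Pod IPs" ;;
    "")   echo "unknown: no clusterIP value given" ;;
    *)    echo "ClusterIP: DNS returns the virtual IP $1" ;;
  esac
}
```

For example, `service_dns_kind "$(kubectl get svc mysql -n data -o jsonpath='{.spec.clusterIP}')"` would report headless for the mysql Service used in this section, assuming it is defined that way in your cluster.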
# Normal Service: expect one ClusterIP answer.
kubectl get svc web-api -n app -o wide
dig +short web-api.app.svc.cluster.local
# Headless Service: expect backend Pod IPs or StatefulSet Pod records.
kubectl get svc mysql -n data -o yaml | grep -i clusterIP
dig +short mysql.data.svc.cluster.local
dig +short mysql-0.mysql.data.svc.cluster.local

From-Scratch DNS Lab
This lab creates a Namespace, a Deployment, and a Service, then checks short names, namespace-qualified names, and the full FQDN from a diagnostic Pod. Save the manifest as dns-lab.yaml.
apiVersion: v1
kind: Namespace
metadata:
  name: app
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-api
  namespace: app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web-api
  template:
    metadata:
      labels:
        app: web-api
    spec:
      containers:
        - name: web
          image: nginxdemos/hello:plain-text
          ports:
            - name: http
              containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: web-api
  namespace: app
spec:
  selector:
    app: web-api
  ports:
    - name: http
      port: 80
      targetPort: http

kubectl apply -f dns-lab.yaml
kubectl rollout status deploy/web-api -n app
kubectl run dns-shell -n app --rm -it --image=nicolaka/netshoot -- /bin/bash
cat /etc/resolv.conf
dig +short web-api
dig +short web-api.app
dig +short web-api.app.svc.cluster.local
curl -sS http://web-api.app.svc.cluster.local

Corefile And Plugins
The CoreDNS ConfigMap stores the Corefile. In managed clusters, edit it carefully and follow the provider's recommendations. A small syntax mistake can break DNS for the whole cluster.
.:53 {
    errors
    health
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}

| Plugin | Purpose | Operational Note |
|---|---|---|
| kubernetes | Answers cluster Service and Pod DNS records. | Needs API connectivity. |
| forward | Sends non-cluster queries to upstream DNS. | Common source of external DNS failures. |
| cache | Caches DNS responses. | Improves latency; stale behavior depends on TTLs. |
| loop | Detects forwarding loops. | CrashLooping CoreDNS can indicate loop problems. |
| reload | Reloads Corefile changes. | Still validate by watching logs after edits. |
| prometheus | Exposes CoreDNS metrics. | Useful for latency, errors, and request volume. |
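The metrics exposed by the prometheus plugin can be summarized without a full monitoring stack. The sketch below is a hypothetical helper that sums response counters per rcode from Prometheus text-format output on stdin. It assumes the counter is named coredns_dns_responses_total and carries an rcode label, as in recent CoreDNS releases; older versions used different metric names.

```shell
# Hypothetical helper: sum CoreDNS response counters per rcode.
# Reads Prometheus text-format metrics on stdin.
# ASSUMPTION: metric name coredns_dns_responses_total with an rcode label.
rcode_totals() {
  awk '
    index($0, "coredns_dns_responses_total{") == 1 {
      # Extract the value of the rcode label, e.g. NOERROR or SERVFAIL.
      if (match($0, /rcode="[A-Za-z]+"/)) {
        rcode = substr($0, RSTART + 7, RLENGTH - 8)
        totals[rcode] += $NF
      }
    }
    END { for (r in totals) printf "%s %d\n", r, totals[r] }
  ' | sort
}
```

With a port-forward to a CoreDNS Pod on port 9153 in place, `curl -s http://localhost:9153/metrics | rcode_totals` prints one line per rcode. A SERVFAIL count that grows between two runs tends to point at upstream or API problems rather than at the names being queried.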
Daily Checks
kubectl get deploy,svc,cm -n kube-system | grep -E 'coredns|kube-dns'
kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide
kubectl get svc kube-dns -n kube-system -o wide
kubectl get endpointslice -n kube-system -l kubernetes.io/service-name=kube-dns -o wide
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=100
kubectl describe configmap coredns -n kube-system
# Quick known-good lookup.
kubectl run dns-test --rm -it --image=busybox:1.36 -- \
  nslookup kubernetes.default.svc.cluster.local

Debugging Workflow
- Check whether only one app is affected or every Pod in the cluster is affected.
- From an affected namespace, inspect /etc/resolv.conf and run dig against the exact FQDN.
- Confirm the Service exists and has EndpointSlices if it is a Kubernetes Service name.
- Confirm the kube-dns Service has endpoints and CoreDNS Pods are Ready.
- For external names, inspect the Corefile forwarder and test upstream DNS from CoreDNS nodes if possible.
- Check NetworkPolicies that might block egress to UDP/TCP 53.
NS=<namespace>
NAME=<service-or-domain>
kubectl run dns-debug -n "$NS" --rm -it --image=nicolaka/netshoot -- /bin/bash
# Inside the Pod shell: set NS and NAME again, because variables from your workstation shell do not carry into the Pod.
NS=<namespace>
NAME=<service-or-domain>
cat /etc/resolv.conf
dig "$NAME"
dig "$NAME.$NS.svc.cluster.local"
dig kubernetes.default.svc.cluster.local
dig example.com
# In another terminal.
kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=200
kubectl get networkpolicy -A

Symptom To Cause
| Symptom | Likely Cause | Check First |
|---|---|---|
| NXDOMAIN for Service | Wrong name, wrong namespace, or Service does not exist. | kubectl get svc -A \| grep <name>, then retry with the full FQDN. |
| Service DNS resolves but traffic fails | DNS is fine; Service endpoints, targetPort, NetworkPolicy, or app is broken. | EndpointSlice and direct Service curl. |
| Short name fails, FQDN works | Search domain or namespace assumption wrong. | /etc/resolv.conf, caller namespace. |
| External DNS fails, cluster DNS works | Corefile forwarder, upstream DNS, node DNS, corporate DNS issue. | forward config and CoreDNS logs. |
| DNS intermittent or slow | CoreDNS overloaded, CPU throttling, too few replicas, upstream latency. | CoreDNS metrics, requests, throttling, logs. |
| Only one namespace fails | NetworkPolicy blocking egress to kube-dns or sidecar/proxy DNS behavior. | NetworkPolicies and pod DNS config. |
| CoreDNS CrashLoopBackOff | Corefile syntax, forwarding loop, plugin issue, bad config rollout. | CoreDNS logs, ConfigMap diff, events. |
| Headless Service records missing | Pods not Ready, wrong selector, StatefulSet/serviceName mismatch. | Service YAML, EndpointSlice, Pod readiness. |
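For the case where only one namespace fails behind a default-deny egress policy, an explicit allow rule for DNS is a common fix. The sketch below writes a hypothetical NetworkPolicy that lets every Pod in namespace app reach the CoreDNS Pods on UDP and TCP 53. The policy name, file name, and namespace are assumptions. It also assumes that CoreDNS Pods carry the k8s-app=kube-dns label, that the kube-system namespace has the automatic kubernetes.io/metadata.name label, and that your CNI enforces NetworkPolicy. Clusters using a node-local DNS cache may need different rules.

```shell
# Write a hypothetical NetworkPolicy that allows DNS egress from namespace "app".
# ASSUMPTION: CoreDNS Pods are labeled k8s-app=kube-dns in kube-system.
cat > allow-dns-egress.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: app
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
EOF
```

Apply it with `kubectl apply -f allow-dns-egress.yaml` and repeat the lookup from an affected Pod. Both protocols are listed because large responses and retries can fall back from UDP to TCP.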
Safe Change Pattern
- Export before editing: save the current CoreDNS ConfigMap content in your change record or GitOps repo.
- Prefer source of truth: update Helm, Terraform, add-on config, or GitOps manifests rather than editing live config in production.
- Roll carefully: watch CoreDNS rollout, logs, and kubernetes.default.svc.cluster.local lookups immediately after changes.
- Have rollback ready: a broken Corefile can affect the whole cluster quickly.
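The pattern above can be sketched as a few shell helpers. They are hypothetical and meant for lab or break-glass use, since a GitOps or add-on source of truth should normally drive the change. They assume the ConfigMap and Deployment are both named coredns in kube-system and that the config lives under the Corefile key; managed add-ons may differ. Nothing runs until a function is called.

```shell
# Hypothetical helpers for a CoreDNS Corefile change.
# ASSUMPTION: ConfigMap and Deployment are named "coredns" in kube-system.

# Save the current Corefile to a timestamped file and print the file name.
backup_coredns_corefile() {
  backup_file="Corefile-$(date +%Y%m%d-%H%M%S).bak"
  kubectl get configmap coredns -n kube-system \
    -o jsonpath='{.data.Corefile}' > "$backup_file" || return 1
  echo "$backup_file"
}

# Check the rollout and run a known-good lookup after a change.
verify_coredns() {
  kubectl rollout status deployment/coredns -n kube-system --timeout=120s || return 1
  kubectl run dns-verify --rm -i --restart=Never --image=busybox:1.36 -- \
    nslookup kubernetes.default.svc.cluster.local
}

# Restore the Corefile from a backup file created by backup_coredns_corefile.
rollback_coredns_corefile() {
  kubectl create configmap coredns -n kube-system \
    --from-file=Corefile="$1" --dry-run=client -o yaml | kubectl apply -f -
}
```

A typical sequence would be `f=$(backup_coredns_corefile)`, then the change, then `verify_coredns || rollback_coredns_corefile "$f"`. Backing up only the Corefile text, rather than the full object YAML, avoids reapplying stale metadata such as resourceVersion during a rollback.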