Services & Load Balancing
A Kubernetes Service gives stable access to Pods whose IPs keep changing. The Service selector finds matching ready Pods, Kubernetes writes EndpointSlices, and the cluster dataplane routes traffic from the Service IP or DNS name to one backend Pod. When a Service breaks, check selector, EndpointSlice, Pod readiness, targetPort, DNS, network policy, kube-proxy/CNI, and the cloud load balancer in that order.
Mental Model
Pods are disposable. Whenever a Pod is replaced (a rollout, an eviction, a node failure), the new Pod can get a different name and IP address, so other workloads should not call Pod IPs directly. A Service is the stable contract in front of those Pods: one name, one virtual IP, one port mapping, and a selector that decides which Pods are eligible backends.
For an SRE, the important distinction is this: the Service object does not run your application. It only points traffic at Pods. If the Service has no endpoints, traffic has nowhere useful to go.
A Service is stable; Pods behind it are replaceable.
How It Works
- You create Pods, usually through a Deployment or StatefulSet.
- You add labels to those Pods, such as app: web-api.
- You create a Service with a selector that matches those labels.
- Kubernetes creates EndpointSlices containing the IPs and ports of matching ready Pods.
- CoreDNS creates DNS records such as web-api.app.svc.cluster.local.
- The node dataplane, commonly kube-proxy iptables/IPVS or an eBPF CNI, routes Service traffic to backend Pods.
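The chain described above (Service selector, matching Pods, EndpointSlices) can be traced with a small shell helper. This is a minimal sketch, assuming kubectl is already configured for the cluster. The function names svc_selector and svc_backends are hypothetical and not part of kubectl.

```shell
#!/bin/sh
# Hypothetical helpers: trace Service -> selector -> Pods -> EndpointSlices.

# Print the Service's selector as a kubectl label query, e.g. "app=web-api".
# The go-template emits "key=value," per selector entry; sed drops the last comma.
svc_selector() {
  svc=$1 ns=$2
  kubectl get svc "$svc" -n "$ns" \
    -o go-template='{{range $k, $v := .spec.selector}}{{$k}}={{$v}},{{end}}' |
    sed 's/,$//'
}

# List the Pods that selector matches, then the EndpointSlices written for the Service.
# Note: a Service without a selector yields an empty query, which matches every Pod.
svc_backends() {
  svc=$1 ns=$2
  kubectl get pods -n "$ns" -l "$(svc_selector "$svc" "$ns")" -o wide --show-labels
  kubectl get endpointslice -n "$ns" -l "kubernetes.io/service-name=$svc" -o wide
}
```

For the lab Service later in this page, the call would be svc_backends web-api app. Pods that appear in the first list but not in the EndpointSlice addresses are usually not Ready.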
Service Types
| Type | Reachable From | Use When | SRE Watchpoint |
|---|---|---|---|
| ClusterIP | Inside the cluster | Service-to-service calls, internal APIs, databases exposed only to apps. | Default and safest. Debug endpoints before blaming DNS. |
| NodePort | Each node IP on a static high port | Lab exposure, external appliances, or as a lower-level building block. | Opens every node. Usually avoid for direct production access. |
| LoadBalancer | External or private cloud load balancer | Expose a service through the cloud provider or MetalLB. | Requires cloud controller, IAM, subnet tags, quotas, and health checks. |
| ExternalName | Inside cluster DNS alias | Point an in-cluster name at an external DNS name. | No proxying or health checks; it is just DNS CNAME behavior. |
| Headless | Direct Pod records | StatefulSets, databases, custom client-side discovery. | No virtual IP. Clients see individual Pod addresses. |
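Of the types in the table, ExternalName is the one with no selector and no endpoints at all, which is easiest to see in a manifest. The sketch below prints one from a shell function so it can be reviewed before it is applied. The Service name billing-db and the target db.example.com are placeholders chosen for illustration, not values from this guide.

```shell
#!/bin/sh
# Hypothetical example: print an ExternalName Service manifest.
# billing-db and db.example.com are placeholder names.
externalname_manifest() {
  cat <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: billing-db
  namespace: app
spec:
  type: ExternalName
  # Cluster DNS answers lookups for billing-db.app with a CNAME to this name.
  externalName: db.example.com
EOF
}
```

To try it, pipe the output into kubectl: externalname_manifest | kubectl apply -f -. Because there is no selector, the EndpointSlice checks used elsewhere on this page do not apply to this type.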
Build From Scratch
This lab creates a namespace, a simple HTTP Deployment, and a ClusterIP Service. The point is to learn the relationship between labels, selectors, ports, and endpoints.
apiVersion: v1
kind: Namespace
metadata:
  name: app
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-api
  namespace: app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-api
  template:
    metadata:
      labels:
        app: web-api
    spec:
      containers:
        - name: web
          image: nginxdemos/hello:plain-text
          ports:
            - name: http
              containerPort: 80
          readinessProbe:
            httpGet:
              path: /
              port: http
            periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: web-api
  namespace: app
spec:
  type: ClusterIP
  selector:
    app: web-api
  ports:
    - name: http
      port: 80
      targetPort: http

kubectl apply -f web-api-service-lab.yaml
kubectl rollout status deploy/web-api -n app
# See the Service and the Pods it should select.
kubectl get svc web-api -n app -o wide
kubectl get pods -n app -l app=web-api -o wide --show-labels
# EndpointSlice is the real backend list used by modern Kubernetes.
kubectl get endpointslice -n app -l kubernetes.io/service-name=web-api -o wide
# Test from inside the cluster.
kubectl run curl -n app --rm -it --image=curlimages/curl --restart=Never -- \
  curl -sS http://web-api.app.svc.cluster.local

Ports And Selectors
Most Service mistakes happen in two tiny fields: selector and targetPort. The selector must match Pod labels. The Service port is what clients call. The targetPort is where the container is listening.
| Field | Meaning | Example |
|---|---|---|
| selector | Labels used to find backend Pods. | app: web-api |
| port | Service port clients connect to. | 80 |
| targetPort | Pod container port, by name or number. | http or 8080 |
| containerPort | Documented port in the Pod spec. Useful for named targetPorts. | name: http, containerPort: 8080 |
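To see how the fields in the table line up on a live Service, it can help to print the Service's declared ports next to the numeric ports recorded in its EndpointSlices. The following is a sketch under the assumption that kubectl can reach the cluster. The function name svc_port_map is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: compare what the Service declares with what the
# EndpointSlices resolved. A named targetPort such as "http" appears as a
# number (for example 8080) on the endpoint side.
svc_port_map() {
  svc=$1 ns=$2
  echo "service ports (name, port, targetPort):"
  kubectl get svc "$svc" -n "$ns" \
    -o jsonpath='{range .spec.ports[*]}{.name}{"\t"}{.port}{"\t"}{.targetPort}{"\n"}{end}'
  echo "endpoint ports (name, port):"
  kubectl get endpointslice -n "$ns" -l "kubernetes.io/service-name=$svc" \
    -o jsonpath='{range .items[*].ports[*]}{.name}{"\t"}{.port}{"\n"}{end}'
}
```

If the endpoint side is empty, or shows a number the application is not listening on, the selector or the targetPort is the first thing to review.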
containerPort does not open a firewall by itself. Your app must actually listen on that port, and the Service targetPort must point to it.

Service DNS
Inside the cluster, CoreDNS gives Services predictable names. Pods in the same namespace can usually call http://web-api. Pods in another namespace should use web-api.app or the full name web-api.app.svc.cluster.local.
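The three name forms described above follow one pattern, which a short function can spell out. This is an illustrative sketch: svc_dns_names is a hypothetical helper, and it assumes the default cluster domain cluster.local unless a third argument overrides it.

```shell
#!/bin/sh
# Hypothetical helper: print the DNS names a client can use for a Service,
# from shortest (same namespace only) to fully qualified.
svc_dns_names() {
  svc=$1 ns=$2 domain=${3:-cluster.local}
  printf '%s\n' "$svc" "$svc.$ns" "$svc.$ns.svc.$domain"
}
```

Calling svc_dns_names web-api app prints web-api, web-api.app, and web-api.app.svc.cluster.local, one per line. Only the last form is independent of the caller's namespace and DNS search path.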
kubectl run netshoot -n app --rm -it --image=nicolaka/netshoot -- /bin/bash
nslookup web-api
nslookup web-api.app
nslookup web-api.app.svc.cluster.local
curl -v http://web-api
curl -v http://web-api.app.svc.cluster.local

LoadBalancer Services
A LoadBalancer Service asks the infrastructure provider to create an external or internal load balancer. In AWS EKS, Azure AKS, GKE, OpenStack, or on-prem MetalLB, a controller watches the Service and provisions provider-specific resources.
apiVersion: v1
kind: Service
metadata:
  name: web-api-public
  namespace: app
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
spec:
  type: LoadBalancer
  externalTrafficPolicy: Cluster
  selector:
    app: web-api
  ports:
    - name: http
      port: 80
      targetPort: http

| Setting | What It Does | Tradeoff |
|---|---|---|
| externalTrafficPolicy: Cluster | Any node can receive traffic and forward to any ready endpoint. | Better distribution, but usually hides original client source IP. |
| externalTrafficPolicy: Local | Node only forwards to local endpoints. | Preserves source IP, but nodes without local ready Pods fail LB health checks. |
| Provider annotations | Control scheme, type, target mode, health checks, SSL, subnets, security groups. | Cloud-specific. Always check client platform standards. |
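With externalTrafficPolicy: Local, what matters is which nodes hold ready endpoints. A rough way to see that distribution is to count ready EndpointSlice addresses per node. This is a sketch, not a definitive tool: ready_endpoints_per_node is a hypothetical name, and it relies on the nodeName and conditions.ready fields of discovery.k8s.io/v1 EndpointSlices.

```shell
#!/bin/sh
# Hypothetical helper: count ready endpoints per node for one Service.
# Only endpoints whose conditions.ready is explicitly "true" are counted.
ready_endpoints_per_node() {
  svc=$1 ns=$2
  kubectl get endpointslice -n "$ns" -l "kubernetes.io/service-name=$svc" \
    -o jsonpath='{range .items[*].endpoints[*]}{.nodeName}{"\t"}{.conditions.ready}{"\n"}{end}' |
    awk -F '\t' '$2 == "true" { n[$1]++ } END { for (k in n) print n[k], k }' |
    sort -k2
}
```

Nodes missing from the output have no ready local Pod, so under the Local policy they are expected to fail the load balancer health check and receive no traffic.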
Headless Services
A headless Service sets clusterIP: None. Instead of returning one virtual IP, DNS returns individual Pod records. This is common for StatefulSets where clients need stable identities such as mysql-0.mysql.data.svc.cluster.local.
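The per-Pod names mentioned above follow a fixed pattern: Pod name, then the headless Service name, then the namespace. The sketch below prints them for a StatefulSet. It assumes the StatefulSet's serviceName field points at the headless Service and that the cluster domain is the default cluster.local. The function name sts_pod_dns is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the stable DNS name of each StatefulSet Pod.
# Arguments: StatefulSet name, headless Service name, namespace, replica count.
sts_pod_dns() {
  sts=$1 svc=$2 ns=$3 replicas=$4
  i=0
  while [ "$i" -lt "$replicas" ]; do
    # StatefulSet Pods are named <statefulset>-<ordinal>, starting at 0.
    printf '%s-%s.%s.%s.svc.cluster.local\n' "$sts" "$i" "$svc" "$ns"
    i=$((i + 1))
  done
}
```

For example, sts_pod_dns mysql mysql data 3 prints the names for mysql-0 through mysql-2 in the data namespace.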
apiVersion: v1
kind: Service
metadata:
  name: mysql
  namespace: data
spec:
  clusterIP: None
  selector:
    app: mysql
  ports:
    - name: mysql
      port: 3306
      targetPort: 3306

Production Defaults
- Name ports: use names like http, grpc, and metrics so probes, Services, and NetworkPolicies stay readable.
- Use readiness probes: only ready Pods should receive Service traffic.
- Keep internal services internal: prefer ClusterIP unless external access is truly required.
- Track ownership: check whether the Service is managed by Helm, ArgoCD, Flux, Terraform, or a cloud controller before patching.
- Document cloud annotations: provider-specific annotations are operational behavior, not decoration.
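As one illustration of the named-port default, a NetworkPolicy can refer to the port by name instead of by number, so it keeps working if the container port changes. The sketch below prints such a policy for the web-api Pods. The policy name allow-web-api-http and the choice to allow every Pod in the same namespace are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical example: print a NetworkPolicy that references the named port "http".
# allow-web-api-http is a placeholder policy name.
named_port_policy() {
  cat <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-web-api-http
  namespace: app
spec:
  podSelector:
    matchLabels:
      app: web-api
  policyTypes:
    - Ingress
  ingress:
    - from:
        # An empty podSelector allows any Pod in the same namespace.
        - podSelector: {}
      ports:
        - protocol: TCP
          # Resolved against the port name declared in the Pod spec.
          port: http
EOF
}
```

It can be previewed with named_port_policy | kubectl diff -f - before applying. Enforcement depends on the CNI supporting NetworkPolicy.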
Debugging Checklist
Debug Services from inside out. First prove the Pods are healthy, then prove the Service points at them, then test DNS, dataplane, network policy, and external load balancer behavior.
NS=<namespace>
SVC=<service>
APP=<app-label-value>
# 1. Inspect the Service contract.
kubectl get svc "$SVC" -n "$NS" -o wide
kubectl describe svc "$SVC" -n "$NS"
kubectl get svc "$SVC" -n "$NS" -o yaml
# 2. Check selected Pods and readiness.
kubectl get pods -n "$NS" -l app="$APP" -o wide --show-labels
kubectl describe pods -n "$NS" -l app="$APP"
# 3. EndpointSlices should contain ready backend addresses.
kubectl get endpointslice -n "$NS" -l kubernetes.io/service-name="$SVC" -o wide
kubectl get endpointslice -n "$NS" -l kubernetes.io/service-name="$SVC" -o yaml
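# 4. Test from inside the cluster. The local shell expands $SVC and $NS before
# the command is sent, so the debug Pod does not need those variables set.
kubectl run netshoot -n "$NS" --rm -it --restart=Never --image=nicolaka/netshoot -- \
  bash -c "nslookup $SVC.$NS.svc.cluster.local; curl -v http://$SVC.$NS.svc.cluster.local:80"

kube-proxy And CNI Dataplanes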
Traditional clusters use kube-proxy to program iptables or IPVS rules. Some CNIs, especially eBPF-based dataplanes such as Cilium, can replace kube-proxy. You do not need to memorize every implementation to troubleshoot well; prove the Kubernetes objects first, then inspect the active dataplane.
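When kube-proxy is in use, one quick data point is which proxy mode it was configured with. The sketch below reads it from the kube-proxy ConfigMap. It assumes a kubeadm-style layout (ConfigMap kube-proxy in kube-system with a config.conf key); managed platforms may store this differently or not expose it at all. The function name kube_proxy_mode is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report the configured kube-proxy mode.
# Assumes a kubeadm-style ConfigMap named kube-proxy with a config.conf key.
kube_proxy_mode() {
  mode=$(kubectl get configmap kube-proxy -n kube-system \
      -o jsonpath='{.data.config\.conf}' 2>/dev/null |
    sed -n 's/^mode:[[:space:]]*"\{0,1\}\([A-Za-z]*\)"\{0,1\}[[:space:]]*$/\1/p')
  if [ -n "$mode" ]; then
    echo "$mode"
  else
    # An empty or missing mode means kube-proxy uses its platform default.
    echo "unset (platform default)"
  fi
}
```

If the ConfigMap does not exist, the function also reports unset, which on a cluster running an eBPF dataplane may simply mean kube-proxy is not deployed.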
kubectl get pods -n kube-system -o wide | grep -E 'kube-proxy|cilium|calico'
# kube-proxy clusters.
kubectl get daemonset -n kube-system kube-proxy
kubectl logs -n kube-system -l k8s-app=kube-proxy --tail=100
# CNI-specific checks vary by environment.
kubectl get pods -n kube-system -l k8s-app=cilium -o wide
kubectl get pods -n kube-system -l k8s-app=calico-node -o wide

Symptom To Cause
| Symptom | Likely Cause | Check First |
|---|---|---|
| Service has no endpoints | Selector mismatch, Pods not Ready, wrong namespace. | kubectl describe svc, kubectl get pods --show-labels, EndpointSlice. |
| DNS name does not resolve | Wrong namespace/name, CoreDNS issue, pod DNS config. | nslookup from netshoot, CoreDNS pods/logs. |
| DNS resolves but curl times out | NetworkPolicy, kube-proxy/CNI issue, app not listening, wrong targetPort. | EndpointSlice, direct Pod IP curl, NetworkPolicy list. |
| Pod IP works but Service IP fails | Service port mapping or node dataplane problem. | Service YAML, kube-proxy/CNI health. |
| LoadBalancer stuck Pending | No cloud LB support, missing controller, IAM, subnet tags, quota. | Service events, cloud controller logs, provider docs. |
| External traffic reaches only some Pods | externalTrafficPolicy: Local with uneven Pods across nodes. | Pod placement, LB target health, node-local endpoints. |
| Client source IP missing | externalTrafficPolicy: Cluster or proxy/LB behavior. | Service spec and load balancer settings. |
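For the "LoadBalancer stuck Pending" row, the first check can be scripted: read the address from the Service status and, if none has been assigned, show the Service events. This is a minimal sketch assuming kubectl access. The function name lb_address is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the load balancer address of a Service, or
# "pending" followed by the Service events if no address is assigned yet.
lb_address() {
  svc=$1 ns=$2
  # Providers publish either a hostname or an IP; print whichever is set.
  addr=$(kubectl get svc "$svc" -n "$ns" \
    -o jsonpath='{.status.loadBalancer.ingress[*].hostname}{.status.loadBalancer.ingress[*].ip}')
  if [ -n "$addr" ]; then
    echo "$addr"
  else
    echo "pending"
    kubectl describe svc "$svc" -n "$ns" | sed -n '/^Events:/,$p'
  fi
}
```

For the example Service on this page the call would be lb_address web-api-public app. Events that mention permissions, subnets, or quotas usually point at the provider side rather than at Kubernetes objects.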
Safe Change Pattern
- Confirm ownership with labels and annotations such as app.kubernetes.io/managed-by, Helm release metadata, or ArgoCD labels.
- Fix the source of truth: Helm values, GitOps manifests, Terraform, or platform module.
- Use kubectl diff, Helm template output, ArgoCD diff, or Terraform plan before applying.
- After rollout, verify Service endpoints, DNS, in-cluster curl, and any external health checks.
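The kubectl diff step in the list above can be wrapped so that its exit status is interpreted rather than just displayed: kubectl diff exits 0 when nothing would change, 1 when differences exist, and a higher value on error. The wrapper below is a sketch. The function name preview_change is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: preview a manifest change and interpret kubectl diff's exit status.
preview_change() {
  manifest=$1
  rc=0
  kubectl diff -f "$manifest" || rc=$?
  case $rc in
    0) echo "no changes: live state already matches $manifest" ;;
    1) echo "changes pending: review the diff above before applying" ;;
    *) echo "kubectl diff failed (exit $rc)" >&2
       return "$rc" ;;
  esac
}
```

A typical call is preview_change web-api-service-lab.yaml. For Services owned by Helm, ArgoCD, or Terraform, the equivalent preview in that tool is the one to trust.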