TL;DR

A Kubernetes Service gives stable access to Pods whose IPs keep changing. The Service selector finds matching ready Pods, Kubernetes writes EndpointSlices, and the cluster dataplane routes traffic from the Service IP or DNS name to one backend Pod. When a Service breaks, check selector, EndpointSlice, Pod readiness, targetPort, DNS, network policy, kube-proxy/CNI, and the cloud load balancer in that order.

Mental Model

Pods are disposable. A rollout, eviction, or node failure replaces a Pod with a new one that typically has a new name and IP address, so other workloads should not call Pod IPs directly. A Service is the stable contract in front of those Pods: one name, one virtual IP, one port mapping, and a selector that decides which Pods are eligible backends.

For an SRE, the important distinction is this: the Service object does not run your application. It only points traffic at Pods. If the Service has no endpoints, traffic has nowhere useful to go.

[Figure: a client Pod runs curl web-api; the Service (10.96.20.15:80) uses an EndpointSlice of ready Pod IPs to reach Pod A, Pod B, or Pod C on :8080. Service selector -> ready Pods -> EndpointSlice -> dataplane routes traffic to one backend.]

A Service is stable; Pods behind it are replaceable.

How It Works

  1. You create Pods, usually through a Deployment or StatefulSet.
  2. You add labels to those Pods, such as app: web-api.
  3. You create a Service with a selector that matches those labels.
  4. Kubernetes creates EndpointSlices containing the IPs and ports of matching Pods, and marks which of them are ready to receive traffic.
  5. CoreDNS serves DNS records such as web-api.app.svc.cluster.local.
  6. The node dataplane, commonly kube-proxy iptables/IPVS or an eBPF CNI, routes Service traffic to backend Pods.
Interview phrasing: A Service is an abstraction for stable network identity. It load-balances to ready Pod endpoints selected by labels. EndpointSlice is the modern scalable replacement for the older Endpoints object.

Service Types

| Type | Reachable From | Use When | SRE Watchpoint |
| --- | --- | --- | --- |
| ClusterIP | Inside the cluster | Service-to-service calls, internal APIs, databases exposed only to apps. | Default and safest. Debug endpoints before blaming DNS. |
| NodePort | Each node IP on a static high port | Lab exposure, external appliances, or as a lower-level building block. | Opens every node. Usually avoid for direct production access. |
| LoadBalancer | External or private cloud load balancer | Expose a service through the cloud provider or MetalLB. | Requires cloud controller, IAM, subnet tags, quotas, and health checks. |
| ExternalName | Inside cluster DNS alias | Point an in-cluster name at an external DNS name. | No proxying or health checks; it is just DNS CNAME behavior. |
| Headless | Direct Pod records | StatefulSets, databases, custom client-side discovery. | No virtual IP. Clients see individual Pod addresses. |

Build From Scratch

This lab creates a namespace, a simple HTTP Deployment, and a ClusterIP Service. The point is to learn the relationship between labels, selectors, ports, and endpoints.

web-api-service-lab.yaml (YAML)
apiVersion: v1
kind: Namespace
metadata:
  name: app
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-api
  namespace: app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-api
  template:
    metadata:
      labels:
        app: web-api
    spec:
      containers:
        - name: web
          image: nginxdemos/hello:plain-text
          ports:
            - name: http
              containerPort: 80
          readinessProbe:
            httpGet:
              path: /
              port: http
            periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: web-api
  namespace: app
spec:
  type: ClusterIP
  selector:
    app: web-api
  ports:
    - name: http
      port: 80
      targetPort: http

verify-service-lab.sh (bash)
kubectl apply -f web-api-service-lab.yaml
kubectl rollout status deploy/web-api -n app

# See the Service and the Pods it should select.
kubectl get svc web-api -n app -o wide
kubectl get pods -n app -l app=web-api -o wide --show-labels

# EndpointSlice is the real backend list used by modern Kubernetes.
kubectl get endpointslice -n app -l kubernetes.io/service-name=web-api -o wide

# Test from inside the cluster.
kubectl run curl -n app --rm -it --image=curlimages/curl --restart=Never -- \
  curl -sS http://web-api.app.svc.cluster.local

Ports And Selectors

Most Service mistakes happen in two tiny fields: selector and targetPort. The selector must match Pod labels. The Service port is what clients call. The targetPort is where the container is listening.

| Field | Meaning | Example |
| --- | --- | --- |
| selector | Labels used to find backend Pods. | app: web-api |
| port | Service port clients connect to. | 80 |
| targetPort | Pod container port, by name or number. | http or 8080 |
| containerPort | Documented port in the Pod spec. Useful for named targetPorts. | name: http, containerPort: 8080 |
Common trap: containerPort does not open a port or a firewall by itself; it is mostly informational. Your app must actually listen on that port, and the Service targetPort must point to it.
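
To see whether these fields line up on a live cluster, it can help to print the Service's port mapping next to the ports the selected Pods declare. The functions below are a minimal sketch: svc_ports and pod_ports are hypothetical helper names, not kubectl features, and the example calls assume the web-api lab from this guide.

```shell
# Print each Service port as "<port> -> <targetPort>".
svc_ports() {  # usage: svc_ports <namespace> <service>
  kubectl get svc "$2" -n "$1" \
    -o jsonpath='{range .spec.ports[*]}{.port}{" -> "}{.targetPort}{"\n"}{end}'
}

# Print the container port names and numbers declared by the selected Pods.
pod_ports() {  # usage: pod_ports <namespace> <label-selector>
  kubectl get pods -n "$1" -l "$2" \
    -o custom-columns='POD:.metadata.name,PORT_NAMES:.spec.containers[*].ports[*].name,PORT_NUMBERS:.spec.containers[*].ports[*].containerPort'
}
```

For the lab, `svc_ports app web-api` followed by `pod_ports app app=web-api` should show targetPort http on one side and a container port named http on the other. A numeric targetPort that nothing listens on still produces endpoints, but connections to them are refused.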

Service DNS

Inside the cluster, CoreDNS gives Services predictable names. Pods in the same namespace can usually call http://web-api. Pods in another namespace should use web-api.app or the full name web-api.app.svc.cluster.local.

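dns-checks.sh (bash)
# Start an interactive debug shell in the app namespace.
kubectl run netshoot -n app --rm -it --image=nicolaka/netshoot -- /bin/bash

# Run the remaining commands inside the netshoot shell.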
nslookup web-api
nslookup web-api.app
nslookup web-api.app.svc.cluster.local

curl -v http://web-api
curl -v http://web-api.app.svc.cluster.local
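
The shorter names resolve because a Pod's /etc/resolv.conf normally carries search domains for its own namespace and for the cluster. As a rough sketch, the helper below prints every form of a Service name from shortest to fully qualified. svc_dns_names is a hypothetical name, and cluster.local is assumed as the cluster domain unless you pass another one.

```shell
# Print the DNS names a Service answers to, shortest first.
# The first form only resolves from Pods in the Service's own namespace.
svc_dns_names() {  # usage: svc_dns_names <service> <namespace> [cluster-domain]
  printf '%s\n' \
    "$1" \
    "$1.$2" \
    "$1.$2.svc" \
    "$1.$2.svc.${3:-cluster.local}"
}

svc_dns_names web-api app
```

Running `cat /etc/resolv.conf` inside the netshoot Pod shows the search list that makes the shorter forms work.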

LoadBalancer Services

A LoadBalancer Service asks the infrastructure provider to create an external or internal load balancer. In AWS EKS, Azure AKS, GKE, OpenStack, or on-prem MetalLB, a controller watches the Service and provisions provider-specific resources.

loadbalancer-service.yaml (YAML)
apiVersion: v1
kind: Service
metadata:
  name: web-api-public
  namespace: app
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
spec:
  type: LoadBalancer
  externalTrafficPolicy: Cluster
  selector:
    app: web-api
  ports:
    - name: http
      port: 80
      targetPort: http

| Setting | What It Does | Tradeoff |
| --- | --- | --- |
| externalTrafficPolicy: Cluster | Any node can receive traffic and forward to any ready endpoint. | Better distribution, but usually hides original client source IP. |
| externalTrafficPolicy: Local | Node only forwards to local endpoints. | Preserves source IP, but nodes without local ready Pods fail LB health checks. |
| Provider annotations | Control scheme, type, target mode, health checks, SSL, subnets, security groups. | Cloud-specific. Always check client platform standards. |
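
With externalTrafficPolicy: Local, only nodes that host a ready backend pass the load balancer health check, so Pod placement decides where traffic can land. The sketch below counts running Pods per node for a label selector. pods_per_node is a hypothetical helper, and because it filters on phase rather than readiness, treat the counts as an upper bound.

```shell
# Count running Pods per node for a label selector.
# Output: one "<node> <count>" line per node that hosts at least one Pod.
pods_per_node() {  # usage: pods_per_node <namespace> <label-selector>
  kubectl get pods -n "$1" -l "$2" --no-headers \
    --field-selector=status.phase=Running \
    -o custom-columns='NODE:.spec.nodeName' |
    sort | uniq -c | awk '{print $2, $1}'
}
```

For the lab, `pods_per_node app app=web-api` lists the nodes that can serve traffic under Local; nodes missing from the output have no local endpoint. When the policy is Local on a LoadBalancer Service, `kubectl get svc web-api-public -n app -o jsonpath='{.spec.healthCheckNodePort}'` shows the port the load balancer probes on each node.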

Headless Services

A headless Service sets clusterIP: None. Instead of returning one virtual IP, DNS returns individual Pod records. This is common for StatefulSets where clients need stable identities such as mysql-0.mysql.data.svc.cluster.local.

headless-service.yaml (YAML)
apiVersion: v1
kind: Service
metadata:
  name: mysql
  namespace: data
spec:
  clusterIP: None
  selector:
    app: mysql
  ports:
    - name: mysql
      port: 3306
      targetPort: 3306
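
The per-Pod names follow a fixed pattern, so they can be generated rather than looked up. The sketch below assumes a StatefulSet whose spec.serviceName points at the headless Service, ordinals that start at 0, and cluster.local as the cluster domain. sts_pod_dns_names is a hypothetical helper name.

```shell
# Print the stable DNS name of each StatefulSet replica behind a headless Service.
sts_pod_dns_names() {  # usage: sts_pod_dns_names <statefulset> <replicas> <service> <namespace> [cluster-domain]
  local i=0
  while [ "$i" -lt "$2" ]; do
    printf '%s-%d.%s.%s.svc.%s\n' "$1" "$i" "$3" "$4" "${5:-cluster.local}"
    i=$((i + 1))
  done
}

sts_pod_dns_names mysql 3 mysql data
```

The first line printed is mysql-0.mysql.data.svc.cluster.local, the name used in the example above. Each name can be checked with nslookup from a debug Pod in the cluster.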

Production Defaults

  1. Name ports: use names like http, grpc, and metrics so probes, Services, and NetworkPolicies stay readable.
  2. Use readiness probes: only ready Pods should receive Service traffic.
  3. Keep internal services internal: prefer ClusterIP unless external access is truly required.
  4. Track ownership: check whether the Service is managed by Helm, ArgoCD, Flux, Terraform, or a cloud controller before patching.
  5. Document cloud annotations: provider-specific annotations are operational behavior, not decoration.

Debugging Checklist

Debug Services from inside out. First prove the Pods are healthy, then prove the Service points at them, then test DNS, dataplane, network policy, and external load balancer behavior.

service-debug.sh (bash)
# Replace the placeholder values before running.
NS="<namespace>"
SVC="<service>"
APP="<app-label-value>"

# 1. Inspect the Service contract.
kubectl get svc "$SVC" -n "$NS" -o wide
kubectl describe svc "$SVC" -n "$NS"
kubectl get svc "$SVC" -n "$NS" -o yaml

# 2. Check selected Pods and readiness.
kubectl get pods -n "$NS" -l app="$APP" -o wide --show-labels
kubectl describe pods -n "$NS" -l app="$APP"

# 3. EndpointSlices should contain ready backend addresses.
kubectl get endpointslice -n "$NS" -l kubernetes.io/service-name="$SVC" -o wide
kubectl get endpointslice -n "$NS" -l kubernetes.io/service-name="$SVC" -o yaml

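# 4. Test from inside the cluster. The command string is expanded by your local
#    shell, so $SVC and $NS are filled in before the Pod starts.
#    Replace 80 with the Service port if it differs.
kubectl run netshoot -n "$NS" --rm -it --restart=Never --image=nicolaka/netshoot -- \
  /bin/bash -c "nslookup $SVC.$NS.svc.cluster.local; curl -v http://$SVC.$NS.svc.cluster.local:80"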
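
Step 2 of the script assumes the backends carry an app label. When that is not certain, the selector can be read from the Service itself so that you list exactly the Pods it selects. This is a sketch under that idea: svc_selector and svc_backends are hypothetical helper names, and the NetworkPolicy listing is included only because policies are a common cause of timeouts.

```shell
# Print a Service's selector as a kubectl -l argument, e.g. "app=web-api,tier=api".
svc_selector() {  # usage: svc_selector <namespace> <service>
  local raw
  raw=$(kubectl get svc "$2" -n "$1" \
    -o go-template='{{range $k, $v := .spec.selector}}{{$k}}={{$v}},{{end}}') || return 1
  printf '%s\n' "${raw%,}"  # drop the trailing comma left by the template
}

# List the Pods the Service selects, then the NetworkPolicies in its namespace.
svc_backends() {  # usage: svc_backends <namespace> <service>
  local sel
  sel=$(svc_selector "$1" "$2") || return 1
  [ -n "$sel" ] || { echo "Service $2 has no selector" >&2; return 1; }
  kubectl get pods -n "$1" -l "$sel" -o wide --show-labels
  kubectl get networkpolicy -n "$1"
}
```

Calling `svc_backends "$NS" "$SVC"` gives the Pod IPs to curl directly when the Service name times out. If a Pod IP answers and the Service does not, the port mapping or the dataplane is the next suspect.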

kube-proxy And CNI Dataplanes

Traditional clusters use kube-proxy to program iptables or IPVS rules. Some CNIs, especially eBPF-based dataplanes such as Cilium, can replace kube-proxy. You do not need to memorize every implementation to troubleshoot well; prove the Kubernetes objects first, then inspect the active dataplane.

dataplane-checks.sh (bash)
kubectl get pods -n kube-system -o wide | grep -E 'kube-proxy|cilium|calico'

# kube-proxy clusters.
kubectl get daemonset -n kube-system kube-proxy
kubectl logs -n kube-system -l k8s-app=kube-proxy --tail=100

# CNI-specific checks vary by environment.
kubectl get pods -n kube-system -l k8s-app=cilium -o wide
kubectl get pods -n kube-system -l k8s-app=calico-node -o wide

Symptom To Cause

| Symptom | Likely Cause | Check First |
| --- | --- | --- |
| Service has no endpoints | Selector mismatch, Pods not Ready, wrong namespace. | kubectl describe svc, kubectl get pods --show-labels, EndpointSlice. |
| DNS name does not resolve | Wrong namespace/name, CoreDNS issue, pod DNS config. | nslookup from netshoot, CoreDNS pods/logs. |
| DNS resolves but curl times out | NetworkPolicy, kube-proxy/CNI issue, app not listening, wrong targetPort. | EndpointSlice, direct Pod IP curl, NetworkPolicy list. |
| Pod IP works but Service IP fails | Service port mapping or node dataplane problem. | Service YAML, kube-proxy/CNI health. |
| LoadBalancer stuck Pending | No cloud LB support, missing controller, IAM, subnet tags, quota. | Service events, cloud controller logs, provider docs. |
| External traffic reaches only some Pods | externalTrafficPolicy: Local with uneven Pods across nodes. | Pod placement, LB target health, node-local endpoints. |
| Client source IP missing | externalTrafficPolicy: Cluster or proxy/LB behavior. | Service spec and load balancer settings. |

Safe Change Pattern

  1. Confirm ownership with labels and annotations such as app.kubernetes.io/managed-by, Helm release metadata, or ArgoCD labels.
  2. Fix the source of truth: Helm values, GitOps manifests, Terraform, or platform module.
  3. Use kubectl diff, Helm template output, ArgoCD diff, or Terraform plan before applying.
  4. After rollout, verify Service endpoints, DNS, in-cluster curl, and any external health checks.
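
The ownership check in step 1 can be sketched with plain kubectl reads. svc_ownership is a hypothetical helper name; which labels, annotations, or field managers appear depends on the tooling in your environment, so the output is a hint rather than proof of ownership.

```shell
# Show the metadata that usually reveals who manages a Service.
svc_ownership() {  # usage: svc_ownership <namespace> <service>
  # Labels such as app.kubernetes.io/managed-by.
  kubectl get svc "$2" -n "$1" --show-labels
  # Annotations, where Helm release metadata and cloud settings live.
  kubectl get svc "$2" -n "$1" -o jsonpath='{.metadata.annotations}{"\n"}'
  # Field managers: the clients that last wrote parts of the object.
  kubectl get svc "$2" -n "$1" \
    -o jsonpath='{range .metadata.managedFields[*]}{.manager}{"\n"}{end}'
}
```

After editing the source of truth, `kubectl diff -f <manifest>` previews what would change on the cluster before anything is applied.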