KUBERNETES
HANDBOOK
A complete reference covering Kubernetes architecture, workloads, networking, ingress/egress, horizontal and vertical scaling, storage, and every essential kubectl command from deployment to cleanup. Built for engineers coming from Docker.
What is Kubernetes?
Kubernetes (K8s) is an open-source container orchestration platform originally developed by Google, now maintained by the CNCF. It automates deploying, scaling, and managing containerized applications across a cluster of machines. Where standalone Docker runs containers on a single host, Kubernetes can run thousands of containers across hundreds of nodes, with self-healing, autoscaling, and rolling updates that avoid downtime.
Kubernetes vs Docker
Docker and Kubernetes are complementary, not competing. Docker builds and runs containers on a single host. Kubernetes orchestrates those containers across many hosts. Think of Docker as the engine and Kubernetes as the fleet management system.
| Dimension | Docker (standalone) | Kubernetes |
|---|---|---|
| Scope | Single host | Multi-node cluster |
| Unit of work | Container | Pod (1+ containers) |
| Scaling | Manual (docker run again) | Automatic (HPA, KEDA) |
| Self-healing | ❌ Manual restart | ✅ Automatic via ReplicaSet |
| Load balancing | Manual (nginx config) | Built-in via Service |
| Rolling updates | Manual scripting | Built-in, configurable |
| Service discovery | Manual DNS / links | Built-in DNS (CoreDNS) |
| Config management | Environment vars / bind mounts | ConfigMaps + Secrets |
| Networking | Docker networks (single host) | CNI plugins (cluster-wide) |
| Multi-tenant | ❌ | ✅ Namespaces + RBAC |
| Complexity | Low — simple to start | High — many concepts |
| Use when | Dev, single-server, simple apps | Production, microservices, scale |
```yaml
# docker-compose.yml
services:
  api:
    image: myapp:latest
    ports: ["8080:8080"]
    environment:
      DB_HOST: db
  db:
    image: postgres:15
    volumes: ["data:/var/lib/postgresql/data"]
volumes:
  data: {}   # Named volumes must be declared at the top level

# docker compose up -d
# No healing, no real scaling,
# single machine only
```
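```yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3              # 3 pods, any node
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api           # Must match the selector above
    spec:
      containers:
      - name: api
        image: myapp:latest

# Self-heals, scales, deploys
# across multiple machines
```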
K8s has real operational overhead. For a simple single-service app, a VPS with Docker Compose or a managed platform (Render, Railway, Fly.io) is often better. Use Kubernetes when you have multiple services that need independent scaling, high availability requirements, or a team that can manage the infrastructure.
Cluster Architecture
A Kubernetes cluster has two types of machines: the Control Plane (brain — manages state) and Worker Nodes (muscle — run workloads). In managed services like GKE, EKS, and AKS, the control plane is managed for you.
```text
┌─────────────────────────────────────────────────────────────────────┐
│                     CONTROL PLANE (Master Node)                     │
│                                                                     │
│  ┌─────────────────┐  ┌──────────────┐  ┌────────────────────────┐  │
│  │   API Server    │  │  Scheduler   │  │   Controller Manager   │  │
│  │ (kube-apiserver)│  │ (kube-sched) │  │  ReplicaSet, Deploy..  │  │
│  └─────────────────┘  └──────────────┘  └────────────────────────┘  │
│  ┌─────────────────┐  ┌──────────────────────────────────────────┐  │
│  │      etcd       │  │  Cloud Controller Manager (GKE/EKS/AKS)  │  │
│  │ (key-val store) │  │   (cloud LBs, routes, node lifecycle)    │  │
│  └─────────────────┘  └──────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────┘
          │      kubectl / API calls       │
          ▼                                ▼
┌────────────────────┐          ┌─────────────────────┐
│   WORKER NODE 1    │          │    WORKER NODE 2    │
│  ┌──────────────┐  │          │  ┌──────────────┐   │
│  │   kubelet    │  │          │  │   kubelet    │   │
│  │ (node agent) │  │          │  │ (node agent) │   │
│  └──────────────┘  │          │  └──────────────┘   │
│  ┌──────────────┐  │          │  ┌──────────────┐   │
│  │  kube-proxy  │  │          │  │  kube-proxy  │   │
│  │ (networking) │  │          │  │ (networking) │   │
│  └──────────────┘  │          │  └──────────────┘   │
│  ┌──────────────┐  │          │  ┌──────────────┐   │
│  │  Container   │  │          │  │  Container   │   │
│  │   Runtime    │  │          │  │   Runtime    │   │
│  │ (containerd) │  │          │  │ (containerd) │   │
│  └──────────────┘  │          │  └──────────────┘   │
│  [ POD ][ POD ]    │          │  [ POD ][ POD ]     │
└────────────────────┘          └─────────────────────┘
```
Control Plane Components
| Component | Role |
|---|---|
| kube-apiserver | Single entry point for all cluster operations. Every kubectl command hits this. Validates and processes requests. |
| etcd | Distributed key-value store. The single source of truth — stores all cluster state, config, secrets. |
| kube-scheduler | Watches for unscheduled pods and assigns them to nodes based on resources, affinity, taints/tolerations. |
| kube-controller-manager | Runs controller loops — ReplicaSet controller, Deployment controller, Job controller, etc. |
| cloud-controller-manager | Integrates with cloud provider APIs: provisions load balancers for Services, manages routes, and tracks node lifecycle on GKE/EKS/AKS. |
Core Objects
Everything in Kubernetes is an object — a persistent entity with a desired state. You declare what you want in YAML, kubectl sends it to the API server, and Kubernetes controllers reconcile actual state to match desired state.
Every K8s Object Has These Fields
```yaml
apiVersion: apps/v1          # API group + version (apps/v1, v1, networking.k8s.io/v1)
kind: Deployment             # Object type
metadata:
  name: my-app               # Unique name within namespace
  namespace: production      # Logical grouping
  labels:                    # Key-value tags, used for selection
    app: my-app
    version: v2
    team: backend
  annotations:               # Non-identifying metadata (for tools/humans)
    deployment.kubernetes.io/revision: "3"
    description: "Main API service"
spec:                        # DESIRED STATE: what you want
  ...
status:                      # ACTUAL STATE: managed by K8s (read-only)
  ...
```
Pods in Depth
A Pod is the smallest deployable unit. It wraps one or more containers that share a network namespace (same IP, same localhost) and optional shared storage volumes. Containers in a pod are scheduled together on the same node.
```yaml
# Single container pod (basic)
apiVersion: v1
kind: Pod
metadata:
  name: my-api-pod
  labels:
    app: my-api
spec:
  containers:
  - name: api
    image: myrepo/my-api:v2.1.0
    ports:
    - containerPort: 8080
    env:
    - name: APP_ENV
      value: production
    resources:
      requests:
        cpu: "250m"          # 250 millicores = 0.25 vCPU
        memory: "256Mi"
      limits:
        cpu: "500m"
        memory: "512Mi"

# Multi-container pod: sidecar pattern (pod spec only)
spec:
  containers:
  - name: app
    image: myapp:latest
    volumeMounts:
    - name: shared-logs
      mountPath: /var/log/app
  - name: log-shipper        # Sidecar container, same pod
    image: fluentd:v1.16
    volumeMounts:
    - name: shared-logs
      mountPath: /var/log/app
  volumes:
  - name: shared-logs
    emptyDir: {}             # Shared ephemeral volume
```
Bare pods are not rescheduled if they are deleted, evicted, or their node fails; the kubelet only restarts their containers in place on the same node. Always use a Deployment (stateless apps), StatefulSet (stateful apps), DaemonSet (one per node), or Job (batch) to manage pods. The controller ensures the desired number of pods is always running.
Deployments Complete Reference
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-deployment
  namespace: production
  labels:
    app: api
    version: v2
spec:
  replicas: 3                        # Desired pod count
  selector:
    matchLabels:
      app: api                       # Must match template labels
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1                    # Max extra pods during update
      maxUnavailable: 0              # Zero downtime: never kill before ready
  template:
    metadata:
      labels:
        app: api
        version: v2
    spec:
      terminationGracePeriodSeconds: 30   # Time for graceful shutdown
      containers:
      - name: api
        image: gcr.io/my-project/api:v2.1.0
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
          protocol: TCP
        envFrom:
        - configMapRef:
            name: api-config         # All ConfigMap keys as env vars
        - secretRef:
            name: api-secrets        # All Secret keys as env vars
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name   # Downward API
        resources:
          requests:
            cpu: "250m"
            memory: "256Mi"
          limits:
            cpu: "1000m"
            memory: "512Mi"
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 10
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
      affinity:
        podAntiAffinity:             # Spread pods across nodes
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: api
              topologyKey: kubernetes.io/hostname
```
ReplicaSets Explained
A ReplicaSet ensures that a specified number of pod replicas are running at any time. If a pod dies, the ReplicaSet creates a replacement. If there are too many pods, it deletes the excess. Deployments manage ReplicaSets for you — providing versioning and rollback on top.
During a rolling update, Deployment creates a new ReplicaSet (v2) while scaling down the old one (v1). Old ReplicaSets are kept for rollback history (controlled by revisionHistoryLimit).
StatefulSets & DaemonSets
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres-headless
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres              # Must match the selector
    spec:
      containers:
      - name: postgres
        image: postgres:16
        volumeMounts:
        - name: data               # Mounts the per-pod PVC below
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:            # Each pod gets its own PVC
  - metadata:
      name: data
    spec:
      accessModes: [ReadWriteOnce]
      resources:
        requests:
          storage: 10Gi

# Pods: postgres-0, postgres-1, postgres-2
# Each gets stable DNS: postgres-0.postgres-headless
```
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector
spec:
  selector:
    matchLabels:
      app: log-collector
  template:
    metadata:
      labels:
        app: log-collector         # Must match the selector
    spec:
      containers:
      - name: fluentd
        image: fluentd:v1.16
        volumeMounts:
        - name: varlog
          mountPath: /var/log
      volumes:
      - name: varlog
        hostPath:
          path: /var/log

# Runs on EVERY worker node
# Also used for: monitoring agents (Prometheus
# node-exporter), CNI plugins, security agents
```
Jobs & CronJobs
```yaml
# Job: run to completion once
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
spec:
  backoffLimit: 3                  # Retry failed pods up to 3 times
  ttlSecondsAfterFinished: 3600    # Auto-delete 1hr after completion
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: migrate
        image: myapp:latest
        command: ["dotnet", "ef", "database", "update"]
---
# CronJob: scheduled jobs
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report
spec:
  schedule: "0 2 * * *"            # 2 AM every night (cron syntax)
  concurrencyPolicy: Forbid        # Don't start if previous run still going
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: reporter
            image: myapp:latest
            command: ["python", "generate_report.py"]
```
Namespaces
Namespaces provide virtual clusters within a physical cluster. They isolate resources by environment, team, or application. Resource names must be unique within a namespace but can repeat across namespaces.
```shell
# Create namespace
kubectl create namespace production
kubectl create namespace staging

# YAML approach
kubectl apply -f - <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    env: production
    team: backend
EOF

# Work within a namespace
kubectl get pods -n production
kubectl get all -n staging

# Set default namespace for current context
kubectl config set-context --current --namespace=production

# List all namespaces
kubectl get namespaces

# Get resources across ALL namespaces
kubectl get pods -A
kubectl get deployments --all-namespaces
```
| Built-in Namespace | Purpose |
|---|---|
| default | Resources with no namespace specified end up here. Avoid using for production workloads. |
| kube-system | Kubernetes system components: CoreDNS, kube-proxy, metrics-server, CNI plugins. |
| kube-public | Publicly readable. Contains cluster-info ConfigMap. Rarely used by applications. |
| kube-node-lease | Node heartbeat objects. Used internally for node health tracking. |
ConfigMaps & Usage Patterns
```yaml
# Define ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: api-config
data:
  APP_ENV: production
  LOG_LEVEL: info
  MAX_CONNECTIONS: "100"
  app.conf: |                      # Multi-line file content
    server.port=8080
    server.host=0.0.0.0
    db.pool.size=20
---
# Inject as individual env var (container spec fragment)
env:
- name: APP_ENV
  valueFrom:
    configMapKeyRef:
      name: api-config
      key: APP_ENV

# Inject ALL keys as env vars
envFrom:
- configMapRef:
    name: api-config

# Mount as files in container
volumeMounts:
- name: config-vol
  mountPath: /app/config
volumes:
- name: config-vol
  configMap:
    name: api-config
    items:
    - key: app.conf
      path: app.conf               # Mounts as /app/config/app.conf
```
Secrets
```shell
# From literal values (kubectl encodes to base64 automatically)
kubectl create secret generic db-credentials \
  --from-literal=username=admin \
  --from-literal=password='S3cr3t!Pass'

# From file (e.g., TLS certificate)
kubectl create secret tls tls-secret \
  --cert=cert.pem --key=key.pem

# From env file
kubectl create secret generic app-secrets \
  --from-env-file=.env.prod
```
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
type: Opaque
data:
  username: YWRtaW4=               # base64("admin")
  password: UzNjcjN0IVBhc3M=       # base64("S3cr3t!Pass")
# stringData: allows plain text (K8s encodes it)
stringData:
  api_key: my-plain-text-api-key
```
Native K8s Secrets are only base64-encoded, not encrypted by default in etcd. For production, enable etcd encryption at rest and consider an external secrets manager: External Secrets Operator (syncs from AWS Secrets Manager / GCP Secret Manager / HashiCorp Vault), Sealed Secrets (encrypted in Git), or the Secrets Store CSI Driver.
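To make the "encoded, not encrypted" point concrete, here is a minimal sketch that reverses the encoding with nothing but the standard `base64` tool. The helper name `decode_secret_value` is hypothetical; the encoded strings are the ones from the Secret manifest above.

```shell
# decode_secret_value: reverse the base64 encoding used in a Secret's data field.
# No key or password is involved, which is why base64 offers no confidentiality.
decode_secret_value() {
  printf '%s' "$1" | base64 -d
}

decode_secret_value 'YWRtaW4='           # username from the manifest above
echo
decode_secret_value 'UzNjcjN0IVBhc3M='   # password from the manifest above
echo
```

Anyone with read access to the Secret object can do the same against a live cluster, for example by piping `kubectl get secret db-credentials -o jsonpath='{.data.password}'` into `base64 -d`, so RBAC on Secrets matters as much as encryption at rest.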
Resource Requests & Limits
Requests = what the scheduler guarantees the pod. Used to decide which node to place the pod on. Limits = the max the pod can consume. CPU is throttled at the limit; exceeding memory limit kills the container (OOMKilled).
| Setting | CPU Behavior | Memory Behavior |
|---|---|---|
| requests | Guaranteed minimum, used for scheduling | Guaranteed minimum, used for scheduling |
| limits | Throttled (not killed) if exceeded | Process killed (OOMKilled) if exceeded |
| no limits set | Can use all node CPU (noisy neighbor) | Can OOMKill other pods on node |
```yaml
resources:
  requests:
    cpu: "250m"        # 250 millicores = 0.25 vCPU
    memory: "256Mi"    # 256 Mebibytes
  limits:
    cpu: "1"           # 1 full vCPU (= 1000m)
    memory: "1Gi"      # 1 Gibibyte

# CPU units:
#   1    = 1 vCPU / 1 core / 1 AWS vCPU / 1 hyperthread
#   0.5  = 500m (half a core)
#   100m = 0.1 core (1/10 of a core)
# Memory units:
#   Ki = Kibibyte (1024 bytes)
#   Mi = Mebibyte (1024 Ki)
#   Gi = Gibibyte (1024 Mi)
#   k, M, G = decimal units (avoid; use Ki/Mi/Gi)
---
# LimitRange: default limits for a namespace
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
  - type: Container
    default:
      cpu: "500m"
      memory: "256Mi"
    defaultRequest:
      cpu: "100m"
      memory: "128Mi"
```
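The unit rules above can be checked with a little arithmetic. The sketch below converts CPU quantities to millicores and memory quantities to bytes; both helper names are hypothetical, and the memory helper assumes whole-number quantities with only the suffixes listed (real Kubernetes quantities allow more forms).

```shell
# cpu_to_millicores: "250m" -> 250, "0.5" -> 500, "1" -> 1000
cpu_to_millicores() {
  case "$1" in
    *m) printf '%s\n' "${1%m}" ;;
    *)  awk -v c="$1" 'BEGIN { printf "%d\n", c * 1000 + 0.5 }' ;;
  esac
}

# mem_to_bytes: integer quantity with a binary (Ki/Mi/Gi) or decimal (k/M/G) suffix.
# Runs in a subshell so its working variables do not leak into the caller.
mem_to_bytes() (
  num=${1%%[A-Za-z]*}
  unit=${1#"$num"}
  case "$unit" in
    Ki) mult=1024 ;;
    Mi) mult=$((1024 * 1024)) ;;
    Gi) mult=$((1024 * 1024 * 1024)) ;;
    k)  mult=1000 ;;
    M)  mult=1000000 ;;
    G)  mult=1000000000 ;;
    "") mult=1 ;;
    *)  echo "unsupported unit: $unit" >&2; exit 1 ;;
  esac
  echo $((num * mult))
)

cpu_to_millicores 0.5     # 500
mem_to_bytes 256Mi        # 268435456
mem_to_bytes 256M         # 256000000
```

The last two lines show why the binary suffixes are the safer habit: `256M` is roughly 4.6% smaller than `256Mi`, which is enough to turn a comfortable memory limit into an OOMKill.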
Services — Network Abstraction
A Service is a stable network endpoint for a set of pods. Because pod IPs change constantly (pods die and restart), you never address pods directly. A Service has a fixed ClusterIP and DNS name that routes to healthy pods via kube-proxy.
| Type | Reachable From | Use Case |
|---|---|---|
| ClusterIP | Within cluster only | Internal microservice communication. Default type. |
| NodePort | Outside via node IP:port | Dev/testing. Exposes port 30000-32767 on every node. |
| LoadBalancer | Public internet | Production. Creates a cloud LB (AWS NLB, GCP Network LB). One IP per service — expensive. |
| ExternalName | Within cluster | Alias for external DNS (e.g., point to RDS hostname). |
| Headless | Direct pod IPs | StatefulSet stable pod DNS. Set clusterIP: None. |
```yaml
# ClusterIP: internal only (default)
apiVersion: v1
kind: Service
metadata:
  name: api-service
  namespace: production
spec:
  type: ClusterIP
  selector:
    app: api               # Routes to pods with this label
  ports:
  - name: http
    protocol: TCP
    port: 80               # Service port (what clients call)
    targetPort: 8080       # Pod port (what container listens on)
---
# LoadBalancer: public internet access
apiVersion: v1
kind: Service
metadata:
  name: api-lb
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
spec:
  type: LoadBalancer
  selector:
    app: api
  ports:
  - port: 443
    targetPort: 8080
---
# Headless: for StatefulSet pod-level DNS
apiVersion: v1
kind: Service
metadata:
  name: postgres-headless
spec:
  clusterIP: None          # No VIP, DNS returns pod IPs directly
  selector:
    app: postgres
```
Ingress — HTTP Routing
Ingress is the K8s-native way to route HTTP/HTTPS traffic from outside the cluster to Services inside. It acts as a Layer 7 reverse proxy, routing by hostname and path. It requires an Ingress Controller to be installed in the cluster (nginx-ingress, Traefik, HAProxy, GKE's built-in, AWS ALB Controller).
```text
                       Internet
                           │
                           ▼
┌─────────────────────────────────────────────────────┐
│                LoadBalancer Service                 │
│         (cloud LB, single external IP/DNS)          │
└─────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────┐
│               Ingress Controller Pod                │
│   (nginx / traefik / alb, reads Ingress objects)    │
│                                                     │
│  Rule: api.myapp.com/v1  → api-service:80           │
│  Rule: api.myapp.com/v2  → api-v2-service:80        │
│  Rule: admin.myapp.com   → admin-service:80         │
│  Rule: *                 → frontend-service:80      │
└─────────────────────────────────────────────────────┘
        │                  │                  │
        ▼                  ▼                  ▼
   api-service      api-v2-service      admin-service
   (ClusterIP)        (ClusterIP)        (ClusterIP)
        │
        ▼
 [Pod][Pod][Pod]
```
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: main-ingress
  namespace: production
  annotations:
    # Rewrites every matched request path to "/"; remove if backends expect /v1 or /v2
    nginx.ingress.kubernetes.io/rewrite-target: /
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/limit-rps: "100"        # Requests per second per client IP
    cert-manager.io/cluster-issuer: letsencrypt-prod    # Auto TLS
spec:
  ingressClassName: nginx          # Replaces the deprecated kubernetes.io/ingress.class annotation
  tls:
  - hosts:
    - api.myapp.com
    - admin.myapp.com
    secretName: myapp-tls          # cert-manager populates this
  rules:
  - host: api.myapp.com
    http:
      paths:
      - path: /v1
        pathType: Prefix
        backend:
          service:
            name: api-v1-service
            port:
              number: 80
      - path: /v2
        pathType: Prefix
        backend:
          service:
            name: api-v2-service
            port:
              number: 80
  - host: admin.myapp.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: admin-service
            port:
              number: 80
```
Install nginx Ingress Controller
```shell
# Helm (recommended)
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx --create-namespace

# Or kubectl apply
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.10.1/deploy/static/provider/cloud/deploy.yaml

# Verify
kubectl get pods -n ingress-nginx
kubectl get svc -n ingress-nginx   # Shows external IP
```
Egress & NetworkPolicy
NetworkPolicy is K8s's firewall for pod-to-pod and pod-to-external traffic. By default, all pods can talk to all other pods. A NetworkPolicy restricts this using label selectors. Requires a CNI plugin that supports NetworkPolicy (Calico, Cilium, Weave — AWS VPC CNI + Calico on EKS, Dataplane V2 on GKE).
In NetworkPolicy, ingress = traffic coming into a pod, and egress = traffic going out of a pod. This is different from the Ingress object which routes external HTTP traffic. Be careful not to confuse these two uses of the word.
```yaml
# Step 1: Deny all ingress + egress in a namespace (default deny)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}                  # Applies to ALL pods in namespace
  policyTypes:
  - Ingress
  - Egress
---
# Step 2: Allow api pods to receive from frontend only
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api                     # Policy applies to api pods
  policyTypes:
  - Ingress
  ingress:
  - from:
    # One list item with both selectors = AND.
    # Two separate items would mean OR and admit every pod in the namespace.
    - podSelector:
        matchLabels:
          app: frontend            # Only allow from frontend pods
      namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: production   # In production namespace only
    ports:
    - protocol: TCP
      port: 8080
---
# Step 3: Allow api pods EGRESS to database + DNS + external HTTPS
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-egress-rules
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
  - Egress
  egress:
  - to:                            # Allow to database pods
    - podSelector:
        matchLabels:
          app: postgres
    ports:
    - port: 5432
  - to:                            # Allow DNS (kube-dns)
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - port: 53
      protocol: UDP
    - port: 53
      protocol: TCP
  - to:                            # Allow external HTTPS (payment API)
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
        - 10.0.0.0/8               # Block internal RFC1918
        - 172.16.0.0/12
        - 192.168.0.0/16
    ports:
    - port: 443
```
DNS & Service Discovery
Kubernetes runs CoreDNS as the cluster DNS server. Every Service gets a DNS name automatically. Pods find services by name — no hardcoded IPs.
```text
# Full DNS name format:
{service-name}.{namespace}.svc.cluster.local

# Examples:
api-service.production.svc.cluster.local   → api service in production namespace
postgres.production.svc.cluster.local      → postgres service
api-service.default.svc.cluster.local      → api in default namespace

# Shorthand within SAME namespace:
api-service     → resolved to api-service.production.svc.cluster.local
postgres:5432   → database connection string

# StatefulSet pod DNS (headless service required):
postgres-0.postgres-headless.production.svc.cluster.local
postgres-1.postgres-headless.production.svc.cluster.local
postgres-2.postgres-headless.production.svc.cluster.local
```
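The naming scheme is mechanical enough to generate. A small sketch, assuming the default cluster domain `cluster.local` (some clusters are configured with a different one) and using hypothetical helper names:

```shell
# svc_fqdn <service> <namespace>
svc_fqdn() {
  printf '%s.%s.svc.cluster.local\n' "$1" "$2"
}

# statefulset_pod_fqdn <statefulset> <ordinal> <headless-service> <namespace>
statefulset_pod_fqdn() {
  printf '%s-%s.%s.%s.svc.cluster.local\n' "$1" "$2" "$3" "$4"
}

svc_fqdn api-service production
statefulset_pod_fqdn postgres 0 postgres-headless production
```

Cross-namespace calls need at least `service.namespace`; the bare service name only resolves from pods in the same namespace.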
Horizontal Pod Autoscaler (HPA)
HPA automatically increases or decreases the number of pod replicas based on observed metrics: CPU and memory via metrics-server, or custom and external metrics via an adapter such as Prometheus Adapter or KEDA. It scales out (more pods) and in (fewer pods), not the pod size itself.
```text
        HPA Controller Loop (every 15s)
                       │
┌──────────────────────┼──────────────────────┐
│                      ▼                      │
│         metrics-server / Prometheus         │
│          (CPU: 78% across 3 pods)           │
│                      │                      │
│                      ▼                      │
│    desiredReplicas = ceil(3 × 78/50) = 5    │
│                      │                      │
│                      ▼                      │
│       Scale Deployment to 5 replicas        │
└─────────────────────────────────────────────┘
Scale In:  CPU drops → scale down (5min cooldown by default)
Scale Out: CPU rises → scale up immediately
```
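The replica calculation in the loop above is `ceil(currentReplicas × currentMetric / targetMetric)`, clamped to the min/max bounds. A minimal sketch in integer shell arithmetic follows; the function name is hypothetical, and it deliberately ignores details of the real controller such as the default 10% tolerance band and pods that are not yet ready.

```shell
# hpa_desired_replicas <current> <metric%> <target%> <min> <max>
# Integer ceiling: ceil(a / b) == (a + b - 1) / b for positive integers.
hpa_desired_replicas() (
  current=$1; metric=$2; target=$3; min=$4; max=$5
  desired=$(( (current * metric + target - 1) / target ))
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
)

hpa_desired_replicas 3 78 50 2 20    # 3 pods at 78% against a 50% target
hpa_desired_replicas 5 20 50 2 20    # load drops: 5 pods at 20%
hpa_desired_replicas 10 200 50 2 20  # spike: capped at maxReplicas
```

The first call reproduces the diagram (3 pods at 78% become 5); the third shows that `maxReplicas` is a hard ceiling even when the formula asks for 40.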
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-deployment           # Which deployment to scale
  minReplicas: 2                   # Never go below 2 (HA guarantee)
  maxReplicas: 20                  # Never exceed 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50     # Scale when CPU avg > 50%
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 400Mi        # Scale when avg mem > 400 Mi
  - type: Pods                     # Custom metric from Prometheus adapter
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"        # 100 RPS per pod
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # Wait 5min before scale-down
      policies:
      - type: Pods
        value: 2
        periodSeconds: 60          # Max 2 pods removed per minute
    scaleUp:
      stabilizationWindowSeconds: 0     # Scale up immediately
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15          # Double pods every 15 seconds
```
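The `behavior` policies set how fast the HPA is allowed to move, which is easy to estimate by hand. The sketch below is a back-of-envelope model with hypothetical function names, not the controller's exact algorithm: it counts policy periods for a Percent scale-up and a Pods scale-down, assuming the metric keeps demanding the limit the whole time.

```shell
# scale_up_periods <start> <max> <percent>: periods needed when each period may add
# up to <percent> of the current replica count (rounded up, at least one pod).
scale_up_periods() (
  replicas=$1; max=$2; percent=$3; periods=0
  while [ "$replicas" -lt "$max" ]; do
    step=$(( (replicas * percent + 99) / 100 ))
    if [ "$step" -lt 1 ]; then step=1; fi
    replicas=$(( replicas + step ))
    if [ "$replicas" -gt "$max" ]; then replicas=$max; fi
    periods=$(( periods + 1 ))
  done
  echo "$periods"
)

# scale_down_periods <start> <floor> <pods-per-period>
scale_down_periods() (
  echo $(( ($1 - $2 + $3 - 1) / $3 ))
)

scale_up_periods 2 20 100     # 2 -> 4 -> 8 -> 16 -> 20
scale_down_periods 20 2 2     # 18 pods to remove, 2 per period
```

With the manifest above, going from 2 to 20 replicas takes about 4 periods of 15 seconds (one minute), while shrinking back takes about 9 one-minute periods on top of the 5-minute stabilization window. That asymmetry is intentional: scale out fast, scale in cautiously.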
```shell
# Create HPA imperatively (quick)
kubectl autoscale deployment api-deployment \
  --cpu-percent=50 --min=2 --max=20

# Check HPA status
kubectl get hpa -n production
kubectl describe hpa api-hpa -n production

# Watch HPA in action
kubectl get hpa api-hpa -n production -w

# Install metrics-server (required for CPU/memory HPA)
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Check current pod resource usage
kubectl top pods -n production
kubectl top nodes
```
Vertical Pod Autoscaler (VPA)
VPA automatically adjusts the CPU and memory requests and limits for individual pods based on actual usage. Instead of adding more pods (HPA), it makes each pod bigger or smaller. This is vertical scaling — giving a pod more (or less) resources.
```text
HPA (horizontal)
  Before: 2 pods × 256Mi each
    [Pod] [Pod]
  After (high load):
    [Pod] [Pod] [Pod] [Pod]
    4 pods × 256Mi = 1 GiB total
  ✅ Zero downtime scale-out
  ✅ Fast response to load spikes
  ✅ Stateless apps
  ❌ Each pod stays same size

VPA (vertical)
  Before: 1 pod × 256Mi
    [Pod 256Mi]
  After VPA recommendation:
    [Pod 768Mi]
    1 pod × 768Mi (right-sized)
  ⚠️ Pod restart required (eviction)
  ✅ Stateful apps / batch jobs
  ✅ Right-sizing (save costs)
  ❌ Don't use HPA+VPA on same metric
```
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-deployment
  updatePolicy:
    updateMode: "Auto"             # Off | Initial | Recreate | Auto
    # Off      = only recommendations, no changes
    # Initial  = set only on new pods
    # Recreate = evict and recreate pods to apply
    # Auto     = the default mode; currently behaves like Recreate
  resourcePolicy:
    containerPolicies:
    - containerName: api
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 2000m
        memory: 4Gi
      controlledResources: ["cpu", "memory"]
```
```shell
# Install VPA (CRDs plus recommender, updater, and admission controller)
# Applying only the CRD manifest creates the resource type but nothing acts on it.
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

# View VPA recommendations
kubectl describe vpa api-vpa -n production
# Look for:
#   Recommendation:
#     Container Recommendations:
#       Container Name: api
#       Lower Bound:  cpu: 50m,  memory: 128Mi
#       Target:       cpu: 240m, memory: 400Mi   ← apply this
#       Upper Bound:  cpu: 800m, memory: 1Gi

# Get VPA objects
kubectl get vpa -n production
```
Cluster Autoscaler
The Cluster Autoscaler (CA) adds or removes nodes from the cluster based on pending pods and node utilization. It works at the node pool level. HPA/VPA scale pods; CA scales nodes.
| Scaler | What it scales | Trigger |
|---|---|---|
| HPA | Pod replicas (count) | CPU / memory / custom metrics |
| VPA | Pod resources (size) | Historical resource usage |
| Cluster Autoscaler | Node count | Unschedulable pods / underutilized nodes |
| KEDA | Pod replicas (to 0) | Events (queue depth, HTTP requests, cron) |
```shell
# Enable when creating cluster (GKE)
gcloud container clusters create my-cluster \
  --enable-autoscaling \
  --min-nodes=1 --max-nodes=10 \
  --zone=us-central1-a

# Enable on existing node pool
gcloud container clusters update my-cluster \
  --enable-autoscaling \
  --min-nodes=2 --max-nodes=20 \
  --node-pool=default-pool

# Stop the Cluster Autoscaler from evicting a pod
# (its node will not be removed during scale-down while the pod runs there)
kubectl annotate pod my-pod \
  cluster-autoscaler.kubernetes.io/safe-to-evict=false
```
Volumes & PersistentVolumeClaims
```yaml
# PersistentVolumeClaim: request storage
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc
  namespace: production
spec:
  accessModes:
  - ReadWriteOnce                  # RWO: single node R/W
                                   # ROX: many nodes read-only
                                   # RWX: many nodes R/W (NFS)
  storageClassName: standard-rwo
  resources:
    requests:
      storage: 20Gi
---
# Use PVC in a Deployment (pod spec fragment)
spec:
  containers:
  - name: postgres
    image: postgres:16
    volumeMounts:
    - name: data
      mountPath: /var/lib/postgresql/data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: postgres-pvc      # Reference the PVC above
```
Storage Classes
| Cloud | StorageClass | Type | Use |
|---|---|---|---|
| GKE | standard-rwo | Balanced PD | General purpose |
| GKE | premium-rwo | SSD PD | High IOPS (databases) |
| EKS | gp3 | EBS gp3 | General purpose |
| EKS | io1 | EBS io1 | High IOPS provisioned |
| AKS | managed | Azure Disk | General purpose |
| Any | Custom | NFS / Ceph / Longhorn | RWX shared volumes |
Deploy Workflow
1. Keep manifests in a k8s/ directory in your repo. Use Kustomize or Helm for environment-specific values.
2. Run kubectl apply -f k8s/ for a declarative apply. Kubernetes computes the diff and applies only the changes.
3. Run kubectl rollout status deployment/api-deployment to wait until all pods are updated and ready.
4. Run kubectl get pods -w to watch pod status in real time.

```shell
# Apply single file
kubectl apply -f deployment.yaml

# Apply all files in directory
kubectl apply -f k8s/

# Apply with recursive subdirectories
kubectl apply -f k8s/ --recursive

# Dry run: see what would change
kubectl apply -f deployment.yaml --dry-run=client
kubectl apply -f deployment.yaml --dry-run=server   # More accurate

# Diff: show changes before applying
kubectl diff -f deployment.yaml

# Update image (triggers rolling update)
kubectl set image deployment/api-deployment \
  api=gcr.io/my-project/api:v2.2.0

# Scale replicas manually
kubectl scale deployment api-deployment --replicas=5

# Watch rollout progress
kubectl rollout status deployment/api-deployment -n production -w

# See rollout history
kubectl rollout history deployment/api-deployment
```
Rolling Updates
A rolling update replaces old pods with new pods incrementally, ensuring the application stays available throughout. Controlled by maxSurge (extra pods allowed during update) and maxUnavailable (pods that can be down).
```yaml
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1          # Allow 1 extra pod (4 running during update on 3 replicas)
    maxUnavailable: 0    # Never have fewer than 3 ready pods (zero downtime)

# Kubernetes defaults (faster update, capacity may dip briefly):
#   maxSurge: 25%
#   maxUnavailable: 25%
# No capacity loss during the update (relies on an accurate readinessProbe):
#   maxSurge: 1
#   maxUnavailable: 0
```
```text
Initial: [v1] [v1] [v1]         (3 running)
Step 1:  [v1] [v1] [v1] [v2]    (maxSurge=1, create 1 new)
Step 2:  [v1] [v1] [v2]         (v2 ready, kill 1 old)
Step 3:  [v1] [v1] [v2] [v2]    (create another new)
Step 4:  [v1] [v2] [v2]         (kill another old)
Step 5:  [v1] [v2] [v2] [v2]    (create last new)
Final:   [v2] [v2] [v2]         (kill last old)
```
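When `maxSurge` and `maxUnavailable` are given as percentages, Kubernetes resolves them against the replica count: the surge value rounds up and the unavailable value rounds down. The sketch below (hypothetical helper names) computes the resulting pod-count envelope for a rollout.

```shell
# resolve_surge <value> <replicas>: absolute number, or percentage rounded UP
resolve_surge() (
  case "$1" in
    *%) pct=${1%"%"}; echo $(( (pct * $2 + 99) / 100 )) ;;
    *)  echo "$1" ;;
  esac
)

# resolve_unavailable <value> <replicas>: absolute number, or percentage rounded DOWN
resolve_unavailable() (
  case "$1" in
    *%) pct=${1%"%"}; echo $(( pct * $2 / 100 )) ;;
    *)  echo "$1" ;;
  esac
)

# rollout_bounds <replicas> <maxSurge> <maxUnavailable>
rollout_bounds() (
  surge=$(resolve_surge "$2" "$1")
  unavailable=$(resolve_unavailable "$3" "$1")
  echo "max_total=$(( $1 + surge )) min_available=$(( $1 - unavailable ))"
)

rollout_bounds 3 1 0        # the walkthrough above
rollout_bounds 10 25% 25%   # the default percentages on 10 replicas
```

The first call matches the step diagram: never more than 4 pods in total, never fewer than 3 available. With the 25% defaults on 10 replicas the rollout may run up to 13 pods and drop to 8 available.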
Rollbacks
```shell
# View rollout history (shows revision numbers)
kubectl rollout history deployment/api-deployment -n production
# REVISION  CHANGE-CAUSE
# 1         Initial deploy
# 2         kubectl set image ... api=v2.0
# 3         kubectl set image ... api=v2.1   ← current

# View details of a specific revision
kubectl rollout history deployment/api-deployment --revision=2

# Rollback to immediately previous revision
kubectl rollout undo deployment/api-deployment -n production

# Rollback to specific revision
kubectl rollout undo deployment/api-deployment --to-revision=1

# Pause a rollout (to manually verify)
kubectl rollout pause deployment/api-deployment

# Resume paused rollout
kubectl rollout resume deployment/api-deployment

# Keep revision history (default: 10)
# Set in deployment spec:
#   revisionHistoryLimit: 5

# Annotate deployment (shows in history CHANGE-CAUSE)
kubectl annotate deployment api-deployment \
  kubernetes.io/change-cause="Deploy v2.1.0 - fix memory leak"
```
Health Probes
Kubernetes uses three types of probes to determine pod health. Getting these right is critical for zero-downtime deployments and self-healing.
| Probe | Failure Action | Use For |
|---|---|---|
| livenessProbe | Kills and restarts the container | Detect deadlocks. If app is stuck but not crashed. |
| readinessProbe | Removes pod from Service endpoint (stops traffic) | Is the app ready to receive requests? DB connected? |
| startupProbe | Kills container if it doesn't start in time | Slow-starting apps (Java, .NET). Disables liveness until startup. |
```yaml
# HTTP probe (most common: checks a health endpoint)
livenessProbe:
  httpGet:
    path: /health/live
    port: 8080
    httpHeaders:
    - name: X-Health-Check
      value: kubernetes
  initialDelaySeconds: 20    # Wait 20s before first probe
  periodSeconds: 10          # Probe every 10s
  timeoutSeconds: 5          # Fail after 5s no response
  failureThreshold: 3        # Kill after 3 consecutive failures
  successThreshold: 1
readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 2        # Remove from LB after 2 failures (fast)
startupProbe:
  httpGet:
    path: /health/live
    port: 8080
  failureThreshold: 30       # 30 × 10s = 5min to start
  periodSeconds: 10
---
# TCP probe (for databases, non-HTTP services)
livenessProbe:
  tcpSocket:
    port: 5432
  initialDelaySeconds: 30
  periodSeconds: 10

# Exec probe (run command inside container)
livenessProbe:
  exec:
    command:
    - /bin/sh
    - -c
    - "redis-cli ping | grep PONG"
  initialDelaySeconds: 5
  periodSeconds: 10

# gRPC probe (Kubernetes 1.24+; the app must implement the gRPC Health Checking Protocol)
livenessProbe:
  grpc:
    port: 8080
    # service: optional name sent in the HealthCheckRequest; omit to check overall server health
```
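Probe settings translate into time budgets: roughly `initialDelaySeconds + failureThreshold × periodSeconds` before Kubernetes acts on a failing probe. The helper below is a rough estimate under that assumption (hypothetical name, ignoring `timeoutSeconds` and scheduling jitter), useful for sanity-checking the values above.

```shell
# probe_window_seconds <failureThreshold> <periodSeconds> [initialDelaySeconds]
probe_window_seconds() {
  echo $(( ${3:-0} + $1 * $2 ))
}

probe_window_seconds 30 10       # startupProbe above: time allowed to start
probe_window_seconds 3 10 20     # livenessProbe above: upper bound before first restart
probe_window_seconds 2 5         # readinessProbe above: time to drop a failing pod from endpoints
```

The startup budget comes out at 300 seconds, matching the "5min to start" comment. If an app routinely needs longer than that, raise `failureThreshold` on the startupProbe rather than loosening the livenessProbe.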
Common kubectl Commands
```shell
# Get resources: basic
kubectl get pods
kubectl get pods -n production
kubectl get pods -A                      # All namespaces
kubectl get pods -o wide                 # Shows node, IP
kubectl get pods -l app=api              # Label selector
kubectl get pods --field-selector status.phase=Running

# Get with output formats
kubectl get deployment api-deployment -o yaml   # Full YAML spec
kubectl get deployment api-deployment -o json   # Full JSON
kubectl get pods -o jsonpath='{.items[*].metadata.name}'   # Specific field
kubectl get pods -o custom-columns=NAME:.metadata.name,STATUS:.status.phase

# Get multiple resource types
kubectl get pods,services,deployments -n production

# Watch (live updates)
kubectl get pods -w
kubectl get pods -n production -w
```
    kubectl describe pod api-pod-abc123
    kubectl describe deployment api-deployment -n production
    kubectl describe node my-node
    kubectl describe service api-service
    kubectl describe pvc postgres-pvc
    # Basic logs
    kubectl logs pod-name
    kubectl logs pod-name -n production
    kubectl logs pod-name -c container-name   # Specific container in multi-container pod

    # Stream live logs
    kubectl logs -f pod-name
    kubectl logs -f deployment/api-deployment   # From deployment (picks one pod)

    # Previous container instance (after crash)
    kubectl logs pod-name --previous

    # Tail last N lines, or limit by time
    kubectl logs pod-name --tail=100
    kubectl logs pod-name --since=1h             # Last 1 hour
    kubectl logs pod-name --since-time=2025-01-01T00:00:00Z

    # Logs from all pods matching a label (built into kubectl; with -l the
    # default is the last 10 lines per pod, and -f follows at most 5 pods
    # unless --max-log-requests is raised)
    kubectl logs -l app=api --prefix=true --all-containers=true -f
    # Open interactive shell
    kubectl exec -it pod-name -- /bin/bash
    kubectl exec -it pod-name -- /bin/sh    # If bash not available
    kubectl exec -it pod-name -c container-name -- /bin/bash

    # Run single command
    kubectl exec pod-name -- env
    kubectl exec pod-name -- cat /etc/config/app.conf
    kubectl exec pod-name -- curl http://localhost:8080/health

    # Copy files to/from pod (requires tar in the container image)
    kubectl cp pod-name:/app/logs/error.log ./error.log
    kubectl cp ./config.json pod-name:/app/config.json
    # Forward localhost:8080 → pod:8080
    kubectl port-forward pod/api-pod-abc123 8080:8080

    # Forward to service (recommended; kubectl resolves the service to one
    # backing pod for the session, traffic is not load-balanced)
    kubectl port-forward service/api-service 8080:80 -n production

    # Forward to deployment
    kubectl port-forward deployment/api-deployment 8080:8080

    # Multiple ports
    kubectl port-forward service/api-service 8080:80 8443:443

    # Access Kubernetes Dashboard
    kubectl port-forward -n kubernetes-dashboard service/kubernetes-dashboard 8443:443
    # Declarative (preferred, idempotent)
    kubectl apply -f manifest.yaml
    kubectl apply -f k8s/ --recursive

    # Imperative creates (good for quick testing)
    kubectl create deployment nginx --image=nginx:latest --replicas=2
    kubectl create service clusterip my-svc --tcp=80:8080
    kubectl create configmap app-config --from-literal=ENV=prod --from-file=config.properties
    kubectl create secret generic db-creds --from-literal=pw=secret123

    # Edit live object in editor
    kubectl edit deployment api-deployment -n production

    # Patch: surgical update without full YAML
    kubectl patch deployment api-deployment \
      --patch '{"spec":{"replicas":5}}'
    kubectl patch deployment api-deployment \
      --type=json \
      -p='[{"op":"replace","path":"/spec/template/spec/containers/0/image","value":"myapp:v3"}]'

    # Label / Annotate
    kubectl label pod my-pod environment=production
    kubectl annotate deployment api-deployment version="2.1.0"
Debug Commands
    # Debug pod with ephemeral container (K8s 1.23+)
    kubectl debug -it pod-name --image=busybox:latest --target=app

    # Debug with a copy of the pod (preserves original)
    kubectl debug pod-name --copy-to=pod-name-debug --image=ubuntu

    # Describe events for a pod (most useful for scheduling issues)
    kubectl describe pod pod-name | grep -A 20 Events

    # Get all events in namespace sorted by time
    kubectl get events -n production --sort-by=.metadata.creationTimestamp

    # Get only Warning events
    kubectl get events -n production --field-selector type=Warning

    # Check resource usage (requires metrics-server)
    kubectl top pods -n production
    kubectl top pods -n production --sort-by=memory
    kubectl top nodes

    # Check why pod is pending
    kubectl describe pod pending-pod | grep -A 10 "Events:"
    # Common causes: Insufficient CPU/Memory, No nodes match selector,
    # PVC not bound, Image pull error

    # Check node conditions
    kubectl describe node my-node | grep -A 10 "Conditions:"

    # Recent Warning events across all namespaces
    # (these are Events, not API server audit logs)
    kubectl get events --all-namespaces | grep Warning | head -20

    # Validate YAML manifest
    kubectl apply -f manifest.yaml --dry-run=server --validate=true
Cleanup Commands
    # Delete by name
    kubectl delete pod my-pod
    kubectl delete deployment api-deployment -n production
    kubectl delete service api-service -n production

    # Delete by file (opposite of apply)
    kubectl delete -f deployment.yaml
    kubectl delete -f k8s/ --recursive

    # Delete by label selector
    kubectl delete pods -l app=api -n production
    kubectl delete all -l app=my-app -n production

    # Delete all pods in namespace (they will restart if managed by Deployment)
    kubectl delete pods --all -n production

    # Delete all resources in namespace (use with caution!)
    kubectl delete all --all -n staging
    # Note: "all" doesn't include PVCs, Secrets, ConfigMaps

    # Delete EVERYTHING in namespace including PVCs, ConfigMaps, Secrets
    kubectl delete namespace staging   # ⚠️ DESTRUCTIVE - deletes everything inside

    # Force delete stuck pod (last resort; may cause split-brain)
    kubectl delete pod stuck-pod --force --grace-period=0

    # Delete completed/failed jobs
    kubectl delete jobs --field-selector status.successful=1
    kubectl delete pods --field-selector status.phase=Succeeded
    kubectl delete pods --field-selector status.phase=Failed

    # Delete evicted pods (-L1 runs one delete per pod so each keeps its own -n)
    kubectl get pods -A | grep Evicted | awk '{print $2 " -n " $1}' | xargs -L1 kubectl delete pod

    # Drain a node (for maintenance)
    kubectl drain my-node --ignore-daemonsets --delete-emptydir-data
    kubectl cordon my-node     # Mark unschedulable but don't evict
    kubectl uncordon my-node   # Return to schedulable

    # Remove node from cluster
    kubectl drain my-node --ignore-daemonsets
    kubectl delete node my-node
    # View all contexts (clusters you can connect to)
    kubectl config get-contexts

    # Switch context
    kubectl config use-context my-prod-cluster
    kubectl config use-context gke_project_zone_cluster

    # View current context
    kubectl config current-context

    # Set default namespace for context
    kubectl config set-context --current --namespace=production

    # Rename a context
    kubectl config rename-context old-name new-name

    # Delete a context
    kubectl config delete-context old-cluster
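Before deleting: (1) Confirm the active context and namespace with kubectl config view --minify. (2) Preview the deletion with --dry-run=client first. (3) Never force-delete pods in a database StatefulSet; it can cause data corruption. (4) kubectl delete namespace also deletes the namespace's PVCs, and when the bound volume's reclaim policy is Delete (the usual default for dynamically provisioned volumes) the underlying cloud disk is deleted permanently.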
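The first two checks can be folded into a small wrapper. This is a sketch under stated assumptions, not a standard tool: the function name `safe_delete` and the `CONFIRM` environment variable are hypothetical; only `kubectl config current-context`, `kubectl delete`, and `--dry-run=client` are real kubectl features.

```shell
#!/bin/sh
# Hypothetical wrapper: preview a delete with a client-side dry run, and only
# perform it when CONFIRM=yes is set in the environment.
# Usage: safe_delete <namespace> <kubectl delete args...>
safe_delete() {
  ns=$1
  shift
  echo "Context: $(kubectl config current-context)  Namespace: $ns"
  # Step 1: show what would be deleted without changing anything.
  kubectl delete "$@" -n "$ns" --dry-run=client || return 1
  # Step 2: require an explicit opt-in before the real delete.
  if [ "${CONFIRM:-no}" = "yes" ]; then
    kubectl delete "$@" -n "$ns"
  else
    echo "Dry run only. Re-run with CONFIRM=yes to delete."
  fi
}

# Example invocations (not executed here; they need a cluster):
#   safe_delete staging -f k8s/ --recursive
#   CONFIRM=yes safe_delete staging pods -l app=api
```

Passing the namespace explicitly, rather than relying on the context default, is a deliberate choice: it makes the target visible in shell history and in the printed summary.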
RBAC — Role-Based Access Control
    # Role: permissions within a namespace
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: pod-reader
      namespace: production
    rules:
    - apiGroups: [""]                  # "" = core group
      resources: ["pods", "pods/log"]
      verbs: ["get", "list", "watch"]
    - apiGroups: ["apps"]
      resources: ["deployments"]
      verbs: ["get", "list"]
    ---
    # ClusterRole: cluster-wide permissions
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: node-reader
    rules:
    - apiGroups: [""]
      resources: ["nodes"]
      verbs: ["get", "list", "watch"]
    ---
    # RoleBinding: bind Role to user/group/serviceaccount
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: read-pods-binding
      namespace: production
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: Role
      name: pod-reader
    subjects:
    - kind: User
      name: alice@company.com          # GCP IAM user (on GKE)
      apiGroup: rbac.authorization.k8s.io
    - kind: ServiceAccount
      name: ci-service-account
      namespace: production
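To check that a binding like the one above grants what you expect, `kubectl auth can-i` combined with `--as` impersonation is one option. The sketch below assumes your own account is allowed to impersonate; the helper names `sa_username` and `can_i` are hypothetical conveniences, while the `system:serviceaccount:<namespace>:<name>` username format is how Kubernetes identifies ServiceAccounts.

```shell
#!/bin/sh
# Build the username that RBAC uses for a ServiceAccount:
#   system:serviceaccount:<namespace>:<name>
sa_username() {
  echo "system:serviceaccount:$1:$2"
}

# Ask the API server whether a subject may perform a verb on a resource.
# Usage: can_i <subject> <namespace> <verb> <resource>
can_i() {
  kubectl auth can-i "$3" "$4" --namespace "$2" --as "$1"
}

# Example invocations (not executed here; they need a cluster):
#   can_i alice@company.com production list pods
#   can_i "$(sa_username production ci-service-account)" production delete pods
```

If pod-reader is the only role bound to these subjects, the first check should answer yes and the second no, since the Role grants get, list, and watch but not delete.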
Troubleshooting Guide
    # Pending: pod not scheduled to a node
    # → Insufficient CPU/memory on nodes
    # → No nodes match nodeSelector/affinity
    # → PVC not bound
    kubectl describe pod <pod> | grep -A 15 Events

    # CrashLoopBackOff: container starting and crashing repeatedly
    # → Application error on startup
    # → Bad config / missing env var
    # → Memory limit too low (OOMKilled)
    kubectl logs <pod> --previous                    # Logs from previous crash
    kubectl describe pod <pod> | grep "Last State"

    # ImagePullBackOff / ErrImagePull
    # → Wrong image name/tag
    # → Private registry, imagePullSecret not configured
    kubectl describe pod <pod> | grep -A 5 "Failed to pull"

    # OOMKilled: container exceeded memory limit
    kubectl describe pod <pod> | grep -A 3 "OOM"
    # Fix: increase memory limit, or investigate memory leak

    # Terminating (stuck)
    # → Finalizers preventing deletion
    kubectl patch pod <pod> -p '{"metadata":{"finalizers":[]}}' --type=merge
    kubectl delete pod <pod> --force --grace-period=0

    # Service not routing traffic
    # → Selector doesn't match pod labels
    kubectl get endpoints <service-name>   # Should show pod IPs; if empty, selector is wrong
    kubectl get pods -l app=my-app         # Check labels match service selector

    # DNS resolution failing inside pod
    kubectl exec -it <pod> -- nslookup api-service
    kubectl exec -it <pod> -- curl http://api-service.production.svc.cluster.local/health
| Pod Status | Meaning | First Action |
|---|---|---|
| Pending | Waiting to be scheduled | kubectl describe pod → check Events |
| ContainerCreating | Image being pulled or volume being mounted | Wait, then check Events |
| Running | Container executing normally | Check readinessProbe if not receiving traffic |
| CrashLoopBackOff | Container crashing on start | kubectl logs --previous |
| OOMKilled | Exceeded memory limit | Raise memory limit or fix leak |
| ImagePullBackOff | Can't pull container image | Check image name, registry credentials |
| Evicted | Removed due to resource pressure | Check node disk/memory usage |
| Terminating | Being deleted gracefully | Wait; if stuck, check finalizers |
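The table above can be turned into a small lookup for use in scripts or on-call runbooks. This is a sketch only: the function name `first_action` is hypothetical, and the command chosen for each status is one reasonable reading of the "First Action" column rather than the only valid one.

```shell
#!/bin/sh
# Hypothetical helper: print a first diagnostic command for a pod status,
# loosely following the table above.
# Usage: first_action <status> <pod>
first_action() {
  status=$1
  pod=$2
  case "$status" in
    Pending|ContainerCreating|ImagePullBackOff|ErrImagePull|Running|OOMKilled)
      # Events, probe config, and Last State all appear in describe output.
      echo "kubectl describe pod $pod" ;;
    CrashLoopBackOff)
      echo "kubectl logs $pod --previous" ;;
    Evicted)
      # Node-level pressure; kubectl top requires metrics-server.
      echo "kubectl top nodes" ;;
    Terminating)
      echo "kubectl get pod $pod -o jsonpath={.metadata.finalizers}" ;;
    *)
      echo "kubectl get events --sort-by=.metadata.creationTimestamp" ;;
  esac
}

first_action CrashLoopBackOff api-pod-abc123   # prints: kubectl logs api-pod-abc123 --previous
```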
Quick Cheat Sheet
Essential kubectl Aliases
    # kubectl shorthand
    alias k='kubectl'
    alias kgp='kubectl get pods'
    alias kgpa='kubectl get pods -A'
    alias kgs='kubectl get services'
    alias kgd='kubectl get deployments'
    alias kgn='kubectl get nodes'
    alias kdp='kubectl describe pod'
    alias kdd='kubectl describe deployment'
    alias kl='kubectl logs'
    alias klf='kubectl logs -f'
    alias kaf='kubectl apply -f'
    alias kdf='kubectl delete -f'
    alias kns='kubectl config set-context --current --namespace'
    alias kctx='kubectl config use-context'
    alias ktc='kubectl top pods'

    # Switch namespace quickly (argument quoted so an empty or odd value fails cleanly)
    kn() {
      kubectl config set-context --current --namespace="$1"
    }

    # Watch pods
    alias kwatch='kubectl get pods -w'
Complete Quick-Reference Table
| Action | Command |
|---|---|
| List all pods | kubectl get pods -A |
| Watch pods | kubectl get pods -w -n production |
| Describe pod | kubectl describe pod <name> |
| Stream logs | kubectl logs -f <pod> |
| Shell into pod | kubectl exec -it <pod> -- /bin/sh |
| Apply manifests | kubectl apply -f k8s/ |
| Update image | kubectl set image deployment/<name> <container>=<image>:tag |
| Scale | kubectl scale deployment <name> --replicas=5 |
| Rollback | kubectl rollout undo deployment/<name> |
| Rollout status | kubectl rollout status deployment/<name> |
| Port forward | kubectl port-forward svc/<name> 8080:80 |
| Resource usage | kubectl top pods -n production |
| Get events | kubectl get events -n production --sort-by=.metadata.creationTimestamp |
| Delete pod | kubectl delete pod <name> |
| Force delete | kubectl delete pod <name> --force --grace-period=0 |
| Delete by label | kubectl delete pods -l app=api |
| Drain node | kubectl drain <node> --ignore-daemonsets |
| Switch namespace | kubectl config set-context --current --namespace=<ns> |
| Switch cluster | kubectl config use-context <context> |
| Diff before apply | kubectl diff -f deployment.yaml |
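The "Update image", "Rollout status", and "Rollback" rows above are often chained: push a new image, wait for the rollout, and undo it if it does not complete. A minimal sketch, assuming a hypothetical wrapper named `deploy_image` and a default timeout of 120s chosen purely for illustration:

```shell
#!/bin/sh
# Hypothetical helper: update a Deployment's image, wait for the rollout,
# and roll back if it does not complete within the timeout.
# Usage: deploy_image <deployment> <container> <image> [timeout]
deploy_image() {
  deployment=$1
  container=$2
  image=$3
  timeout=${4:-120s}
  kubectl set image "deployment/$deployment" "$container=$image" || return 1
  if kubectl rollout status "deployment/$deployment" --timeout="$timeout"; then
    echo "Rollout of $image succeeded"
  else
    echo "Rollout failed; rolling back" >&2
    kubectl rollout undo "deployment/$deployment"
    return 1
  fi
}

# Example invocation (not executed here; it needs a cluster):
#   deploy_image api-deployment api myapp:v3 180s
```

The function returns non-zero after a rollback, so a CI job calling it still fails visibly even though the cluster has been restored to the previous revision.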