v1.30 · 2025
Container Orchestration Platform

KUBERNETES
HANDBOOK

// K8s v1.30 · kubectl · YAML Manifests · GKE · Production Ready

A complete reference covering Kubernetes architecture, workloads, networking, ingress/egress, horizontal and vertical scaling, storage, and every essential kubectl command from deployment to cleanup. Built for engineers coming from Docker.

Pods & Deployments HPA / VPA Scaling Ingress / NetworkPolicy kubectl Mastery vs Docker RBAC & Security
Foundations

What is Kubernetes?

Kubernetes (K8s) is an open-source container orchestration platform originally developed by Google and now maintained by the CNCF. It automates deploying, scaling, and managing containerized applications across a cluster of machines. Where Docker runs containers on a single host, Kubernetes can run thousands of containers across hundreds of nodes, with self-healing, autoscaling, and zero-downtime updates.

Abbreviation: K8s
Open-sourced: 2014
Foundation: CNCF
Version covered: v1.30
Written in: Go
Self-healing: built in
⚙️
Self-Healing
Restarts containers that crash or fail their liveness probes, and replaces and reschedules pods when a node fails.
📈
Auto-Scaling
Scale pods horizontally based on CPU/memory/custom metrics. Scale nodes up/down based on workload demand.
🔄
Zero-Downtime Deploys
Rolling updates replace old pods with new ones gradually. Roll back with a single command if something goes wrong.
🌐
Service Discovery
Built-in DNS and load balancing. Services find each other by name — no hardcoded IPs needed.
🗄️
Storage Orchestration
Automatically mounts storage — local disk, cloud volumes (GCE PD, AWS EBS, Azure Disk), NFS, etc.
🔐
Secret Management
Secrets and ConfigMaps are stored in etcd and injected into pods at runtime, so credentials never need to be baked into images.
Foundations

Kubernetes vs Docker

Docker and Kubernetes are complementary, not competing. Docker builds and runs containers on a single host. Kubernetes orchestrates those containers across many hosts. Think of Docker as the engine and Kubernetes as the fleet management system.

Dimension | Docker (standalone) | Kubernetes
Scope | Single host | Multi-node cluster
Unit of work | Container | Pod (1+ containers)
Scaling | Manual (docker run again) | Automatic (HPA, KEDA)
Self-healing | ❌ Manual restart | ✅ Automatic via ReplicaSet
Load balancing | Manual (nginx config) | Built-in via Service
Rolling updates | Manual scripting | Built-in, configurable
Service discovery | Manual DNS / links | Built-in DNS (CoreDNS)
Config management | Environment vars / bind mounts | ConfigMaps + Secrets
Networking | Docker networks (single host) | CNI plugins (cluster-wide)
Multi-tenant | ❌ Not built in | ✅ Namespaces + RBAC
Complexity | Low: simple to start | High: many concepts
Use when | Dev, single-server, simple apps | Production, microservices, scale
🐳 Docker Compose (single host)
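# docker-compose.yml
services:
  api:
    image: myapp:latest
    ports: ["8080:8080"]
    environment:
      DB_HOST: db
  db:
    image: postgres:15
    environment:
      POSTGRES_PASSWORD: example   # required by the postgres image
    volumes: ["data:/var/lib/postgresql/data"]

volumes:
  data: {}                         # named volumes must be declared

# docker compose up -d
# No healing, no real scaling,
# single machine only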
☸️ Kubernetes (multi-node cluster)
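# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3          # 3 pods, any node
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api       # must match the selector above
    spec:
      containers:
      - name: api
        image: myapp:latest
        ports:
        - containerPort: 8080
# Self-heals, scales, deploys
# across multiple machines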
📌 When NOT to use Kubernetes

K8s has real operational overhead. For a simple single-service app, a VPS with Docker Compose or a managed platform (Render, Railway, Fly.io) is often better. Use Kubernetes when you have multiple services that need independent scaling, high availability requirements, or a team that can manage the infrastructure.

Foundations

Cluster Architecture

A Kubernetes cluster has two types of machines: the Control Plane (brain — manages state) and Worker Nodes (muscle — run workloads). In managed services like GKE, EKS, and AKS, the control plane is managed for you.

┌─────────────────────────────────────────────────────────────────────┐
│                    CONTROL PLANE (Master Node)                       │
│                                                                     │
│  ┌─────────────────┐  ┌──────────────┐  ┌────────────────────────┐ │
│  │  API Server      │  │  Scheduler   │  │  Controller Manager   │ │
│  │  (kube-apiserver)│  │  (kube-sched)│  │  ReplicaSet, Deploy..  │ │
│  └─────────────────┘  └──────────────┘  └────────────────────────┘ │
│  ┌─────────────────┐  ┌──────────────────────────────────────────┐  │
│  │  etcd            │  │  Cloud Controller Manager (GKE/EKS/AKS)   │  │
│  │  (key-val store) │  │  (provisions LBs, volumes, nodes)        │  │
│  └─────────────────┘  └──────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────┘
           │ kubectl / API calls            │
           ▼                               ▼
┌────────────────────┐       ┌─────────────────────┐
│  WORKER NODE 1      │       │  WORKER NODE 2       │
│  ┌──────────────┐  │       │  ┌──────────────┐   │
│  │  kubelet     │  │       │  │  kubelet     │   │
│  │  (node agent)│  │       │  │  (node agent)│   │
│  └──────────────┘  │       │  └──────────────┘   │
│  ┌──────────────┐  │       │  ┌──────────────┐   │
│  │  kube-proxy  │  │       │  │  kube-proxy  │   │
│  │  (networking)│  │       │  │  (networking)│   │
│  └──────────────┘  │       │  └──────────────┘   │
│  ┌──────────────┐  │       │  ┌──────────────┐   │
│  │  Container   │  │       │  │  Container   │   │
│  │  Runtime     │  │       │  │  Runtime     │   │
│  │  (containerd)│  │       │  │  (containerd)│   │
│  └──────────────┘  │       │  └──────────────┘   │
│  [ POD ][ POD ]    │       │  [ POD ][ POD ]     │
└────────────────────┘       └─────────────────────┘

Control Plane Components

Component | Role
kube-apiserver | Single entry point for all cluster operations. Every kubectl command hits this. Validates and processes requests.
etcd | Distributed key-value store. The single source of truth: stores all cluster state, config, and secrets.
kube-scheduler | Watches for unscheduled pods and assigns them to nodes based on resources, affinity, and taints/tolerations.
kube-controller-manager | Runs controller loops: ReplicaSet controller, Deployment controller, Job controller, etc.
cloud-controller-manager | Integrates with cloud APIs: creates load balancers, persistent disks, and node pools on GKE/EKS/AKS.
Foundations

Core Objects

Everything in Kubernetes is an object — a persistent entity with a desired state. You declare what you want in YAML, kubectl sends it to the API server, and Kubernetes controllers reconcile actual state to match desired state.

Pod
v1 / core
Smallest deployable unit. One or more containers sharing network + storage. Usually managed by Deployment, not created directly.
Deployment
apps/v1
Manages a desired number of identical pod replicas. Handles rolling updates and rollbacks. The main way to run stateless apps.
ReplicaSet
apps/v1
Ensures N pod replicas always run. Created and managed by Deployment — rarely used directly.
Service
v1 / core
Stable network endpoint for a set of pods. Provides load balancing and DNS discovery. Types: ClusterIP, NodePort, LoadBalancer.
Ingress
networking.k8s.io/v1
HTTP/HTTPS routing from external traffic to Services. Rules based on host/path. Requires an Ingress Controller (nginx, Traefik).
ConfigMap
v1 / core
Non-sensitive configuration data as key-value pairs. Injected into pods as env vars or mounted as files.
Secret
v1 / core
Like ConfigMap but for sensitive data (passwords, tokens). Values are base64-encoded, not encrypted, and etcd stores them unencrypted unless encryption at rest is enabled. Use an external secrets manager in production.
Namespace
v1 / core
Virtual cluster within a cluster. Isolates resources by team/env. The default, kube-system, kube-public, and kube-node-lease namespaces are built-in.
PersistentVolume
v1 / core
A piece of provisioned storage in the cluster. Lifecycle independent from pods. Backed by cloud disks, NFS, etc.
StatefulSet
apps/v1
Like Deployment but for stateful apps. Pods get stable network identities (pod-0, pod-1) and persistent storage. For databases.
DaemonSet
apps/v1
Runs exactly one pod on every node (or selected nodes). Used for log collectors, monitoring agents, CNI plugins.
HPA
autoscaling/v2
Horizontal Pod Autoscaler. Automatically scales Deployment/ReplicaSet replicas based on metrics.

Every K8s Object Has These Fields

yaml · Anatomy of a Kubernetes Manifest
apiVersion: apps/v1          # API group + version (apps/v1, v1, networking.k8s.io/v1)
kind: Deployment              # Object type
metadata:
  name: my-app                # Unique name within namespace
  namespace: production       # Logical grouping
  labels:                     # Key-value tags — used for selection
    app: my-app
    version: v2
    team: backend
  annotations:               # Non-identifying metadata (for tools/humans)
    deployment.kubernetes.io/revision: "3"
    description: "Main API service"
spec:                         # DESIRED STATE — what you want
  ...
status:                       # ACTUAL STATE — managed by K8s (read-only)
  ...
Workloads

Pods in Depth

A Pod is the smallest deployable unit. It wraps one or more containers that share a network namespace (same IP, same localhost) and optional shared storage volumes. Containers in a pod are scheduled together on the same node.

yaml · pod.yaml: single and multi-container
# Single container pod (basic)
apiVersion: v1
kind: Pod
metadata:
  name: my-api-pod
  labels:
    app: my-api
spec:
  containers:
  - name: api
    image: myrepo/my-api:v2.1.0
    ports:
    - containerPort: 8080
    env:
    - name: APP_ENV
      value: production
    resources:
      requests:
        cpu: "250m"      # 250 millicores = 0.25 vCPU
        memory: "256Mi"
      limits:
        cpu: "500m"
        memory: "512Mi"

---
# Multi-container pod: sidecar pattern
apiVersion: v1
kind: Pod
metadata:
  name: app-with-log-shipper
spec:
  containers:
  - name: app
    image: myapp:latest
    volumeMounts:
    - name: shared-logs
      mountPath: /var/log/app

  - name: log-shipper            # Sidecar container — same pod
    image: fluentd:v1.16
    volumeMounts:
    - name: shared-logs
      mountPath: /var/log/app

  volumes:
  - name: shared-logs
    emptyDir: {}               # Shared ephemeral volume
⚠️ Don't Create Pods Directly

Bare pods are not restarted if they die or the node fails. Always use a Deployment (stateless apps), StatefulSet (stateful apps), DaemonSet (one per node), or Job (batch) to manage pods. The controller ensures the desired number of pods is always running.

Workloads

Deployments Complete Reference

yaml · deployment.yaml: production-ready with all key fields
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-deployment
  namespace: production
  labels:
    app: api
    version: v2
spec:
  replicas: 3                         # Desired pod count
  selector:
    matchLabels:
      app: api                         # Must match template labels
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1                      # Max extra pods during update
      maxUnavailable: 0               # Zero downtime — never kill before ready
  template:
    metadata:
      labels:
        app: api
        version: v2
    spec:
      terminationGracePeriodSeconds: 30   # Time for graceful shutdown
      containers:
      - name: api
        image: gcr.io/my-project/api:v2.1.0
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
          protocol: TCP
        envFrom:
        - configMapRef:
            name: api-config            # All ConfigMap keys as env vars
        - secretRef:
            name: api-secrets           # All Secret keys as env vars
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name  # Downward API
        resources:
          requests:
            cpu: "250m"
            memory: "256Mi"
          limits:
            cpu: "1000m"
            memory: "512Mi"
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 10
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
      affinity:
        podAntiAffinity:               # Spread pods across nodes
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: api
              topologyKey: kubernetes.io/hostname
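The maxSurge/maxUnavailable pair bounds how many pods can exist while a rollout is in progress. The bash sketch below computes those bounds; `rollout_bounds` is a hypothetical helper, not a kubectl feature, and it assumes the documented rounding rule: a percentage maxSurge rounds up and a percentage maxUnavailable rounds down.

```bash
#!/usr/bin/env bash
# rollout_bounds REPLICAS MAX_SURGE MAX_UNAVAILABLE  (hypothetical helper)
# Values may be absolute ("1") or percentages ("25%").
# Prints the most pods that can exist at once and the fewest that must stay available.
rollout_bounds() {
  local replicas="$1" surge="$2" unavail="$3" pct
  if [[ $surge == *% ]]; then
    pct=${surge%\%}
    surge=$(( (replicas * pct + 99) / 100 ))   # percentage surge rounds UP
  fi
  if [[ $unavail == *% ]]; then
    pct=${unavail%\%}
    unavail=$(( replicas * pct / 100 ))        # percentage unavailable rounds DOWN
  fi
  echo "max_pods=$(( replicas + surge )) min_available=$(( replicas - unavail ))"
}

rollout_bounds 3 1 0        # settings from deployment.yaml above
rollout_bounds 10 25% 25%   # Kubernetes defaults when no strategy values are set
```

With maxSurge: 1 and maxUnavailable: 0, the rollout never drops below 3 available pods and never runs more than 4, so the cluster needs spare capacity for one extra pod.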
Workloads

ReplicaSets Explained

A ReplicaSet ensures that a specified number of pod replicas are running at any time. If a pod dies, the ReplicaSet creates a replacement. If there are too many pods, it deletes the excess. Deployments manage ReplicaSets for you — providing versioning and rollback on top.

Deployment
 ├─ manages → ReplicaSet v2 (new)
 │              └─ manages → Pod-1 + Pod-2 + Pod-3
 └─ ReplicaSet v1 (old)
                └─ Pod-A (old), being terminated

During a rolling update, the Deployment creates a new ReplicaSet (v2) while scaling down the old one (v1). Old ReplicaSets are kept, scaled to zero, for rollback history (controlled by revisionHistoryLimit, default 10).

Workloads

StatefulSets & DaemonSets

StatefulSet — Ordered, Persistent Identity
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres-headless
  replicas: 3
  selector:
    matchLabels:
      app: postgres
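  template:
    metadata:
      labels:
        app: postgres                # must match spec.selector
    spec:
      containers:
      - name: postgres
        image: postgres:16
        env:
        - name: POSTGRES_PASSWORD      # the image will not start without it
          valueFrom:
            secretKeyRef:
              name: db-credentials     # Secret from the Secrets section
              key: password
        - name: PGDATA                 # subdirectory avoids lost+found on new disks
          value: /var/lib/postgresql/data/pgdata
        volumeMounts:
        - name: data                   # the PVC from volumeClaimTemplates
          mountPath: /var/lib/postgresql/data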
  volumeClaimTemplates:  # Each pod gets its own PVC
  - metadata:
      name: data
    spec:
      accessModes: [ReadWriteOnce]
      resources:
        requests:
          storage: 10Gi
# Pods: postgres-0, postgres-1, postgres-2
# Each gets stable DNS: postgres-0.postgres-headless
DaemonSet — One Pod per Node
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector
spec:
  selector:
    matchLabels:
      app: log-collector
  template:
    metadata:
      labels:
        app: log-collector           # must match spec.selector
    spec:
      containers:
      - name: fluentd
        image: fluentd:v1.16
        volumeMounts:
        - name: varlog
          mountPath: /var/log
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
# Runs on EVERY worker node
# Also used for: monitoring agents (Prometheus
# node-exporter), CNI plugins, security agents
Workloads

Jobs & CronJobs

yaml · Job + CronJob
# Job — run to completion once
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
spec:
  backoffLimit: 3         # Retry failed pods up to 3 times
  ttlSecondsAfterFinished: 3600  # Auto-delete 1hr after completion
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: migrate
        image: myapp:latest
        command: ["dotnet", "ef", "database", "update"]

---
# CronJob: scheduled jobs
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report
spec:
  schedule: "0 2 * * *"    # 2 AM every night (cron syntax)
  concurrencyPolicy: Forbid  # Don't start if previous run still going
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: reporter
            image: myapp:latest
            command: ["python", "generate_report.py"]
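The schedule string uses the standard five cron fields. A quick way to read one is to split it into named fields; the bash sketch below does that with a hypothetical `describe_cron` helper (it only labels fields, it does not validate them).

```bash
#!/usr/bin/env bash
# describe_cron "SCHEDULE": label the five standard cron fields (hypothetical helper).
# minute (0-59), hour (0-23), day of month (1-31), month (1-12), day of week (0-6, Sunday=0)
describe_cron() {
  local minute hour dom month dow
  read -r minute hour dom month dow <<< "$1"   # read splits on spaces; "*" is not glob-expanded
  echo "minute=$minute hour=$hour day-of-month=$dom month=$month day-of-week=$dow"
}

describe_cron "0 2 * * *"            # nightly-report: minute 0 of hour 2, every day
describe_cron "*/15 9-17 * * 1-5"    # every 15 min, 09:00-17:59, Monday to Friday
```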
Configuration

Namespaces

Namespaces provide virtual clusters within a physical cluster. They isolate resources by environment, team, or application. Resource names must be unique within a namespace but can repeat across namespaces.

bash · Namespace Commands
# Create namespace
kubectl create namespace production
kubectl create namespace staging

# YAML approach
kubectl apply -f - <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    env: production
    team: backend
EOF

# Work within a namespace
kubectl get pods -n production
kubectl get all -n staging

# Set default namespace for current context
kubectl config set-context --current --namespace=production

# List all namespaces
kubectl get namespaces

# Get resources across ALL namespaces
kubectl get pods -A
kubectl get deployments --all-namespaces
Built-in Namespace | Purpose
default | Resources with no namespace specified end up here. Avoid using it for production workloads.
kube-system | Kubernetes system components: CoreDNS, kube-proxy, metrics-server, CNI plugins.
kube-public | Publicly readable. Contains the cluster-info ConfigMap. Rarely used by applications.
kube-node-lease | Node heartbeat (Lease) objects. Used internally for node health tracking.
Configuration

ConfigMaps & Usage Patterns

yaml · configmap.yaml + all injection methods
# Define ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: api-config
data:
  APP_ENV: production
  LOG_LEVEL: info
  MAX_CONNECTIONS: "100"
  app.conf: |            # Multi-line file content
    server.port=8080
    server.host=0.0.0.0
    db.pool.size=20

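---
# The snippets below are fragments of a Pod/Deployment spec, not standalone objects.
# Inject one key as an env var (under containers[].env)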
env:
- name: APP_ENV
  valueFrom:
    configMapKeyRef:
      name: api-config
      key: APP_ENV

# Inject ALL keys as env vars
envFrom:
- configMapRef:
    name: api-config

# Mount as files in container
volumeMounts:
- name: config-vol
  mountPath: /app/config
volumes:
- name: config-vol
  configMap:
    name: api-config
    items:
    - key: app.conf
      path: app.conf    # Mounts as /app/config/app.conf
Configuration

Secrets

bash · Creating Secrets
# From literal values (kubectl encodes to base64 automatically)
kubectl create secret generic db-credentials \
  --from-literal=username=admin \
  --from-literal=password='S3cr3t!Pass'

# From file (e.g., TLS certificate)
kubectl create secret tls tls-secret \
  --cert=cert.pem --key=key.pem

# From env file
kubectl create secret generic app-secrets \
  --from-env-file=.env.prod
yaml · Secret manifest (values are base64-encoded)
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
type: Opaque
data:
  username: YWRtaW4=      # base64("admin")
  password: UzNjcjN0IVBhc3M=  # base64("S3cr3t!Pass")
# stringData: allows plain text (K8s encodes it)
stringData:
  api_key: my-plain-text-api-key
🔒 Production Secret Management

Native K8s Secrets are only base64-encoded, not encrypted by default in etcd. For production: enable etcd encryption at rest, and use external secrets managers: External Secrets Operator (syncs from AWS Secrets Manager / GCP Secret Manager / HashiCorp Vault), Sealed Secrets (encrypted in Git), or CSI Secret Store.
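To see why base64 offers no protection, the bash sketch below round-trips the values from the Secret manifest with the standard `base64` utility; `encode` and `decode` are hypothetical helper names. The commented kubectl line shows the usual way to read a stored value, assuming you have read access to the `db-credentials` Secret.

```bash
#!/usr/bin/env bash
# Base64 is an encoding, not encryption: anyone can reverse it.
encode() { printf '%s' "$1" | base64; }     # printf adds no trailing newline
decode() { printf '%s' "$1" | base64 -d; }

encode admin              # YWRtaW4=  (matches data.username above)
decode UzNjcjN0IVBhc3M=   # S3cr3t!Pass

# Pitfall: plain `echo` appends "\n", which gets encoded too.
printf 'admin\n' | base64   # YWRtaW4K, a different value

# Reading a value back from the cluster (requires get access to the Secret):
#   kubectl get secret db-credentials -o jsonpath='{.data.password}' | base64 -d
```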

Configuration

Resource Requests & Limits

Requests are what the scheduler reserves for the pod; it uses them to decide which node has room for it. Limits are the maximum a container may consume. CPU usage above the limit is throttled; memory usage above the limit gets the container killed (OOMKilled).

Setting | CPU Behavior | Memory Behavior
requests | Guaranteed minimum, used for scheduling | Guaranteed minimum, used for scheduling
limits | Throttled (not killed) if exceeded | Container killed (OOMKilled) if exceeded
no limits set | Can use all idle node CPU (noisy neighbor) | Can exhaust node memory, triggering OOM kills or evictions of other pods
yaml · Resource units reference
resources:
  requests:
    cpu: "250m"        # 250 millicores = 0.25 vCPU
    memory: "256Mi"    # 256 Mebibytes
  limits:
    cpu: "1"           # 1 full vCPU (= 1000m)
    memory: "1Gi"     # 1 Gibibyte

# CPU units:
#   1     = 1 vCPU / 1 core / 1 AWS vCPU / 1 hyperthread
#   0.5   = 500m (half a core)
#   100m  = 0.1 core (1/10 of a core)

# Memory units:
#   Ki = Kibibyte (1024 bytes)
#   Mi = Mebibyte (1024 Ki)
#   Gi = Gibibyte (1024 Mi)
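#   k, M, G = decimal units (1000-based; prefer Ki/Mi/Gi for memory)
#   Lowercase "m" on memory means milli-bytes: "128m" is 0.128 bytes

---
# LimitRange: default requests/limits for containers in a namespace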
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
  - type: Container
    default:
      cpu: "500m"
      memory: "256Mi"
    defaultRequest:
      cpu: "100m"
      memory: "128Mi"
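The unit rules above can be checked with a small converter. This is a bash sketch with hypothetical helper names; it handles millicore and decimal CPU values and integer memory values with the k/M/G and Ki/Mi/Gi suffixes, not the full Kubernetes quantity grammar.

```bash
#!/usr/bin/env bash
# cpu_to_millicores QUANTITY: "250m" -> 250, "1" -> 1000, "0.5" -> 500
cpu_to_millicores() {
  local q="$1"
  case "$q" in
    *m) echo "${q%m}" ;;                                             # already millicores
    *)  awk -v c="$q" 'BEGIN { printf "%d\n", c * 1000 + 0.5 }' ;;   # cores -> m, rounded
  esac
}

# memory_to_bytes QUANTITY: integer value plus an optional suffix
memory_to_bytes() {
  local q="$1" n unit
  n="${q%%[!0-9]*}"       # leading digits
  unit="${q#"$n"}"        # whatever follows the digits
  case "$unit" in
    Ki) echo $(( n * 1024 )) ;;
    Mi) echo $(( n * 1024 ** 2 )) ;;
    Gi) echo $(( n * 1024 ** 3 )) ;;
    k)  echo $(( n * 1000 )) ;;
    M)  echo $(( n * 1000 ** 2 )) ;;
    G)  echo $(( n * 1000 ** 3 )) ;;
    "") echo "$n" ;;
    *)  echo "unsupported suffix: $unit" >&2; return 1 ;;
  esac
}

cpu_to_millicores 250m      # 250
cpu_to_millicores 1         # 1000
memory_to_bytes 256Mi       # 268435456
memory_to_bytes 256M        # 256000000 (about 4.6% less than 256Mi)
```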
Networking

Services — Network Abstraction

A Service is a stable network endpoint for a set of pods. Because pod IPs change constantly (pods die and restart), you never address pods directly. A Service has a fixed ClusterIP and DNS name that routes to healthy pods via kube-proxy.

Type | Reachable From | Use Case
ClusterIP | Within cluster only | Internal microservice communication. Default type.
NodePort | Outside via node IP:port | Dev/testing. Exposes a port in the 30000-32767 range on every node.
LoadBalancer | Public internet | Production. Creates a cloud LB (e.g., AWS NLB, GCP Network LB). One external IP and LB per Service, which gets expensive.
ExternalName | Within cluster | Alias for an external DNS name (e.g., point to an RDS hostname).
Headless (ClusterIP with clusterIP: None) | Direct pod IPs | Stable per-pod DNS for StatefulSets.
yaml · service.yaml: ClusterIP + LoadBalancer + Headless
# ClusterIP — internal only (default)
apiVersion: v1
kind: Service
metadata:
  name: api-service
  namespace: production
spec:
  type: ClusterIP
  selector:
    app: api                  # Routes to pods with this label
  ports:
  - name: http
    protocol: TCP
    port: 80                  # Service port (what clients call)
    targetPort: 8080          # Pod port (what container listens on)

---
# LoadBalancer — public internet access
apiVersion: v1
kind: Service
metadata:
  name: api-lb
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
spec:
  type: LoadBalancer
  selector:
    app: api
  ports:
  - port: 443
    targetPort: 8080

---
# Headless — for StatefulSet pod-level DNS
apiVersion: v1
kind: Service
metadata:
  name: postgres-headless
spec:
  clusterIP: None             # No VIP — direct pod IPs
  selector:
    app: postgres
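A Service's selector (like a Deployment's matchLabels) matches a pod only if every key=value pair in the selector is present on the pod; extra pod labels are ignored. The bash sketch below models that rule using a hypothetical comma-separated label format, not anything the API accepts.

```bash
#!/usr/bin/env bash
# selector_matches "k=v,..." "k=v,..." -> exit 0 if every selector pair is on the pod
selector_matches() {
  local selector="$1" labels=",$2," pair
  local IFS=,                       # split the selector on commas
  for pair in $selector; do
    # Commas on both sides force whole-pair matches ("app=api" != "app=api-v2")
    [[ $labels == *",$pair,"* ]] || return 1
  done
  return 0
}

pod_labels="app=api,version=v2,team=backend"
selector_matches "app=api" "$pod_labels" && echo "api-service routes to this pod"
selector_matches "app=postgres" "$pod_labels" || echo "postgres-headless ignores it"
```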
Networking

Ingress — HTTP Routing

Ingress is the K8s-native way to route HTTP/HTTPS traffic from outside the cluster to Services inside. It acts as a Layer 7 reverse proxy, routing by hostname and path. It requires an Ingress Controller to be installed in the cluster (nginx-ingress, Traefik, HAProxy, GKE's built-in, AWS ALB Controller).

  Internet
     │
     ▼
┌─────────────────────────────────────────────────────┐
│           LoadBalancer Service                      │
│    (cloud LB — single external IP/DNS)              │
└─────────────────────────────────────────────────────┘
     │
     ▼
┌─────────────────────────────────────────────────────┐
│           Ingress Controller Pod                   │
│     (nginx / traefik / alb — reads Ingress objects) │
│                                                     │
│  Rule: api.myapp.com/v1     → api-v1-service:80     │
│  Rule: api.myapp.com/v2     → api-v2-service:80     │
│  Rule: admin.myapp.com      → admin-service:80      │
│  Rule: *                    → frontend-service:80   │
└─────────────────────────────────────────────────────┘
     │              │                │
     ▼              ▼                ▼
 api-v1-service api-v2-service  admin-service
  (ClusterIP)   (ClusterIP)     (ClusterIP)
     │
     ▼
  [Pod][Pod][Pod]
yaml · ingress.yaml: host + path based routing + TLS
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: main-ingress
  namespace: production
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/limit-rps: "100"     # Requests/sec per client IP
    cert-manager.io/cluster-issuer: letsencrypt-prod  # Auto TLS
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - api.myapp.com
    - admin.myapp.com
    secretName: myapp-tls      # cert-manager populates this
  rules:
  - host: api.myapp.com
    http:
      paths:
      - path: /v1
        pathType: Prefix
        backend:
          service:
            name: api-v1-service
            port:
              number: 80
      - path: /v2
        pathType: Prefix
        backend:
          service:
            name: api-v2-service
            port:
              number: 80
  - host: admin.myapp.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: admin-service
            port:
              number: 80
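With pathType: Prefix, Kubernetes matches path elements split on "/", so /v1 matches /v1 and /v1/users but not /v1beta. The bash sketch below mimics how the rules above pick a backend; `prefix_matches` and `route` are hypothetical helpers, and a real controller also handles longest-match ordering, Exact paths, and a default backend for unmatched requests.

```bash
#!/usr/bin/env bash
# prefix_matches RULE_PATH REQUEST_PATH: element-wise Prefix match
prefix_matches() {
  local rule="${1%/}" req="$2"        # a trailing "/" on the rule is ignored
  [[ -z $rule ]] && return 0          # rule "/" matches every path
  [[ $req == "$rule" || $req == "$rule"/* ]]
}

# route HOST PATH: mirrors the host/path rules in ingress.yaml above
route() {
  local host="$1" path="$2"
  case "$host" in
    api.myapp.com)
      if prefix_matches /v1 "$path"; then echo "api-v1-service:80"
      elif prefix_matches /v2 "$path"; then echo "api-v2-service:80"
      else echo "no rule (default backend)"; fi ;;
    admin.myapp.com)
      prefix_matches / "$path" && echo "admin-service:80" ;;
    *) echo "no rule (default backend)" ;;
  esac
}

route api.myapp.com /v1/users     # api-v1-service:80
route api.myapp.com /v1beta       # no rule (default backend)
route admin.myapp.com /settings   # admin-service:80
```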

Install nginx Ingress Controller

bash
# Helm (recommended)
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx --create-namespace

# Or kubectl apply
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.10.1/deploy/static/provider/cloud/deploy.yaml

# Verify
kubectl get pods -n ingress-nginx
kubectl get svc -n ingress-nginx    # Shows external IP
Networking

Egress & NetworkPolicy

NetworkPolicy is K8s's firewall for pod-to-pod and pod-to-external traffic. By default, all pods can talk to all other pods; a NetworkPolicy restricts this using label selectors. It requires a CNI plugin that enforces NetworkPolicy (Calico, Cilium, Weave; AWS VPC CNI + Calico on EKS; Dataplane V2 on GKE). Without one, policies are accepted by the API server but have no effect.

📌 Ingress vs Egress in NetworkPolicy (different from K8s Ingress object)

In NetworkPolicy, ingress = traffic coming into a pod, and egress = traffic going out of a pod. This is different from the Ingress object which routes external HTTP traffic. Be careful not to confuse these two uses of the word.

yaml · networkpolicy.yaml: deny-all + allow specific
# Step 1: Deny all ingress + egress in a namespace (default deny)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}          # Applies to ALL pods in namespace
  policyTypes:
  - Ingress
  - Egress

---
# Step 2: Allow api-service to receive from frontend only
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api                  # Policy applies to api pods
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:                # ONE list item with both selectors = AND
        matchLabels:
          app: frontend           # Only frontend pods...
      namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: production   # ...in the production namespace
    ports:
    - protocol: TCP
      port: 8080

---
# Step 3: Allow api pods EGRESS to database + external DNS
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-egress-rules
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
  - Egress
  egress:
  - to:                         # Allow to database pods
    - podSelector:
        matchLabels:
          app: postgres
    ports:
    - port: 5432
  - to:                         # Allow DNS (kube-dns)
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - port: 53
      protocol: UDP
    - port: 53
      protocol: TCP
  - to:                         # Allow external HTTPS (payment API)
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
        - 10.0.0.0/8              # Block internal RFC1918
        - 172.16.0.0/12
        - 192.168.0.0/16
    ports:
    - port: 443
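The third egress rule allows port 443 to any IPv4 address outside the RFC 1918 private ranges. The bash sketch below models only that single ipBlock rule; the helper names are hypothetical, and real enforcement happens in the CNI plugin, which ORs all egress rules together.

```bash
#!/usr/bin/env bash
# ip_to_int A.B.C.D -> 32-bit integer
ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# in_cidr IP NET/BITS -> exit 0 if IP is inside the block
in_cidr() {
  local ip net bits mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( (ip & mask) == (net & mask) ))
}

# external_https_allowed IP PORT: the ipBlock rule from api-egress-rules only
external_https_allowed() {
  local ip="$1" port="$2" ex
  [[ $port == 443 ]] || return 1
  in_cidr "$ip" 0.0.0.0/0 || return 1
  for ex in 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16; do
    in_cidr "$ip" "$ex" && return 1      # listed under except: blocked
  done
  return 0
}

external_https_allowed 8.8.8.8 443 && echo "8.8.8.8:443 allowed"
external_https_allowed 10.4.0.7 443 || echo "10.4.0.7:443 blocked (in 10.0.0.0/8)"
```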
Networking

DNS & Service Discovery

Kubernetes runs CoreDNS as the cluster DNS server. Every Service gets a DNS name automatically. Pods find services by name — no hardcoded IPs.

text · DNS naming patterns
# Full DNS name format:
{service-name}.{namespace}.svc.cluster.local

# Examples:
api-service.production.svc.cluster.local  → api service in production namespace
postgres.production.svc.cluster.local     → postgres service
api-service.default.svc.cluster.local     → api in default namespace

# Shorthand within the SAME namespace (e.g., from a pod in production):
api-service                   → resolved to api-service.production.svc.cluster.local
postgres:5432                 → host:port for a database connection string

# StatefulSet pod DNS (headless service required):
postgres-0.postgres-headless.production.svc.cluster.local
postgres-1.postgres-headless.production.svc.cluster.local
postgres-2.postgres-headless.production.svc.cluster.local
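Short names work because, with the default ClusterFirst DNS policy, each pod's /etc/resolv.conf lists search domains derived from its namespace. The bash sketch below builds the names above and lists, in order, the candidates a resolver tries for a short name. The helper names are hypothetical, and the cluster domain cluster.local is assumed (it is configurable per cluster).

```bash
#!/usr/bin/env bash
# service_fqdn SERVICE NAMESPACE [DOMAIN]
service_fqdn() { echo "$1.$2.svc.${3:-cluster.local}"; }

# statefulset_pod_fqdn STATEFULSET ORDINAL HEADLESS_SERVICE NAMESPACE [DOMAIN]
statefulset_pod_fqdn() { echo "$1-$2.$3.$4.svc.${5:-cluster.local}"; }

# search_candidates NAME POD_NAMESPACE [DOMAIN]: search-list expansion order
search_candidates() {
  local name="$1" ns="$2" domain="${3:-cluster.local}" suffix
  for suffix in "$ns.svc.$domain" "svc.$domain" "$domain"; do
    echo "$name.$suffix"
  done
}

service_fqdn api-service production                  # api-service.production.svc.cluster.local
statefulset_pod_fqdn postgres 0 postgres-headless production
search_candidates api-service production             # first candidate resolves
search_candidates api-service.staging production     # second candidate resolves
```

A cross-namespace short name such as api-service.staging only resolves on the second candidate, so it costs one extra failed lookup compared with the full name.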
Scaling

Horizontal Pod Autoscaler (HPA)

HPA automatically increases or decreases the number of pod replicas based on observed metrics: CPU and memory via metrics-server, or custom and external metrics via an adapter such as Prometheus Adapter or KEDA. It scales out (more pods) and in (fewer pods); it does not change the size of each pod.

                   HPA Controller Loop (every 15s)
                           │
    ┌──────────────────────┼──────────────────────┐
    │                      ▼                      │
    │          metrics-server / Prometheus         │
    │          (CPU: 78% across 3 pods)            │
    │                      │                      │
    │                      ▼                      │
    │    desiredReplicas = ceil(3 × 78/50) = 5    │
    │                      │                      │
    │                      ▼                      │
    │          Scale Deployment to 5 replicas      │
    └──────────────────────────────────────────────┘

Scale In: CPU drops → scale down (5min cooldown by default)
Scale Out: CPU rises → scale up immediately
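The diagram's arithmetic is the HPA's core formula, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), clamped to minReplicas/maxReplicas. The bash sketch below (a hypothetical helper, integer metrics only) also models the controller's default 10% tolerance, within which it leaves the replica count unchanged. Stabilization windows, behavior policies, and not-ready pods are left out.

```bash
#!/usr/bin/env bash
# desired_replicas CURRENT METRIC TARGET MIN MAX (integers, e.g. CPU % of request)
desired_replicas() {
  local cur="$1" metric="$2" target="$3" min="$4" max="$5" d
  # Tolerance: keep the current count when metric/target is within 0.9..1.1
  if (( 10 * metric >= 9 * target && 10 * metric <= 11 * target )); then
    d=$cur
  else
    d=$(( (cur * metric + target - 1) / target ))   # integer ceil()
  fi
  (( d < min )) && d=$min
  (( d > max )) && d=$max
  echo "$d"
}

desired_replicas 3 78 50 2 20    # 5, as in the diagram
desired_replicas 3 52 50 2 20    # 3, within the 10% tolerance
desired_replicas 5 10 50 2 20    # 2 (formula gives 1; floored at minReplicas)
```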
yaml · hpa.yaml: CPU + Memory + Custom Metrics
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-deployment     # Which deployment to scale
  minReplicas: 2             # Never go below 2 (HA guarantee)
  maxReplicas: 20            # Never exceed 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50  # Scale when CPU avg > 50%
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 400Mi     # Scale when avg mem > 400 Mi
  - type: Pods                # Custom metric from Prometheus adapter
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"     # 100 RPS per pod
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5min before scale-down
      policies:
      - type: Pods
        value: 2
        periodSeconds: 60      # Max 2 pods removed per minute
    scaleUp:
      stabilizationWindowSeconds: 0    # Scale up immediately
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15      # At most double the pods every 15 seconds
bash · HPA kubectl commands
# Create HPA imperatively (quick)
kubectl autoscale deployment api-deployment -n production \
  --cpu-percent=50 --min=2 --max=20

# Check HPA status
kubectl get hpa -n production
kubectl describe hpa api-hpa -n production

# Watch HPA in action
kubectl get hpa api-hpa -n production -w

# Install metrics-server (required for CPU/memory HPA)
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Check current pod resource usage
kubectl top pods -n production
kubectl top nodes
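The behavior block in hpa.yaml caps how fast the replica count changes, and when several metrics are configured the HPA computes a proposal per metric and acts on the largest. The bash sketch below illustrates both ideas, assuming the metrics keep demanding the maximum allowed change every period; the helpers are hypothetical, and the real controller also applies stabilization windows.

```bash
#!/usr/bin/env bash
# max_of N...: the HPA acts on the largest per-metric proposal
max_of() {
  local m="$1" x
  shift
  for x in "$@"; do (( x > m )) && m=$x; done
  echo "$m"
}

# scale_up_trace START MAX PERIODS: Percent=100 per 15s => at most double each period
scale_up_trace() {
  local r="$1" max="$2" periods="$3" i out=""
  for (( i = 0; i < periods; i++ )); do
    r=$(( r * 2 ))
    (( r > max )) && r=$max
    out+="$r "
  done
  echo "${out% }"
}

# scale_down_trace START TARGET: Pods=2 per 60s => remove at most 2 pods per minute
scale_down_trace() {
  local r="$1" target="$2" out=""
  while (( r > target )); do
    r=$(( r - 2 ))
    (( r < target )) && r=$target
    out+="$r "
  done
  echo "${out% }"
}

max_of 5 3 7            # 7: e.g. CPU proposes 5, memory 3, RPS 7
scale_up_trace 2 20 4   # 4 8 16 20 (capped at maxReplicas)
scale_down_trace 20 13  # 18 16 14 13
```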
Scaling

Vertical Pod Autoscaler (VPA)

VPA automatically adjusts the CPU and memory requests and limits for individual pods based on actual usage. Instead of adding more pods (HPA), it makes each pod bigger or smaller. This is vertical scaling — giving a pod more (or less) resources.

HPA — Horizontal (more pods)
Before: 2 pods × 256Mi each
  [Pod] [Pod]

After (high load):
  [Pod] [Pod] [Pod] [Pod]
4 pods × 256Mi = 1Gi total

✅ Zero downtime scale-out
✅ Fast response to load spikes
✅ Stateless apps
❌ Each pod stays same size
VPA — Vertical (bigger pods)
Before: 1 pod × 256Mi
  [Pod 256Mi]

After VPA recommendation:
  [Pod 768Mi]
1 pod × 768Mi (right-sized)

⚠️ Pod restart required (eviction)
✅ Stateful apps / batch jobs
✅ Right-sizing (save costs)
❌ Don't use HPA+VPA on same metric
yaml · vpa.yaml: VPA in Auto and Recommendation mode
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-deployment
  updatePolicy:
    updateMode: "Auto"         # Off | Initial | Recreate | Auto
    # Off        = only recommendations, no changes
    # Initial    = apply only when pods are created
    # Recreate   = evict and recreate pods to apply
    # Auto       = currently behaves like Recreate; the default if unset
  resourcePolicy:
    containerPolicies:
    - containerName: api
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 2000m
        memory: 4Gi
      controlledResources: ["cpu", "memory"]
bashCheck VPA recommendations
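# Install VPA. Applying only vpa-v1-crd-gen.yaml registers the CRDs but does not
# start the recommender, updater and admission controller; the upstream repo's
# script installs all of them:
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler && ./hack/vpa-up.sh
# On GKE, enable the managed VPA instead:
gcloud container clusters update my-cluster --enable-vertical-pod-autoscaling --zone=us-central1-a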

# View VPA recommendations
kubectl describe vpa api-vpa -n production
# Look for:
# Recommendation:
#   Container Recommendations:
#     Container Name: api
#     Lower Bound:    cpu: 50m, memory: 128Mi
#     Target:         cpu: 240m, memory: 400Mi   ← apply this
#     Upper Bound:    cpu: 800m, memory: 1Gi

# Get VPA objects
kubectl get vpa -n production
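To read the recommendation without scanning `describe` output, you can query the VPA status with a JSONPath filter. This sketch assumes the autoscaling.k8s.io/v1 status layout (`status.recommendation.containerRecommendations[].target`). `vpa_target` is a hypothetical helper name. The status stays empty until the recommender has collected usage data.

```bash
# Hypothetical helper: print the VPA "Target" CPU and memory for one container.
# Usage: vpa_target VPA_NAME NAMESPACE CONTAINER
vpa_target() {
  local vpa=$1 ns=$2 container=$3
  # JSONPath filter selecting the recommendation for the named container
  local rec=".status.recommendation.containerRecommendations[?(@.containerName==\"${container}\")].target"
  kubectl get vpa "$vpa" -n "$ns" -o "jsonpath={${rec}.cpu} {${rec}.memory}" && echo
}

# vpa_target api-vpa production api   # prints "<cpu> <memory>"
```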
Scaling

Cluster Autoscaler

The Cluster Autoscaler (CA) adds or removes nodes from the cluster based on pending pods and node utilization. It works at the node pool level. HPA/VPA scale pods; CA scales nodes.

Scaler | What it scales | Trigger
HPA | Pod replicas (count) | CPU / memory / custom metrics
VPA | Pod resources (size) | Historical resource usage
Cluster Autoscaler | Node count | Unschedulable pods / underutilized nodes
KEDA | Pod replicas (including scale to and from 0) | Events (queue depth, HTTP requests, cron)
bashEnable Cluster Autoscaler on GKE
# Enable when creating cluster
gcloud container clusters create my-cluster \
  --enable-autoscaling \
  --min-nodes=1 --max-nodes=10 \
  --zone=us-central1-a

# Enable on existing node pool
gcloud container clusters update my-cluster \
  --enable-autoscaling \
  --min-nodes=2 --max-nodes=20 \
  --node-pool=default-pool \
  --zone=us-central1-a

# Prevent the Cluster Autoscaler from evicting this pod (which also keeps its node from being removed during scale-down)
kubectl annotate pod my-pod \
  cluster-autoscaler.kubernetes.io/safe-to-evict=false
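An annotation added to a running pod is lost when the Deployment replaces that pod. This sketch sets `safe-to-evict` on the pod template instead, so every replica carries it. `set_safe_to_evict` is a hypothetical helper. Because it edits the template, it triggers a rolling update.

```bash
# Hypothetical helper: set cluster-autoscaler.kubernetes.io/safe-to-evict on a
# Deployment's pod template so all current and future replicas carry it.
# Usage: set_safe_to_evict DEPLOYMENT NAMESPACE [true|false]
set_safe_to_evict() {
  local deploy=$1 ns=$2 value=${3:-false}
  local patch
  # JSON merge patch touching only the pod template annotations
  patch=$(printf '{"spec":{"template":{"metadata":{"annotations":{"cluster-autoscaler.kubernetes.io/safe-to-evict":"%s"}}}}}' "$value")
  kubectl patch deployment "$deploy" -n "$ns" --type=merge -p "$patch"
}

# set_safe_to_evict api-deployment production false
```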
Storage

Volumes & PersistentVolumeClaims

yamlPVC + Pod using storage (PV provisioned dynamically)
# PersistentVolumeClaim — request storage
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc
  namespace: production
spec:
  accessModes:
  - ReadWriteOnce            # RWO: single node R/W
                              # ROX: many nodes read-only
                              # RWX: many nodes R/W (NFS)
  storageClassName: standard-rwo
  resources:
    requests:
      storage: 20Gi

---
# Use the PVC in a pod spec (in a Deployment, this block sits under spec.template.spec)
spec:
  containers:
  - name: postgres
    image: postgres:16
    volumeMounts:
    - name: data
      mountPath: /var/lib/postgresql/data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: postgres-pvc   # Reference the PVC above
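To confirm the claim was provisioned, wait for the PVC phase to become `Bound`. This sketch uses `kubectl wait` with a JSONPath condition. `wait_pvc_bound` is a hypothetical name. If the StorageClass uses `volumeBindingMode: WaitForFirstConsumer` (GKE's standard-rwo does; check with `kubectl get storageclass`), the claim stays Pending until a pod that mounts it is scheduled. In that case, run the check after applying the Deployment.

```bash
# Hypothetical helper: block until a PVC reports status.phase == Bound.
# Usage: wait_pvc_bound PVC NAMESPACE [TIMEOUT]
wait_pvc_bound() {
  local pvc=$1 ns=$2 timeout=${3:-120s}
  kubectl wait "pvc/${pvc}" -n "$ns" \
    --for=jsonpath='{.status.phase}'=Bound --timeout="$timeout"
}

# wait_pvc_bound postgres-pvc production
```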
Storage

Storage Classes

Cloud | StorageClass | Type | Use
GKE | standard-rwo | Balanced PD | General purpose
GKE | premium-rwo | SSD PD | High IOPS (databases)
EKS | gp3 | EBS gp3 | General purpose
EKS | io1 | EBS io1 | High IOPS provisioned
AKS | managed-csi | Azure Disk (Standard SSD) | General purpose
Any | Custom | NFS / Ceph / Longhorn | RWX shared volumes
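PVCs that omit `storageClassName` use the cluster's default StorageClass, which is marked by the `storageclass.kubernetes.io/is-default-class` annotation. This sketch prints its name. `default_storage_class` is a hypothetical helper. Plain `kubectl get storageclass` also flags the default class with "(default)".

```bash
# Hypothetical helper: print the default StorageClass name(s), one per line.
default_storage_class() {
  # Dots in the annotation key are escaped so JSONPath treats them literally
  kubectl get storageclass \
    -o jsonpath='{range .items[?(@.metadata.annotations.storageclass\.kubernetes\.io/is-default-class=="true")]}{.metadata.name}{"\n"}{end}'
}

# default_storage_class
```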
Deploy & Operate

Deploy Workflow

1
Build & Push Container Image
Build your Docker image and push to a registry (GCR, ECR, Docker Hub, Artifact Registry). Tag with git SHA for reproducibility.
2
Write Kubernetes Manifests
Create Deployment, Service, ConfigMap, Secret YAML files. Store in k8s/ directory in your repo. Use Kustomize or Helm for environment-specific values.
3
Apply Manifests
kubectl apply -f k8s/ applies the manifests declaratively: kubectl compares each one with the live object and patches only the fields that changed.
4
Verify Rollout
kubectl rollout status deployment/api-deployment — watches until all pods are updated and ready.
5
Monitor
Watch pod events, check logs, verify readiness probes pass. Use kubectl get pods -w to watch status in real time.
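Step 1 can be scripted so every image is tagged with the commit it was built from. This is a minimal sketch. `build_and_push` is a hypothetical helper, and the registry path you pass (for example gcr.io/my-project/api, as used elsewhere in this handbook) is an assumption about your setup.

```bash
# Hypothetical helper: build and push an image tagged with the current git SHA,
# then print the full image reference on stdout.
# Usage: build_and_push REGISTRY_PATH [BUILD_CONTEXT]
build_and_push() {
  local repo=$1 context=${2:-.}
  local sha image
  sha=$(git rev-parse --short HEAD) || return
  image="${repo}:${sha}"
  # Build/push output goes to stderr so stdout carries only the image reference
  docker build -t "$image" "$context" >&2 || return
  docker push "$image" >&2 || return
  echo "$image"
}

# image=$(build_and_push gcr.io/my-project/api) &&
#   kubectl set image deployment/api-deployment api="$image" -n production
```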
bashFull deployment commands
# Apply single file
kubectl apply -f deployment.yaml

# Apply all files in directory
kubectl apply -f k8s/

# Apply with recursive subdirectories
kubectl apply -f k8s/ --recursive

# Dry run — see what would change
kubectl apply -f deployment.yaml --dry-run=client
kubectl apply -f deployment.yaml --dry-run=server  # More accurate

# Diff — show changes before applying
kubectl diff -f deployment.yaml

# Update image (triggers rolling update)
kubectl set image deployment/api-deployment \
  api=gcr.io/my-project/api:v2.2.0

# Scale replicas manually
kubectl scale deployment api-deployment --replicas=5

# Watch rollout progress
kubectl rollout status deployment/api-deployment -n production -w

# See rollout history
kubectl rollout history deployment/api-deployment
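`kubectl diff` exits with 0 when there are no differences, 1 when differences are found, and a higher code on error. This sketch uses those exit codes to gate `apply` in CI. `apply_if_changed` is a hypothetical helper. Any extra arguments (such as `-n production` or `--recursive`) are passed to both commands.

```bash
# Hypothetical helper: run `kubectl apply` only when `kubectl diff` finds changes.
# Usage: apply_if_changed PATH [extra kubectl flags...]
apply_if_changed() {
  local path=$1; shift
  local rc=0
  kubectl diff -f "$path" "$@" || rc=$?
  case $rc in
    0) echo "no changes in ${path}; skipping apply" ;;
    1) kubectl apply -f "$path" "$@" ;;
    *) echo "kubectl diff failed (exit ${rc})" >&2; return "$rc" ;;
  esac
}

# apply_if_changed k8s/ -n production --recursive
```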
Deploy & Operate

Rolling Updates

A rolling update replaces old pods with new pods incrementally, keeping the application available throughout. It is controlled by maxSurge (how many pods may run above the desired replica count during the update) and maxUnavailable (how many pods may be unavailable below that count).

yamlRolling update configuration
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1          # Allow 1 extra pod (4 running during update on 3 replicas)
    maxUnavailable: 0    # Never drop below 3 ready pods during the update

# Kubernetes default (faster update, but up to 25% of pods may be unavailable):
#   maxSurge: 25%
#   maxUnavailable: 25%

# Full capacity throughout the update (true zero downtime also needs
# readiness probes and graceful shutdown):
#   maxSurge: 1
#   maxUnavailable: 0
Initial:   [v1] [v1] [v1]         (3 running)
Step 1:    [v1] [v1] [v1] [v2]   (maxSurge=1, create 1 new)
Step 2:    [v1] [v1] [v2]        (v2 ready, kill 1 old)
Step 3:    [v1] [v1] [v2] [v2]   (create another new)
Step 4:    [v1] [v2] [v2]        (kill another old)
Step 5:    [v1] [v2] [v2] [v2]   (create last new)
Final:     [v2] [v2] [v2]        (kill last old)
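Percentages are converted to pod counts before the rollout starts. maxSurge rounds up, maxUnavailable rounds down, and if both resolve to 0 the controller allows one unavailable pod. The bash sketch below performs that resolution. `resolve_rollout` is hypothetical. It shows that the 25%/25% default on 3 replicas behaves like the maxSurge: 1 / maxUnavailable: 0 diagram above.

```bash
# Hypothetical helper: resolve maxSurge / maxUnavailable (absolute or "NN%")
# into pod counts for a given replica count.
# Usage: resolve_rollout REPLICAS MAX_SURGE MAX_UNAVAILABLE
resolve_rollout() {
  local replicas=$1 surge=$2 unavail=$3

  if [[ $surge == *% ]]; then
    surge=${surge%\%}
    surge=$(( (replicas * surge + 99) / 100 ))   # percentages round UP
  fi
  if [[ $unavail == *% ]]; then
    unavail=${unavail%\%}
    unavail=$(( replicas * unavail / 100 ))      # percentages round DOWN
  fi
  # If both resolve to 0, one pod is allowed to be unavailable
  if (( surge == 0 && unavail == 0 )); then
    unavail=1
  fi

  echo "maxSurge=${surge} maxUnavailable=${unavail} maxPods=$(( replicas + surge )) minReady=$(( replicas - unavail ))"
}

resolve_rollout 3 25% 25%    # prints maxSurge=1 maxUnavailable=0 maxPods=4 minReady=3
resolve_rollout 10 25% 25%   # prints maxSurge=3 maxUnavailable=2 maxPods=13 minReady=8
```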
Deploy & Operate

Rollbacks

bashRollback commands
# View rollout history (shows revision numbers)
kubectl rollout history deployment/api-deployment -n production
# REVISION  CHANGE-CAUSE
# 1         Initial deploy
# 2         Deploy v2.0.0
# 3         Deploy v2.1.0 - fix memory leak  ← current
# CHANGE-CAUSE is read from the kubernetes.io/change-cause annotation (<none> if unset)

# View details of a specific revision
kubectl rollout history deployment/api-deployment --revision=2

# Rollback to immediately previous revision
kubectl rollout undo deployment/api-deployment -n production

# Rollback to specific revision
kubectl rollout undo deployment/api-deployment --to-revision=1

# Pause a rollout (progress stops; later pod template changes are held until resume)
kubectl rollout pause deployment/api-deployment

# Resume paused rollout
kubectl rollout resume deployment/api-deployment

# Keep revision history (default: 10)
# Set in deployment spec:
#   revisionHistoryLimit: 5

# Annotate deployment (shows in history CHANGE-CAUSE)
kubectl annotate deployment api-deployment \
  kubernetes.io/change-cause="Deploy v2.1.0 - fix memory leak"
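You can combine `set image`, the change-cause annotation, and `rollout status --timeout` to roll back automatically when a rollout stalls (for example, when new pods never pass readiness). This is a sketch. `deploy_or_rollback` is a hypothetical helper, and the 180s default timeout is an arbitrary assumption.

```bash
# Hypothetical helper: update the image, record the change-cause, and undo the
# rollout automatically if it does not finish within the timeout.
# Usage: deploy_or_rollback DEPLOYMENT CONTAINER IMAGE NAMESPACE [TIMEOUT]
deploy_or_rollback() {
  local deploy=$1 container=$2 image=$3 ns=$4 timeout=${5:-180s}

  kubectl set image "deployment/${deploy}" "${container}=${image}" -n "$ns" || return
  # Shows up in the CHANGE-CAUSE column of `kubectl rollout history`
  kubectl annotate "deployment/${deploy}" -n "$ns" --overwrite \
    "kubernetes.io/change-cause=Deploy ${image}" || return

  if ! kubectl rollout status "deployment/${deploy}" -n "$ns" --timeout="$timeout"; then
    echo "rollout of ${image} not complete after ${timeout}; rolling back" >&2
    kubectl rollout undo "deployment/${deploy}" -n "$ns"
    return 1
  fi
}

# deploy_or_rollback api-deployment api gcr.io/my-project/api:v2.2.0 production
```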
Deploy & Operate

Health Probes

Kubernetes uses three types of probes to determine pod health. Getting these right is critical for zero-downtime deployments and self-healing.

Probe | Failure action | Use for
livenessProbe | Kills and restarts the container | Detecting deadlocks: the app is stuck but has not crashed
readinessProbe | Removes the pod from Service endpoints (stops traffic) | Is the app ready to receive requests? DB connected?
startupProbe | Kills the container if it doesn't start in time | Slow-starting apps (Java, .NET). Liveness and readiness checks are disabled until it succeeds.
yamlAll three probe types with all check methods
# HTTP probe (most common — checks /health endpoint)
livenessProbe:
  httpGet:
    path: /health/live
    port: 8080
    httpHeaders:
    - name: X-Health-Check
      value: kubernetes
  initialDelaySeconds: 20    # Wait 20s before first probe
  periodSeconds: 10          # Probe every 10s
  timeoutSeconds: 5          # Fail after 5s no response
  failureThreshold: 3        # Kill after 3 consecutive failures
  successThreshold: 1

readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 2        # Remove from LB after 2 failures (fast)

startupProbe:
  httpGet:
    path: /health/live
    port: 8080
  failureThreshold: 30       # 30 × 10s = 5min to start
  periodSeconds: 10

---
# TCP probe (for databases, non-HTTP services)
livenessProbe:
  tcpSocket:
    port: 5432
  initialDelaySeconds: 30
  periodSeconds: 10

---
# Exec probe (run command inside container)
livenessProbe:
  exec:
    command:
    - /bin/sh
    - -c
    - "redis-cli ping | grep PONG"
  initialDelaySeconds: 5
  periodSeconds: 10

---
# gRPC probe (GA since K8s 1.27). The app must implement the standard gRPC
# health protocol, e.g. .NET 8 with Grpc.AspNetCore.HealthChecks.
livenessProbe:
  grpc:
    port: 8080
    # service is optional; omit it to check overall server health
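An httpGet probe counts any status code from 200 to 399 as success. This sketch checks an endpoint the same way from your workstation through `kubectl port-forward`, before you rely on it in a probe. `check_http_probe` is a hypothetical helper, and the `X-Health-Check` header mirrors the example above. It only approximates the kubelet: it makes a single request and does not reproduce periodic retries.

```bash
# Hypothetical helper: judge an endpoint like an httpGet probe (200-399 = pass).
# Usage: check_http_probe URL [TIMEOUT_SECONDS]
check_http_probe() {
  local url=$1 timeout=${2:-5}
  local code
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time "$timeout" \
    -H 'X-Health-Check: kubernetes' "$url")
  code=${code:-000}   # curl reports 000 when no response was received
  if (( 10#$code >= 200 && 10#$code < 400 )); then
    echo "PASS ${code} ${url}"
  else
    echo "FAIL ${code} ${url}"
    return 1
  fi
}

# kubectl port-forward deployment/api-deployment 8080:8080 &
# check_http_probe http://localhost:8080/health/live
```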
kubectl Reference

Common kubectl Commands

GET — inspect resources
bash
# Get resources — basic
kubectl get pods
kubectl get pods -n production
kubectl get pods -A                          # All namespaces
kubectl get pods -o wide                     # Shows node, IP
kubectl get pods -l app=api                  # Label selector
kubectl get pods --field-selector status.phase=Running

# Get with output formats
kubectl get deployment api-deployment -o yaml    # Full YAML spec
kubectl get deployment api-deployment -o json    # Full JSON
kubectl get pods -o jsonpath='{.items[*].metadata.name}'  # Specific field
kubectl get pods -o custom-columns=NAME:.metadata.name,STATUS:.status.phase

# Get multiple resource types
kubectl get pods,services,deployments -n production

# Watch (live updates)
kubectl get pods -w
kubectl get pods -n production -w
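A pod with `status.phase=Running` does not necessarily pass its readinessProbe. This sketch lists pods whose Ready condition is not True, using a JSONPath range plus awk. `not_ready_pods` is a hypothetical helper.

```bash
# Hypothetical helper: print names of pods whose Ready condition is not "True".
# Usage: not_ready_pods NAMESPACE
not_ready_pods() {
  local ns=$1
  # One "name<TAB>readyStatus" line per pod; pods without conditions get an empty status
  kubectl get pods -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}' |
    awk -F '\t' '$2 != "True" { print $1 }'
}

# not_ready_pods production
```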
DESCRIBE — detailed info + events
bash
kubectl describe pod api-pod-abc123
kubectl describe deployment api-deployment -n production
kubectl describe node my-node
kubectl describe service api-service
kubectl describe pvc postgres-pvc
LOGS — container output
bash
# Basic logs
kubectl logs pod-name
kubectl logs pod-name -n production
kubectl logs pod-name -c container-name      # Specific container in multi-container pod

# Stream live logs
kubectl logs -f pod-name
kubectl logs -f deployment/api-deployment    # From deployment (picks one pod)

# Previous container instance (after crash)
kubectl logs pod-name --previous

# Tail last N lines
kubectl logs pod-name --tail=100
kubectl logs pod-name --since=1h             # Last 1 hour
kubectl logs pod-name --since-time=2025-01-01T00:00:00Z

# Logs from all pods matching a label (with -f, kubectl follows at most 5 pods by default; raise with --max-log-requests, or use stern)
kubectl logs -l app=api --prefix=true --all-containers=true -f
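The "manual loop" alternative to stern can be written as a helper that saves each matching pod's current logs (all containers, prefixed) to its own file. This is a sketch. `dump_logs` and its ./logs default directory are hypothetical choices.

```bash
# Hypothetical helper: save current logs of every pod matching a selector,
# one file per pod.
# Usage: dump_logs SELECTOR NAMESPACE [OUTPUT_DIR]
dump_logs() {
  local selector=$1 ns=$2 dir=${3:-./logs}
  local pod
  mkdir -p "$dir"
  for pod in $(kubectl get pods -n "$ns" -l "$selector" -o name); do
    # -o name yields "pod/<name>"; strip the prefix for the file name
    kubectl logs "$pod" -n "$ns" --all-containers=true --prefix=true \
      > "${dir}/${pod#pod/}.log"
  done
}

# dump_logs app=api production ./logs
```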
EXEC — run commands in containers
bash
# Open interactive shell
kubectl exec -it pod-name -- /bin/bash
kubectl exec -it pod-name -- /bin/sh         # If bash not available
kubectl exec -it pod-name -c container-name -- /bin/bash

# Run single command
kubectl exec pod-name -- env
kubectl exec pod-name -- cat /etc/config/app.conf
kubectl exec pod-name -- curl http://localhost:8080/health

# Copy files to/from pod (requires tar inside the container image)
kubectl cp pod-name:/app/logs/error.log ./error.log
kubectl cp ./config.json pod-name:/app/config.json
PORT-FORWARD — local access to cluster resources
bash
# Forward localhost:8080 → pod:8080
kubectl port-forward pod/api-pod-abc123 8080:8080

# Forward to a service (kubectl picks one running pod behind it; no load balancing)
kubectl port-forward service/api-service 8080:80 -n production

# Forward to deployment
kubectl port-forward deployment/api-deployment 8080:8080

# Multiple ports
kubectl port-forward service/api-service 8080:80 8443:443

# Access Kubernetes Dashboard
kubectl port-forward -n kubernetes-dashboard service/kubernetes-dashboard 8443:443
APPLY, CREATE, EDIT, PATCH
bash
# Declarative (preferred — idempotent)
kubectl apply -f manifest.yaml
kubectl apply -f k8s/ --recursive

# Imperative creates (good for quick testing)
kubectl create deployment nginx --image=nginx:latest --replicas=2
kubectl create service clusterip my-svc --tcp=80:8080
kubectl create configmap app-config --from-literal=ENV=prod --from-file=config.properties
kubectl create secret generic db-creds --from-literal=pw=secret123

# Edit live object in editor
kubectl edit deployment api-deployment -n production

# Patch — surgical update without full YAML
kubectl patch deployment api-deployment \
  --patch '{"spec":{"replicas":5}}'

kubectl patch deployment api-deployment \
  --type=json \
  -p='[{"op":"replace","path":"/spec/template/spec/containers/0/image","value":"myapp:v3"}]'

# Label / Annotate
kubectl label pod my-pod environment=production
kubectl annotate deployment api-deployment version="2.1.0"
kubectl Reference

Debug Commands

bashDebugging tools
# Debug pod with ephemeral container (K8s 1.23+)
kubectl debug -it pod-name --image=busybox:latest --target=app

# Debug with a copy of the pod (preserves original)
kubectl debug pod-name --copy-to=pod-name-debug --image=ubuntu

# Describe events for a pod (most useful for scheduling issues)
kubectl describe pod pod-name | grep -A 20 Events

# Get all events in namespace sorted by time
kubectl get events -n production --sort-by=.metadata.creationTimestamp

# Get only Warning events
kubectl get events -n production --field-selector type=Warning

# Check resource usage
kubectl top pods -n production
kubectl top pods -n production --sort-by=memory
kubectl top nodes

# Check why pod is pending
kubectl describe pod pending-pod | grep -A 10 "Events:"
# Common causes: Insufficient CPU/Memory, No nodes match selector,
#                PVC not bound, Image pull error

# Check node conditions
kubectl describe node my-node | grep -A 10 "Conditions:"

# Recent Warning events across all namespaces (these are events, not API server audit logs)
kubectl get events --all-namespaces | grep Warning | head -20

# Validate YAML manifest
kubectl apply -f manifest.yaml --dry-run=server --validate=true
kubectl Reference

Cleanup Commands

bashDelete resources
# Delete by name
kubectl delete pod my-pod
kubectl delete deployment api-deployment -n production
kubectl delete service api-service -n production

# Delete by file (opposite of apply)
kubectl delete -f deployment.yaml
kubectl delete -f k8s/ --recursive

# Delete by label selector
kubectl delete pods -l app=api -n production
kubectl delete all -l app=my-app -n production

# Delete all pods in namespace (they will restart if managed by Deployment)
kubectl delete pods --all -n production

# Delete all resources in namespace (use with caution!)
kubectl delete all --all -n staging
# Note: "all" covers workloads, Services and HPAs only; PVCs, Secrets, ConfigMaps, Ingresses and ServiceAccounts are not included

# Delete EVERYTHING in namespace including PVCs, ConfigMaps, Secrets
kubectl delete namespace staging          # ⚠️ DESTRUCTIVE - deletes everything inside

# Force delete stuck pod (last resort — may cause split-brain)
kubectl delete pod stuck-pod --force --grace-period=0

# Delete completed/failed jobs
kubectl delete jobs --field-selector status.successful=1
kubectl delete pods --field-selector status.phase=Succeeded
kubectl delete pods --field-selector status.phase=Failed

# Delete evicted pods (xargs -L1 runs one "kubectl delete pod NAME -n NAMESPACE" per line)
kubectl get pods -A --no-headers | awk '$4 == "Evicted" {print $2 " -n " $1}' | xargs -L1 kubectl delete pod

# Drain a node (for maintenance)
kubectl drain my-node --ignore-daemonsets --delete-emptydir-data
kubectl cordon my-node        # Mark unschedulable but don't evict
kubectl uncordon my-node      # Return to schedulable

# Remove node from cluster
kubectl drain my-node --ignore-daemonsets
kubectl delete node my-node
bashContext & cluster management
# View all contexts (clusters you can connect to)
kubectl config get-contexts

# Switch context
kubectl config use-context my-prod-cluster
kubectl config use-context gke_project_zone_cluster

# View current context
kubectl config current-context

# Set default namespace for context
kubectl config set-context --current --namespace=production

# Rename a context
kubectl config rename-context old-name new-name

# Delete a context
kubectl config delete-context old-cluster
⚠️ Cleanup Safety Checklist

Before deleting: (1) confirm the current context and namespace with kubectl config view --minify. (2) Use --dry-run=client first. (3) Never force-delete pods in a database StatefulSet; it can cause data corruption. (4) delete namespace removes its PVCs, and with the default reclaimPolicy: Delete for dynamically provisioned volumes, the underlying cloud disks are deleted permanently.
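Check (1) can be enforced in scripts. This sketch is a guard that aborts unless kubectl points at the expected context, and otherwise prints the active namespace. `require_context` is a hypothetical helper, and the context names in the usage line are placeholders.

```bash
# Hypothetical guard: refuse to continue unless the current kubectl context
# matches EXPECTED_CONTEXT; on success print context and namespace.
# Usage: require_context EXPECTED_CONTEXT
require_context() {
  local expected=$1 current ns
  current=$(kubectl config current-context) || return
  ns=$(kubectl config view --minify -o jsonpath='{..namespace}')
  if [[ $current != "$expected" ]]; then
    echo "refusing: current context is '${current}', expected '${expected}'" >&2
    return 1
  fi
  echo "context=${current} namespace=${ns:-default}"
}

# require_context my-staging-cluster && kubectl delete namespace staging --dry-run=client
```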

Reference

RBAC — Role-Based Access Control

yamlRole + ClusterRole + RoleBinding (User and ServiceAccount subjects)
# Role — permissions within a namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: production
rules:
- apiGroups: [""]               # "" = core group
  resources: ["pods", "pods/log"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list"]

---
# ClusterRole: cluster-wide permissions (grant it with a ClusterRoleBinding; nodes are cluster-scoped)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-reader
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "list", "watch"]

---
# RoleBinding — bind Role to user/group/serviceaccount
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods-binding
  namespace: production
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-reader
subjects:
- kind: User
  name: alice@company.com    # GCP IAM user (on GKE)
  apiGroup: rbac.authorization.k8s.io
- kind: ServiceAccount
  name: ci-service-account
  namespace: production
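To verify the bindings, impersonate each subject with `kubectl auth can-i`, which prints yes or no. This is a sketch. `can_i` is a hypothetical wrapper, and your own identity needs impersonate permission. The expected answers in the comments assume that no other bindings grant these subjects more.

```bash
# Hypothetical wrapper: ask the API server whether SUBJECT may VERB RESOURCE.
# Service accounts are addressed as system:serviceaccount:<namespace>:<name>.
# Usage: can_i VERB RESOURCE NAMESPACE SUBJECT
can_i() {
  local verb=$1 resource=$2 ns=$3 subject=$4
  kubectl auth can-i "$verb" "$resource" -n "$ns" --as="$subject"
}

# can_i list pods production alice@company.com          # expected: yes
# can_i delete pods production alice@company.com        # expected: no
# can_i get deployments production system:serviceaccount:production:ci-service-account   # expected: yes
```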
Reference

Troubleshooting Guide

bashPod status decode
# Pending — pod not scheduled to a node
# → Insufficient CPU/memory on nodes
# → No nodes match nodeSelector/affinity
# → PVC not bound
kubectl describe pod <pod> | grep -A 15 Events

# CrashLoopBackOff — container starting and crashing repeatedly
# → Application error on startup
# → Bad config / missing env var
# → Memory limit too low (OOMKilled)
kubectl logs <pod> --previous      # Logs from previous crash
kubectl describe pod <pod> | grep "Last State"

# ImagePullBackOff / ErrImagePull
# → Wrong image name/tag
# → Private registry — imagePullSecret not configured
kubectl describe pod <pod> | grep -A 5 "Failed to pull"

# OOMKilled — container exceeded memory limit
kubectl describe pod <pod> | grep -A 3 "OOM"
# Fix: increase memory limit, or investigate memory leak

# Terminating (stuck)
# → Finalizers preventing deletion
kubectl patch pod <pod> -p '{"metadata":{"finalizers":[]}}' --type=merge
kubectl delete pod <pod> --force --grace-period=0

# Service not routing traffic
# → Selector doesn't match pod labels
kubectl get endpoints <service-name>   # Should show pod IPs — if empty, selector is wrong
kubectl get pods -l app=my-app          # Check labels match service selector

# DNS resolution failing inside pod
kubectl exec -it <pod> -- nslookup api-service
kubectl exec -it <pod> -- curl http://api-service.production.svc.cluster.local/health
Pod status | Meaning | First action
Pending | Waiting to be scheduled | kubectl describe pod → check Events
ContainerCreating | Image being pulled or volume being mounted | Wait, then check Events
Running | Container executing normally | Check readinessProbe if not receiving traffic
CrashLoopBackOff | Container crashing on start | kubectl logs --previous
OOMKilled | Exceeded memory limit | Raise memory limit or fix leak
ImagePullBackOff | Can't pull container image | Check image name, registry credentials
Evicted | Removed due to resource pressure | Check node disk/memory usage
Terminating | Being deleted gracefully | Wait; if stuck, check finalizers
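To find problem pods across a namespace without describing each one, you can read the container waiting reason and the last termination reason from pod status. This is a sketch. `pod_problems` is a hypothetical helper, and multi-container pods show their reasons space-separated.

```bash
# Hypothetical helper: list pods that have a container waiting reason
# (e.g. CrashLoopBackOff, ImagePullBackOff) or a last termination reason
# (e.g. OOMKilled); pods with neither are skipped.
# Usage: pod_problems NAMESPACE
pod_problems() {
  local ns=$1
  kubectl get pods -n "$ns" -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].state.waiting.reason}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}' |
    awk -F '\t' '$2 != "" || $3 != "" { printf "%s waiting=%s last=%s\n", $1, ($2 == "" ? "-" : $2), ($3 == "" ? "-" : $3) }'
}

# pod_problems production
```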
Reference

Quick Cheat Sheet

Essential kubectl Aliases

bash~/.bashrc or ~/.zshrc
# kubectl shorthand
alias k='kubectl'
alias kgp='kubectl get pods'
alias kgpa='kubectl get pods -A'
alias kgs='kubectl get services'
alias kgd='kubectl get deployments'
alias kgn='kubectl get nodes'
alias kdp='kubectl describe pod'
alias kdd='kubectl describe deployment'
alias kl='kubectl logs'
alias klf='kubectl logs -f'
alias kaf='kubectl apply -f'
alias kdf='kubectl delete -f'
alias kns='kubectl config set-context --current --namespace'
alias kctx='kubectl config use-context'
alias ktc='kubectl top pods'

# Switch namespace quickly
function kn() { kubectl config set-context --current --namespace="$1"; }

# Watch pods
alias kwatch='kubectl get pods -w'

Complete Quick-Reference Table

Action | Command
List all pods | kubectl get pods -A
Watch pods | kubectl get pods -w -n production
Describe pod | kubectl describe pod <name>
Stream logs | kubectl logs -f <pod>
Shell into pod | kubectl exec -it <pod> -- /bin/sh
Apply manifests | kubectl apply -f k8s/
Update image | kubectl set image deployment/<name> <container>=<image>:tag
Scale | kubectl scale deployment <name> --replicas=5
Rollback | kubectl rollout undo deployment/<name>
Rollout status | kubectl rollout status deployment/<name>
Port forward | kubectl port-forward svc/<name> 8080:80
Resource usage | kubectl top pods -n production
Get events | kubectl get events -n production --sort-by=.metadata.creationTimestamp
Delete pod | kubectl delete pod <name>
Force delete | kubectl delete pod <name> --force --grace-period=0
Delete by label | kubectl delete pods -l app=api
Drain node | kubectl drain <node> --ignore-daemonsets
Switch namespace | kubectl config set-context --current --namespace=<ns>
Switch cluster | kubectl config use-context <context>
Diff before apply | kubectl diff -f deployment.yaml

Key Concepts at a Glance

Pod = containers + shared network
Deployment = manages ReplicaSets
ReplicaSet = ensures N pods
Service = stable endpoint
Ingress = HTTP router (L7)
NetworkPolicy = firewall
HPA = more pods (horizontal)
VPA = bigger pods (vertical)
ConfigMap = non-secret config
Secret = sensitive config
StatefulSet = ordered, stable IDs
DaemonSet = one per node
Namespace = virtual cluster
PVC = persistent storage claim
Job = run to completion
CronJob = scheduled job