Container Orchestration Platform

KUBERNETES
HANDBOOK

// K8s v1.30 · kubectl · YAML Manifests · GKE · Production Ready

A complete reference covering Kubernetes architecture, workloads, networking, ingress/egress, horizontal and vertical scaling, storage, and every essential kubectl command from deployment to cleanup. Built for engineers coming from Docker.

Pods & Deployments · HPA / VPA Scaling · Ingress / NetworkPolicy · kubectl Mastery · vs Docker · RBAC & Security

Foundations

What is Kubernetes?

Kubernetes (K8s) is an open-source container orchestration platform originally developed by Google and now maintained by the CNCF. It automates deploying, scaling, and managing containerized applications across a cluster of machines. Where standalone Docker manages containers on a single host, Kubernetes can run thousands of containers across hundreds of nodes, with self-healing, autoscaling, and zero-downtime updates.

K8s

Abbreviation

2014

Open-Sourced

CNCF

Foundation

v1.30

Current Stable

Go

Written In

Self

Heals

⚙️

Self-Healing

Automatically restarts failed containers, replaces and reschedules pods when nodes fail, and kills containers that fail their liveness checks.

📈

Auto-Scaling

Scale pods horizontally based on CPU/memory/custom metrics. Scale nodes up/down based on workload demand.

🔄

Zero-Downtime Deploys

Rolling updates replace old pods with new ones gradually. Rollback instantly if something goes wrong.

🌐

Service Discovery

Built-in DNS and load balancing. Services find each other by name — no hardcoded IPs needed.

🗄️

Storage Orchestration

Automatically mounts storage — local disk, cloud volumes (GCE PD, AWS EBS, Azure Disk), NFS, etc.

🔐

Secret Management

Secrets and ConfigMaps are stored in etcd, injected into pods at runtime — no hardcoded credentials in images.

Foundations

Kubernetes vs Docker

Docker and Kubernetes are complementary, not competing. Docker builds and runs containers on a single host. Kubernetes orchestrates those containers across many hosts. Think of Docker as the engine and Kubernetes as the fleet management system.

Dimension	Docker (standalone)	Kubernetes
Scope	Single host	Multi-node cluster
Unit of work	Container	Pod (1+ containers)
Scaling	Manual (`docker run` again)	Automatic (HPA, KEDA)
Self-healing	❌ Manual restart	✅ Automatic via ReplicaSet
Load balancing	Manual (nginx config)	Built-in via Service
Rolling updates	Manual scripting	Built-in, configurable
Service discovery	Manual DNS / links	Built-in DNS (CoreDNS)
Config management	Environment vars / bind mounts	ConfigMaps + Secrets
Networking	Docker networks (single host)	CNI plugins (cluster-wide)
Multi-tenant	❌	✅ Namespaces + RBAC
Complexity	Low — simple to start	High — many concepts
Use when	Dev, single-server, simple apps	Production, microservices, scale

🐳 Docker Compose (single host)

# docker-compose.yml
services:
  api:
    image: myapp:latest
    ports: ["8080:8080"]
    environment:
      DB_HOST: db
  db:
    image: postgres:15
    volumes: ["data:/var/lib/postgresql/data"]
volumes:
  data: {}               # Named volumes must be declared

# docker compose up -d
# No healing, no real scaling,
# single machine only

☸️ Kubernetes (multi-node cluster)

# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3          # 3 pods, any node
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api       # Must match the selector
    spec:
      containers:
      - name: api
        image: myapp:latest
# Self-heals, scales, deploys
# across multiple machines

📌 When NOT to use Kubernetes

K8s has real operational overhead. For a simple single-service app, a VPS with Docker Compose or a managed platform (Render, Railway, Fly.io) is often better. Use Kubernetes when you have multiple services that need independent scaling, high availability requirements, or a team that can manage the infrastructure.

Foundations

Cluster Architecture

A Kubernetes cluster has two types of machines: the Control Plane (brain — manages state) and Worker Nodes (muscle — run workloads). In managed services like GKE, EKS, and AKS, the control plane is managed for you.

┌─────────────────────────────────────────────────────────────────────┐
│                    CONTROL PLANE (Master Node)                       │
│                                                                     │
│  ┌─────────────────┐  ┌──────────────┐  ┌────────────────────────┐ │
│  │  API Server      │  │  Scheduler   │  │  Controller Manager   │ │
│  │  (kube-apiserver)│  │  (kube-sched)│  │  ReplicaSet, Deploy..  │ │
│  └─────────────────┘  └──────────────┘  └────────────────────────┘ │
│  ┌─────────────────┐  ┌──────────────────────────────────────────┐  │
│  │  etcd            │  │  Cloud Controller Manager (GKE/EKS/AKS)   │  │
│  │  (key-val store) │  │  (provisions LBs, routes, nodes)         │  │
│  └─────────────────┘  └──────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────┘
           │ kubectl / API calls            │
           ▼                               ▼
┌────────────────────┐       ┌─────────────────────┐
│  WORKER NODE 1      │       │  WORKER NODE 2       │
│  ┌──────────────┐  │       │  ┌──────────────┐   │
│  │  kubelet     │  │       │  │  kubelet     │   │
│  │  (node agent)│  │       │  │  (node agent)│   │
│  └──────────────┘  │       │  └──────────────┘   │
│  ┌──────────────┐  │       │  ┌──────────────┐   │
│  │  kube-proxy  │  │       │  │  kube-proxy  │   │
│  │  (networking)│  │       │  │  (networking)│   │
│  └──────────────┘  │       │  └──────────────┘   │
│  ┌──────────────┐  │       │  ┌──────────────┐   │
│  │  Container   │  │       │  │  Container   │   │
│  │  Runtime     │  │       │  │  Runtime     │   │
│  │  (containerd)│  │       │  │  (containerd)│   │
│  └──────────────┘  │       │  └──────────────┘   │
│  [ POD ][ POD ]    │       │  [ POD ][ POD ]     │
└────────────────────┘       └─────────────────────┘

Control Plane Components

Component	Role
kube-apiserver	Single entry point for all cluster operations. Every kubectl command hits this. Validates and processes requests.
etcd	Distributed key-value store. The single source of truth — stores all cluster state, config, secrets.
kube-scheduler	Watches for unscheduled pods and assigns them to nodes based on resources, affinity, taints/tolerations.
kube-controller-manager	Runs controller loops — ReplicaSet controller, Deployment controller, Job controller, etc.
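cloud-controller-manager	Integrates with cloud provider APIs: provisions load balancers for Services, configures routes, and tracks node lifecycle on GKE/EKS/AKS. Persistent disks are provisioned separately by CSI storage drivers.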

Foundations

Core Objects

Everything in Kubernetes is an object — a persistent entity with a desired state. You declare what you want in YAML, kubectl sends it to the API server, and Kubernetes controllers reconcile actual state to match desired state.
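The reconcile idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Kubernetes code: the `reconcile` function and the pod naming scheme are hypothetical, and a real controller watches the API server continuously instead of being called once.

```python
def reconcile(desired_replicas: int, running_pods: list[str]) -> list[str]:
    """Return the actions needed to move actual state toward desired state."""
    actions = []
    diff = desired_replicas - len(running_pods)
    if diff > 0:
        # Too few pods: create the missing ones.
        actions += [f"create pod-{len(running_pods) + i}" for i in range(diff)]
    elif diff < 0:
        # Too many pods: delete the excess (here, the last ones in the list).
        actions += [f"delete {name}" for name in running_pods[diff:]]
    return actions  # An empty list means actual state already matches.


print(reconcile(3, ["pod-0"]))           # ['create pod-1', 'create pod-2']
print(reconcile(1, ["pod-0", "pod-1"]))  # ['delete pod-1']
```

The point of the sketch is that the caller only states the desired count; the loop works out whether to create or delete.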

Pod

v1 / core

Smallest deployable unit. One or more containers sharing network + storage. Usually managed by Deployment, not created directly.

Deployment

apps/v1

Manages a desired number of identical pod replicas. Handles rolling updates and rollbacks. The main way to run stateless apps.

ReplicaSet

apps/v1

Ensures N pod replicas always run. Created and managed by Deployment — rarely used directly.

Service

v1 / core

Stable network endpoint for a set of pods. Provides load balancing and DNS discovery. Types: ClusterIP, NodePort, LoadBalancer, ExternalName.

Ingress

networking.k8s.io/v1

HTTP/HTTPS routing from external traffic to Services. Rules based on host/path. Requires an Ingress Controller (nginx, Traefik).

ConfigMap

v1 / core

Non-sensitive configuration data as key-value pairs. Injected into pods as env vars or mounted as files.

Secret

v1 / core

Like ConfigMap but for sensitive data (passwords, tokens). Values are base64-encoded, not encrypted, unless encryption at rest is enabled for etcd. Use an external secrets manager in production.

Namespace

v1 / core

Virtual cluster within a cluster. Isolates resources by team/env. `default`, `kube-system`, `kube-public`, and `kube-node-lease` are built in.

PersistentVolume

v1 / core

A piece of provisioned storage in the cluster. Lifecycle independent from pods. Backed by cloud disks, NFS, etc.

StatefulSet

apps/v1

Like Deployment but for stateful apps. Pods get stable network identities (pod-0, pod-1) and persistent storage. For databases.

DaemonSet

apps/v1

Runs exactly one pod on every node (or selected nodes). Used for log collectors, monitoring agents, CNI plugins.

HPA

autoscaling/v2

Horizontal Pod Autoscaler. Automatically scales Deployment/ReplicaSet replicas based on metrics.

Every K8s Object Has These Fields

yaml · Anatomy of a Kubernetes Manifest

apiVersion: apps/v1          # API group + version (apps/v1, v1, networking.k8s.io/v1)
kind: Deployment              # Object type
metadata:
  name: my-app                # Unique name within namespace
  namespace: production       # Logical grouping
  labels:                     # Key-value tags — used for selection
    app: my-app
    version: v2
    team: backend
  annotations:               # Non-identifying metadata (for tools/humans)
    deployment.kubernetes.io/revision: "3"
    description: "Main API service"
spec:                         # DESIRED STATE — what you want
  ...
status:                       # ACTUAL STATE — managed by K8s (read-only)
  ...

Workloads

Pods in Depth

A Pod is the smallest deployable unit. It wraps one or more containers that share a network namespace (same IP, same localhost) and optional shared storage volumes. Containers in a pod are scheduled together on the same node.

yaml · pod.yaml: single and multi-container

# Single container pod (basic)
apiVersion: v1
kind: Pod
metadata:
  name: my-api-pod
  labels:
    app: my-api
spec:
  containers:
  - name: api
    image: myrepo/my-api:v2.1.0
    ports:
    - containerPort: 8080
    env:
    - name: APP_ENV
      value: production
    resources:
      requests:
        cpu: "250m"      # 250 millicores = 0.25 vCPU
        memory: "256Mi"
      limits:
        cpu: "500m"
        memory: "512Mi"

---
# Multi-container pod: sidecar pattern
apiVersion: v1
kind: Pod
metadata:
  name: app-with-log-shipper     # Example name
spec:
  containers:
  - name: app
    image: myapp:latest
    volumeMounts:
    - name: shared-logs
      mountPath: /var/log/app

  - name: log-shipper            # Sidecar container — same pod
    image: fluentd:v1.16
    volumeMounts:
    - name: shared-logs
      mountPath: /var/log/app

  volumes:
  - name: shared-logs
    emptyDir: {}               # Shared ephemeral volume

⚠️ Don't Create Pods Directly

Bare pods are not rescheduled if they are deleted or evicted, or if their node fails; the kubelet only restarts their containers in place. Always use a Deployment (stateless apps), StatefulSet (stateful apps), DaemonSet (one per node), or Job (batch) to manage pods. The controller ensures the desired number of pods is always running.

Workloads

Deployments Complete Reference

yaml · deployment.yaml: production-ready with all key fields

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-deployment
  namespace: production
  labels:
    app: api
    version: v2
spec:
  replicas: 3                         # Desired pod count
  selector:
    matchLabels:
      app: api                         # Must match template labels
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1                      # Max extra pods during update
      maxUnavailable: 0               # Zero downtime — never kill before ready
  template:
    metadata:
      labels:
        app: api
        version: v2
    spec:
      terminationGracePeriodSeconds: 30   # Time for graceful shutdown
      containers:
      - name: api
        image: gcr.io/my-project/api:v2.1.0
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
          protocol: TCP
        envFrom:
        - configMapRef:
            name: api-config            # All ConfigMap keys as env vars
        - secretRef:
            name: api-secrets           # All Secret keys as env vars
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name  # Downward API
        resources:
          requests:
            cpu: "250m"
            memory: "256Mi"
          limits:
            cpu: "1000m"
            memory: "512Mi"
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 10
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
      affinity:
        podAntiAffinity:               # Spread pods across nodes
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: api
              topologyKey: kubernetes.io/hostname

Workloads

ReplicaSets Explained

A ReplicaSet ensures that a specified number of pod replicas are running at any time. If a pod dies, the ReplicaSet creates a replacement. If there are too many pods, it deletes the excess. Deployments manage ReplicaSets for you — providing versioning and rollback on top.

Deployment
  ├─ manages → ReplicaSet v2 ─ manages → Pod-1, Pod-2, Pod-3
  └─ ReplicaSet v1 → Pod-A (old) → being terminated

During a rolling update, Deployment creates a new ReplicaSet (v2) while scaling down the old one (v1). Old ReplicaSets are kept for rollback history (controlled by revisionHistoryLimit).
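The hand-off between the two ReplicaSets can be sketched as a small simulation. This is a simplified model, not the Deployment controller's actual algorithm: `rolling_update` is a hypothetical helper, and it assumes every new pod becomes ready instantly, whereas a real rollout waits on readiness probes.

```python
def rolling_update(replicas: int, max_surge: int, max_unavailable: int):
    """Yield (old, new) pod counts after each step of a simplified rollout."""
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    old, new = replicas, 0
    while old > 0 or new < replicas:
        # Scale up the new ReplicaSet within the surge budget.
        room = replicas + max_surge - (old + new)
        new += min(room, replicas - new)
        # Scale down the old ReplicaSet while keeping enough pods available.
        spare = (old + new) - (replicas - max_unavailable)
        old -= min(spare, old)
        yield old, new


for old, new in rolling_update(replicas=3, max_surge=1, max_unavailable=0):
    print(f"old ReplicaSet: {old} pods, new ReplicaSet: {new} pods")
```

With `maxSurge: 1` and `maxUnavailable: 0`, the total never drops below 3 pods and never exceeds 4, which is why that combination gives a zero-downtime rollout at the cost of one extra pod.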

Workloads

StatefulSets & DaemonSets

StatefulSet — Ordered, Persistent Identity

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres-headless
  replicas: 3
  selector:
    matchLabels:
      app: postgres
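  template:
    metadata:
      labels:
        app: postgres        # Must match the selector
    spec:
      containers:
      - name: postgres
        image: postgres:16
        volumeMounts:
        - name: data         # Matches the volumeClaimTemplates name
          mountPath: /var/lib/postgresql/data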
  volumeClaimTemplates:  # Each pod gets its own PVC
  - metadata:
      name: data
    spec:
      accessModes: [ReadWriteOnce]
      resources:
        requests:
          storage: 10Gi
# Pods: postgres-0, postgres-1, postgres-2
# Each gets stable DNS: postgres-0.postgres-headless

DaemonSet — One Pod per Node

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector
spec:
  selector:
    matchLabels:
      app: log-collector
  template:
    metadata:
      labels:
        app: log-collector   # Must match the selector
    spec:
      containers:
      - name: fluentd
        image: fluentd:v1.16
        volumeMounts:
        - name: varlog
          mountPath: /var/log
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
# Runs on EVERY worker node
# Also used for: monitoring agents (Prometheus
# node-exporter), CNI plugins, security agents

Workloads

Jobs & CronJobs

yaml · Job + CronJob

# Job — run to completion once
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
spec:
  backoffLimit: 3         # Retry failed pods up to 3 times
  ttlSecondsAfterFinished: 3600  # Auto-delete 1hr after completion
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: migrate
        image: myapp:latest
        command: ["dotnet", "ef", "database", "update"]

---
# CronJob: scheduled jobs
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report
spec:
  schedule: "0 2 * * *"    # 2 AM every night (cron syntax)
  concurrencyPolicy: Forbid  # Don't start if previous run still going
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: reporter
            image: myapp:latest
            command: ["python", "generate_report.py"]

Configuration

Namespaces

Namespaces provide virtual clusters within a physical cluster. They isolate resources by environment, team, or application. Resource names must be unique within a namespace but can repeat across namespaces.

bash · Namespace Commands

# Create namespace
kubectl create namespace production
kubectl create namespace staging

# YAML approach
kubectl apply -f - <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    env: production
    team: backend
EOF

# Work within a namespace
kubectl get pods -n production
kubectl get all -n staging

# Set default namespace for current context
kubectl config set-context --current --namespace=production

# List all namespaces
kubectl get namespaces

# Get resources across ALL namespaces
kubectl get pods -A
kubectl get deployments --all-namespaces

Built-in Namespace	Purpose
`default`	Resources with no namespace specified end up here. Avoid using for production workloads.
`kube-system`	Kubernetes system components: CoreDNS, kube-proxy, metrics-server, CNI plugins.
`kube-public`	Publicly readable. Contains cluster-info ConfigMap. Rarely used by applications.
`kube-node-lease`	Node heartbeat objects. Used internally for node health tracking.

Configuration

ConfigMaps & Usage Patterns

yaml · configmap.yaml + all injection methods

# Define ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: api-config
data:
  APP_ENV: production
  LOG_LEVEL: info
  MAX_CONNECTIONS: "100"
  app.conf: |            # Multi-line file content
    server.port=8080
    server.host=0.0.0.0
    db.pool.size=20

---
# Inject as individual env var
env:
- name: APP_ENV
  valueFrom:
    configMapKeyRef:
      name: api-config
      key: APP_ENV

# Inject ALL keys as env vars
envFrom:
- configMapRef:
    name: api-config

# Mount as files in container
volumeMounts:
- name: config-vol
  mountPath: /app/config
volumes:
- name: config-vol
  configMap:
    name: api-config
    items:
    - key: app.conf
      path: app.conf    # Mounts as /app/config/app.conf

Configuration

Secrets

bash · Creating Secrets

# From literal values (kubectl encodes to base64 automatically)
kubectl create secret generic db-credentials \
  --from-literal=username=admin \
  --from-literal=password='S3cr3t!Pass'

# From file (e.g., TLS certificate)
kubectl create secret tls tls-secret \
  --cert=cert.pem --key=key.pem

# From env file
kubectl create secret generic app-secrets \
  --from-env-file=.env.prod

yaml · Secret manifest (values are base64-encoded)

apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
type: Opaque
data:
  username: YWRtaW4=      # base64("admin")
  password: UzNjcjN0IVBhc3M=  # base64("S3cr3t!Pass")
# stringData: allows plain text (K8s encodes it)
stringData:
  api_key: my-plain-text-api-key

🔒 Production Secret Management

Native K8s Secrets are only base64-encoded, not encrypted by default in etcd. For production, enable etcd encryption at rest and use an external secrets manager: External Secrets Operator (syncs from AWS Secrets Manager / GCP Secret Manager / HashiCorp Vault), Sealed Secrets (encrypted in Git), or the Secrets Store CSI Driver.
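To see why base64 offers no protection, here is a minimal Python sketch that reproduces the two values from the manifest above and then reverses one of them. The helper names are hypothetical; only the standard-library `base64` module is used.

```python
import base64


def encode_secret_value(plain: str) -> str:
    """Encode a value the way the `data:` field of a Secret expects it."""
    return base64.b64encode(plain.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Reverse the encoding. No key or password is needed."""
    return base64.b64decode(encoded).decode("utf-8")


print(encode_secret_value("admin"))             # YWRtaW4=
print(decode_secret_value("UzNjcjN0IVBhc3M="))  # S3cr3t!Pass
```

Anyone who can read the Secret object can run the decode step, which is why access control and encryption at rest matter.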

Configuration

Resource Requests & Limits

Requests = what the scheduler guarantees the pod. Used to decide which node to place the pod on. Limits = the max the pod can consume. CPU is throttled at the limit; exceeding memory limit kills the container (OOMKilled).

Setting	CPU Behavior	Memory Behavior
requests	Guaranteed minimum, used for scheduling	Guaranteed minimum, used for scheduling
limits	Throttled (not killed) if exceeded	Process killed (OOMKilled) if exceeded
no limits set	Can use all spare node CPU (noisy neighbor)	Can exhaust node memory, triggering OOM kills or eviction of other pods

yaml · Resource units reference

resources:
  requests:
    cpu: "250m"        # 250 millicores = 0.25 vCPU
    memory: "256Mi"    # 256 Mebibytes
  limits:
    cpu: "1"           # 1 full vCPU (= 1000m)
    memory: "1Gi"     # 1 Gibibyte

# CPU units:
#   1     = 1 vCPU / 1 core / 1 AWS vCPU / 1 hyperthread
#   0.5   = 500m (half a core)
#   100m  = 0.1 core (1/10 of a core)

# Memory units:
#   Ki = Kibibyte (1024 bytes)
#   Mi = Mebibyte (1024 Ki)
#   Gi = Gibibyte (1024 Mi)
#   k, M, G = decimal units, powers of 1000 (avoid; use Ki/Mi/Gi)

# LimitRange — default limits for a namespace
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
  - type: Container
    default:
      cpu: "500m"
      memory: "256Mi"
    defaultRequest:
      cpu: "100m"
      memory: "128Mi"
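The unit rules above can be checked with a small converter. This is an illustrative sketch, not the Kubernetes quantity parser: the function names are hypothetical and only the suffixes listed in this section are handled.

```python
def cpu_to_millicores(value: str) -> int:
    """Convert a CPU quantity such as '250m', '1' or '0.5' to millicores."""
    if value.endswith("m"):
        return int(value[:-1])
    return round(float(value) * 1000)


_MEMORY_UNITS = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3,  # binary (recommended)
    "k": 1000, "M": 1000**2, "G": 1000**3,     # decimal
}


def memory_to_bytes(value: str) -> int:
    """Convert a memory quantity such as '256Mi' or '1G' to bytes."""
    # Check two-letter suffixes before one-letter ones.
    for suffix in sorted(_MEMORY_UNITS, key=len, reverse=True):
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * _MEMORY_UNITS[suffix]
    return int(value)  # No suffix: plain bytes.


print(cpu_to_millicores("0.5"))                      # 500
print(memory_to_bytes("1Gi") - memory_to_bytes("1G"))  # 73741824
```

The second line shows the gap between `1Gi` and `1G`, roughly 74 MB, which is the reason to stay with the binary suffixes.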

Networking

Services — Network Abstraction

A Service is a stable network endpoint for a set of pods. Because pod IPs change constantly (pods die and restart), you never address pods directly. A Service has a fixed ClusterIP and DNS name that routes to healthy pods via kube-proxy.

Type	Reachable From	Use Case
ClusterIP	Within cluster only	Internal microservice communication. Default type.
NodePort	Outside via node IP:port	Dev/testing. Exposes one port from the 30000-32767 range on every node.
LoadBalancer	Public internet	Production. Creates a cloud LB (AWS NLB, GCP Network LB). One IP per service — expensive.
ExternalName	Within cluster	Alias for external DNS (e.g., point to RDS hostname).
Headless	Direct pod IPs	StatefulSet stable pod DNS. Not a separate type: a ClusterIP Service with `clusterIP: None`.

yaml · service.yaml: ClusterIP + LoadBalancer

# ClusterIP — internal only (default)
apiVersion: v1
kind: Service
metadata:
  name: api-service
  namespace: production
spec:
  type: ClusterIP
  selector:
    app: api                  # Routes to pods with this label
  ports:
  - name: http
    protocol: TCP
    port: 80                  # Service port (what clients call)
    targetPort: 8080          # Pod port (what container listens on)

---
# LoadBalancer — public internet access
apiVersion: v1
kind: Service
metadata:
  name: api-lb
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
spec:
  type: LoadBalancer
  selector:
    app: api
  ports:
  - port: 443
    targetPort: 8080

---
# Headless — for StatefulSet pod-level DNS
apiVersion: v1
kind: Service
metadata:
  name: postgres-headless
spec:
  clusterIP: None             # No VIP — direct pod IPs
  selector:
    app: postgres
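How a selector picks the pods behind a Service can be sketched as follows. This is a simplified model: `select_endpoints` is a hypothetical helper, the pod IPs are invented for illustration, and readiness is reduced to a boolean flag.

```python
def select_endpoints(selector: dict, pods: list[dict]) -> list[str]:
    """Return IPs of ready pods whose labels contain every selector pair."""
    return [
        pod["ip"]
        for pod in pods
        if pod["ready"] and selector.items() <= pod["labels"].items()
    ]


# Illustrative pods; the IPs are made up.
pods = [
    {"ip": "10.0.1.5", "labels": {"app": "api", "version": "v2"}, "ready": True},
    {"ip": "10.0.2.7", "labels": {"app": "api", "version": "v2"}, "ready": False},
    {"ip": "10.0.3.9", "labels": {"app": "postgres"}, "ready": True},
]

print(select_endpoints({"app": "api"}, pods))  # ['10.0.1.5']
```

Extra labels on a pod do not prevent a match, and a pod that is not ready is left out even though its labels match.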

Networking

Ingress — HTTP Routing

Ingress is the K8s-native way to route HTTP/HTTPS traffic from outside the cluster to Services inside. It acts as a Layer 7 reverse proxy, routing by hostname and path. It requires an Ingress Controller to be installed in the cluster (nginx-ingress, Traefik, HAProxy, GKE's built-in, AWS ALB Controller).

  Internet
     │
     ▼
┌─────────────────────────────────────────────────────┐
│           LoadBalancer Service                      │
│    (cloud LB — single external IP/DNS)              │
└─────────────────────────────────────────────────────┘
     │
     ▼
┌─────────────────────────────────────────────────────┐
│           Ingress Controller Pod                   │
│     (nginx / traefik / alb — reads Ingress objects) │
│                                                     │
│  Rule: api.myapp.com/v1     → api-service:80        │
│  Rule: api.myapp.com/v2     → api-v2-service:80     │
│  Rule: admin.myapp.com      → admin-service:80      │
│  Rule: *                    → frontend-service:80   │
└─────────────────────────────────────────────────────┘
     │              │                │
     ▼              ▼                ▼
 api-service    api-v2-service  admin-service
  (ClusterIP)   (ClusterIP)     (ClusterIP)
     │
     ▼
  [Pod][Pod][Pod]

yaml · ingress.yaml: host + path based routing + TLS

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: main-ingress
  namespace: production
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/limit-rps: "100"      # Requests/sec per client IP
    cert-manager.io/cluster-issuer: letsencrypt-prod  # Auto TLS
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - api.myapp.com
    - admin.myapp.com
    secretName: myapp-tls      # cert-manager populates this
  rules:
  - host: api.myapp.com
    http:
      paths:
      - path: /v1
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 80
      - path: /v2
        pathType: Prefix
        backend:
          service:
            name: api-v2-service
            port:
              number: 80
  - host: admin.myapp.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: admin-service
            port:
              number: 80
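The host and path matching an Ingress controller performs can be approximated in Python. This is a sketch of the general idea rather than any controller's implementation: the rule table copies the hosts and backends from the routing diagram above, and `prefix_matches` and `route` are hypothetical helpers.

```python
RULES = [
    # (host, path prefix, backend Service) as shown in the routing diagram
    ("api.myapp.com", "/v1", "api-service"),
    ("api.myapp.com", "/v2", "api-v2-service"),
    ("admin.myapp.com", "/", "admin-service"),
]


def prefix_matches(prefix: str, path: str) -> bool:
    """pathType: Prefix matches whole path elements, not raw substrings."""
    if prefix == "/":
        return True
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


def route(host: str, path: str, default: str = "frontend-service") -> str:
    """Pick the backend with the longest matching prefix for this host."""
    matches = [
        (len(prefix), backend)
        for rule_host, prefix, backend in RULES
        if rule_host == host and prefix_matches(prefix, path)
    ]
    return max(matches)[1] if matches else default


print(route("api.myapp.com", "/v1/users"))  # api-service
print(route("api.myapp.com", "/v10"))       # frontend-service
```

Note that `/v10` does not match the `/v1` prefix, because Prefix matching compares complete path segments.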

Install nginx Ingress Controller

bash

# Helm (recommended)
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx --create-namespace

# Or kubectl apply
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.10.1/deploy/static/provider/cloud/deploy.yaml

# Verify
kubectl get pods -n ingress-nginx
kubectl get svc -n ingress-nginx    # Shows external IP

Networking

Egress & NetworkPolicy

NetworkPolicy is K8s's firewall for pod-to-pod and pod-to-external traffic. By default, all pods can talk to all other pods. A NetworkPolicy restricts this using label selectors. Enforcement requires a CNI plugin that supports NetworkPolicy, such as Calico, Cilium, or Weave; on EKS that typically means AWS VPC CNI plus Calico, and on GKE it means Dataplane V2.

📌 Ingress vs Egress in NetworkPolicy (different from K8s Ingress object)

In NetworkPolicy, ingress = traffic coming into a pod, and egress = traffic going out of a pod. This is different from the Ingress object which routes external HTTP traffic. Be careful not to confuse these two uses of the word.

yaml · networkpolicy.yaml: deny-all + allow specific

# Step 1: Deny all ingress + egress in a namespace (default deny)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}          # Applies to ALL pods in namespace
  policyTypes:
  - Ingress
  - Egress

---
# Step 2: Allow api-service to receive from frontend only
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api                  # Policy applies to api pods
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:          # Same list item as podSelector = AND
        matchLabels:
          kubernetes.io/metadata.name: production   # In production namespace only
      podSelector:
        matchLabels:
          app: frontend           # Only allow from frontend pods
    ports:
    - protocol: TCP
      port: 8080

---
# Step 3: Allow api pods EGRESS to database + external DNS
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-egress-rules
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
  - Egress
  egress:
  - to:                         # Allow to database pods
    - podSelector:
        matchLabels:
          app: postgres
    ports:
    - port: 5432
  - to:                         # Allow DNS (kube-dns)
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - port: 53
      protocol: UDP
    - port: 53
      protocol: TCP
  - to:                         # Allow external HTTPS (payment API)
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
        - 10.0.0.0/8              # Block internal RFC1918
        - 172.16.0.0/12
        - 192.168.0.0/16
    ports:
    - port: 443
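A detail that is easy to get wrong in `from` and `to` lists is how selectors combine: separate list entries are alternatives (OR), while a `podSelector` and `namespaceSelector` inside one entry must both match (AND). The sketch below models that rule. It is a simplified illustration with hypothetical helper names, and it ignores `ipBlock`, ports, and `matchExpressions`.

```python
def _subset(selector: dict, labels: dict) -> bool:
    return selector.items() <= labels.items()


def peer_allows(peer: dict, source: dict, policy_namespace: str) -> bool:
    """One list entry: every selector present in it must match (AND)."""
    if "namespaceSelector" in peer:
        if not _subset(peer["namespaceSelector"], source["namespace_labels"]):
            return False
    elif source["namespace"] != policy_namespace:
        # A podSelector on its own only selects pods in the policy's namespace.
        return False
    return _subset(peer.get("podSelector", {}), source["labels"])


def ingress_allowed(peers: list[dict], source: dict, policy_namespace: str) -> bool:
    """The list as a whole: any single entry may match (OR)."""
    return any(peer_allows(p, source, policy_namespace) for p in peers)


prod = {"kubernetes.io/metadata.name": "production"}
batch_pod = {"labels": {"app": "batch"}, "namespace": "production",
             "namespace_labels": prod}

two_entries = [  # OR: frontend pods, or any pod in production
    {"podSelector": {"app": "frontend"}},
    {"namespaceSelector": prod},
]
one_entry = [    # AND: frontend pods that are in production
    {"podSelector": {"app": "frontend"}, "namespaceSelector": prod},
]

print(ingress_allowed(two_entries, batch_pod, "production"))  # True
print(ingress_allowed(one_entry, batch_pod, "production"))    # False
```

A single misplaced dash in the YAML is the difference between the two lists, so it is worth checking which form a policy actually uses.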

Networking

DNS & Service Discovery

Kubernetes runs CoreDNS as the cluster DNS server. Every Service gets a DNS name automatically. Pods find services by name — no hardcoded IPs.

text · DNS naming patterns

# Full DNS name format:
{service-name}.{namespace}.svc.cluster.local

# Examples:
api-service.production.svc.cluster.local  → api service in production namespace
postgres.production.svc.cluster.local     → postgres service
api-service.default.svc.cluster.local     → api in default namespace

# Shorthand within SAME namespace:
api-service                   → resolved to api-service.production.svc.cluster.local
postgres:5432                 → database connection string

# StatefulSet pod DNS (headless service required):
postgres-0.postgres-headless.production.svc.cluster.local
postgres-1.postgres-headless.production.svc.cluster.local
postgres-2.postgres-headless.production.svc.cluster.local
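The naming patterns above are regular enough to generate. The following sketch builds them in Python; the function names are hypothetical, and `cluster.local` is assumed as the cluster domain, as in the examples.

```python
def service_fqdn(service: str, namespace: str,
                 cluster_domain: str = "cluster.local") -> str:
    """Full DNS name of a Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def statefulset_pod_fqdns(name: str, headless_service: str,
                          namespace: str, replicas: int) -> list[str]:
    """Per-pod DNS names for a StatefulSet behind a headless Service."""
    base = service_fqdn(headless_service, namespace)
    return [f"{name}-{i}.{base}" for i in range(replicas)]


print(service_fqdn("api-service", "production"))
for fqdn in statefulset_pod_fqdns("postgres", "postgres-headless", "production", 3):
    print(fqdn)
```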

Scaling

Horizontal Pod Autoscaler (HPA)

HPA automatically increases or decreases the number of pod replicas based on observed metrics: CPU and memory from metrics-server, or custom and external metrics through an adapter such as Prometheus Adapter or KEDA. It scales out (more pods) and in (fewer pods); it does not change the size of each pod.

                   HPA Controller Loop (every 15s)
                           │
    ┌──────────────────────┼──────────────────────┐
    │                      ▼                      │
    │          metrics-server / Prometheus         │
    │          (CPU: 78% across 3 pods)            │
    │                      │                      │
    │                      ▼                      │
    │    desiredReplicas = ceil(3 × 78/50) = 5    │
    │                      │                      │
    │                      ▼                      │
    │          Scale Deployment to 5 replicas      │
    └──────────────────────────────────────────────┘

Scale In: CPU drops → scale down (5min cooldown by default)
Scale Out: CPU rises → scale up immediately
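The replica calculation in the diagram can be written out as a short function. This is a sketch of the published formula, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), not the controller's code: the function name is hypothetical, the 10% tolerance reflects the controller's usual default, and stabilization windows and scaling policies are left out.

```python
import math


def desired_replicas(current: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Apply the HPA formula, then clamp to the min/max replica bounds."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current  # Close enough to target: do not scale.
    wanted = math.ceil(current * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, wanted))


# The diagram's example: 3 pods at 78% CPU with a 50% target.
print(desired_replicas(3, 78, 50, min_replicas=2, max_replicas=20))  # 5
```

The clamp at the end is what `minReplicas` and `maxReplicas` do: a tenfold overload on 3 pods would ask for 30 replicas, but the result is capped at the configured maximum.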

yaml · hpa.yaml: CPU + Memory + Custom Metrics

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-deployment     # Which deployment to scale
  minReplicas: 2             # Never go below 2 (HA guarantee)
  maxReplicas: 20            # Never exceed 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50  # Scale when avg CPU > 50% of requests
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 400Mi     # Scale when avg mem > 400 Mi
  - type: Pods                # Custom metric from Prometheus adapter
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"     # 100 RPS per pod
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5min before scale-down
      policies:
      - type: Pods
        value: 2
        periodSeconds: 60      # Max 2 pods removed per minute
    scaleUp:
      stabilizationWindowSeconds: 0    # Scale up immediately
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15      # At most double the pods every 15 seconds

bash · HPA kubectl commands

# Create HPA imperatively (quick)
kubectl autoscale deployment api-deployment -n production \
  --cpu-percent=50 --min=2 --max=20

# Check HPA status
kubectl get hpa -n production
kubectl describe hpa api-hpa -n production

# Watch HPA in action
kubectl get hpa api-hpa -n production -w

# Install metrics-server (required for CPU/memory HPA)
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Check current pod resource usage
kubectl top pods -n production
kubectl top nodes

Scaling

Vertical Pod Autoscaler (VPA)

VPA automatically adjusts the CPU and memory requests and limits for individual pods based on actual usage. Instead of adding more pods (HPA), it makes each pod bigger or smaller. This is vertical scaling — giving a pod more (or less) resources.

HPA — Horizontal (more pods)

Before: 2 pods × 256Mi each
  [Pod] [Pod]

After (high load):
  [Pod] [Pod] [Pod] [Pod]
4 pods × 256Mi = 1 GiB total

✅ Zero downtime scale-out
✅ Fast response to load spikes
✅ Stateless apps
❌ Each pod stays same size

VPA — Vertical (bigger pods)

Before: 1 pod × 256Mi
  [Pod 256Mi]

After VPA recommendation:
  [Pod 768Mi]
1 pod × 768Mi (right-sized)

⚠️ Pod restart required (eviction)
✅ Stateful apps / batch jobs
✅ Right-sizing (save costs)
❌ Don't use HPA+VPA on same metric

yaml · vpa.yaml: VPA in Auto and Recommendation mode

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-deployment
  updatePolicy:
    updateMode: "Auto"         # Off | Initial | Recreate | Auto
    # Off        = only recommendations, no changes
    # Initial    = set requests only when pods are created
    # Recreate   = evict and recreate pods to apply
    # Auto       = the default mode; currently evicts and recreates, like Recreate
  resourcePolicy:
    containerPolicies:
    - containerName: api
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 2000m
        memory: 4Gi
      controlledResources: ["cpu", "memory"]
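
A recommendation-only variant can be useful for observing VPA suggestions before letting it evict pods. The sketch below writes such a manifest to a file for review. The file name and the object name `api-vpa-recommend` are hypothetical, and it is meant to be used instead of the Auto-mode object, not alongside it on the same Deployment.

```shell
#!/usr/bin/env bash
# Write a recommendation-only VPA manifest to a file for review.
# File name and metadata.name are hypothetical examples.
cat > vpa-recommend.yaml <<'EOF'
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa-recommend
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-deployment
  updatePolicy:
    updateMode: "Off"   # Recommendations only; pods are never evicted
EOF

echo "Wrote vpa-recommend.yaml; apply with: kubectl apply -f vpa-recommend.yaml"
```

With `updateMode: "Off"`, the recommendations still appear in `kubectl describe vpa`, and they can be copied into the Deployment's resource requests by hand.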

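bash · Install VPA and check recommendations

# Install VPA from the kubernetes/autoscaler repo
# (CRDs plus the recommender, updater, and admission controller;
#  applying the CRD manifest alone does not produce recommendations)
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh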

# View VPA recommendations
kubectl describe vpa api-vpa -n production
# Look for:
# Recommendation:
#   Container Recommendations:
#     Container Name: api
#     Lower Bound:    cpu: 50m, memory: 128Mi
#     Target:         cpu: 240m, memory: 400Mi   ← apply this
#     Upper Bound:    cpu: 800m, memory: 1Gi

# Get VPA objects
kubectl get vpa -n production

Scaling

Cluster Autoscaler

The Cluster Autoscaler (CA) adds or removes nodes from the cluster based on pending pods and node utilization. It works at the node pool level. HPA/VPA scale pods; CA scales nodes.

Scaler	What it scales	Trigger
HPA	Pod replicas (count)	CPU / memory / custom metrics
VPA	Pod resources (size)	Historical resource usage
Cluster Autoscaler	Node count	Unschedulable pods / underutilized nodes
KEDA	Pod replicas (to 0)	Events (queue depth, HTTP requests, cron)

bash · Enable Cluster Autoscaler on GKE

# Enable when creating cluster
gcloud container clusters create my-cluster \
  --enable-autoscaling \
  --min-nodes=1 --max-nodes=10 \
  --zone=us-central1-a

# Enable on existing node pool
gcloud container clusters update my-cluster \
  --enable-autoscaling \
  --min-nodes=2 --max-nodes=20 \
  --node-pool=default-pool \
  --zone=us-central1-a

# Annotate a pod so the Cluster Autoscaler will not evict it (its node is then not removed during scale-down)
kubectl annotate pod my-pod \
  cluster-autoscaler.kubernetes.io/safe-to-evict=false

Storage

Volumes & PersistentVolumeClaims

yaml · PVC + PV + Pod using storage

# PersistentVolumeClaim — request storage
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc
  namespace: production
spec:
  accessModes:
  - ReadWriteOnce            # RWO: single node R/W
                              # ROX: many nodes read-only
                              # RWX: many nodes R/W (NFS)
  storageClassName: standard-rwo
  resources:
    requests:
      storage: 20Gi

---
# Use the PVC in a pod spec (for a Deployment, this goes under spec.template.spec)
spec:
  containers:
  - name: postgres
    image: postgres:16
    volumeMounts:
    - name: data
      mountPath: /var/lib/postgresql/data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: postgres-pvc   # Reference the PVC above

Storage

Storage Classes

Cloud	StorageClass	Type	Use
GKE	`standard-rwo`	Balanced PD	General purpose
GKE	`premium-rwo`	SSD PD	High IOPS (databases)
EKS	`gp3`	EBS gp3	General purpose
EKS	`io1`	EBS io1	High IOPS provisioned
AKS	`managed`	Azure Disk	General purpose
Any	Custom	NFS / Ceph / Longhorn	RWX shared volumes

Deploy & Operate

Deploy Workflow

Build & Push Container Image

Build your Docker image and push to a registry (GCR, ECR, Docker Hub, Artifact Registry). Tag with git SHA for reproducibility.

Write Kubernetes Manifests

Create Deployment, Service, ConfigMap, Secret YAML files. Store in k8s/ directory in your repo. Use Kustomize or Helm for environment-specific values.

Apply Manifests

kubectl apply -f k8s/ — declarative apply. Kubernetes computes the diff and applies only changes.

Verify Rollout

kubectl rollout status deployment/api-deployment — watches until all pods are updated and ready.

Monitor

Watch pod events, check logs, verify readiness probes pass. Use kubectl get pods -w to watch status in real time.
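
The five steps above can be sketched as a small script that prints the commands for one commit so the plan can be reviewed before anything runs. This is only an illustration: `deploy_plan` is a hypothetical helper, the registry path reuses this handbook's example project, and it assumes the manifests in k8s/ already reference the same image tag.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print the workflow's commands for one commit so the
# plan can be reviewed first. Nothing is executed here.
# Assumes the manifests in k8s/ already reference the same image tag
# (for example via Kustomize or Helm).
deploy_plan() {
  local sha=$1 tag image
  tag=$(printf '%s' "$sha" | cut -c1-12)     # short git SHA as the image tag
  image="gcr.io/my-project/api:$tag"         # registry path from this handbook's examples
  cat <<EOF
docker build -t $image .
docker push $image
kubectl apply -f k8s/
kubectl rollout status deployment/api-deployment
kubectl get pods -w
EOF
}

# In a real repo: deploy_plan "$(git rev-parse HEAD)"
deploy_plan 0123456789abcdef0123456789abcdef01234567
```

Tagging with the commit SHA rather than `latest` keeps every running image traceable to the exact source revision.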

bash · Full deployment commands

# Apply single file
kubectl apply -f deployment.yaml

# Apply all files in directory
kubectl apply -f k8s/

# Apply with recursive subdirectories
kubectl apply -f k8s/ --recursive

# Dry run — see what would change
kubectl apply -f deployment.yaml --dry-run=client
kubectl apply -f deployment.yaml --dry-run=server  # More accurate

# Diff — show changes before applying
kubectl diff -f deployment.yaml

# Update image (triggers rolling update)
kubectl set image deployment/api-deployment \
  api=gcr.io/my-project/api:v2.2.0

# Scale replicas manually
kubectl scale deployment api-deployment --replicas=5

# Watch rollout progress
kubectl rollout status deployment/api-deployment -n production -w

# See rollout history
kubectl rollout history deployment/api-deployment

Deploy & Operate

Rolling Updates

A rolling update replaces old pods with new pods incrementally, ensuring the application stays available throughout. Controlled by maxSurge (extra pods allowed during update) and maxUnavailable (pods that can be down).

yaml · Rolling update configuration

strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1          # Allow 1 extra pod (4 running during update on 3 replicas)
    maxUnavailable: 0    # Never have fewer than 3 ready pods (no capacity loss)

# Kubernetes defaults (faster rollout, but ready capacity can dip during the update):
#   maxSurge: 25%
#   maxUnavailable: 25%

# No capacity loss during the rollout (relies on an accurate readinessProbe):
#   maxSurge: 1
#   maxUnavailable: 0

Initial:   [v1] [v1] [v1]         (3 running)
Step 1:    [v1] [v1] [v1] [v2]   (maxSurge=1, create 1 new)
Step 2:    [v1] [v1] [v2]        (v2 ready, kill 1 old)
Step 3:    [v1] [v1] [v2] [v2]   (create another new)
Step 4:    [v1] [v2] [v2]        (kill another old)
Step 5:    [v1] [v2] [v2] [v2]   (create last new)
Final:     [v2] [v2] [v2]        (kill last old)
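
When maxSurge and maxUnavailable are given as percentages, Kubernetes converts them to pod counts, rounding maxSurge up and maxUnavailable down. The sketch below works through that arithmetic. `rollout_bounds` is a hypothetical helper, and it does not model the special case where both values resolve to zero.

```shell
#!/usr/bin/env bash
# Hypothetical helper: resolve maxSurge / maxUnavailable percentages into
# absolute pod counts. maxSurge rounds up, maxUnavailable rounds down.
rollout_bounds() {
  local replicas=$1 surge_pct=$2 unavail_pct=$3
  local surge=$(( (replicas * surge_pct + 99) / 100 ))   # round up
  local unavail=$(( replicas * unavail_pct / 100 ))      # round down
  echo "replicas=$replicas maxSurge=$surge maxUnavailable=$unavail" \
       "max_total=$(( replicas + surge ))" \
       "min_available=$(( replicas - unavail ))"
}

rollout_bounds 3 25 25
# replicas=3 maxSurge=1 maxUnavailable=0 max_total=4 min_available=3
rollout_bounds 10 25 25
# replicas=10 maxSurge=3 maxUnavailable=2 max_total=13 min_available=8
```

For 3 replicas, the 25% settings resolve to the same bounds as the step-by-step diagram: at most 4 pods in total and never fewer than 3 available.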

Deploy & Operate

Rollbacks

bash · Rollback commands

# View rollout history (shows revision numbers)
kubectl rollout history deployment/api-deployment -n production
# REVISION  CHANGE-CAUSE
# 1         Initial deploy
# 2         kubectl set image ... api=v2.0
# 3         kubectl set image ... api=v2.1  ← current

# View details of a specific revision
kubectl rollout history deployment/api-deployment --revision=2

# Rollback to immediately previous revision
kubectl rollout undo deployment/api-deployment -n production

# Rollback to specific revision
kubectl rollout undo deployment/api-deployment --to-revision=1

# Pause a rollout (to manually verify)
kubectl rollout pause deployment/api-deployment

# Resume paused rollout
kubectl rollout resume deployment/api-deployment

# Keep revision history (default: 10)
# Set in deployment spec:
#   revisionHistoryLimit: 5

# Annotate deployment (shows in history CHANGE-CAUSE)
kubectl annotate deployment api-deployment \
  kubernetes.io/change-cause="Deploy v2.1.0 - fix memory leak"

Deploy & Operate

Health Probes

Kubernetes uses three types of probes to determine pod health. Getting these right is critical for zero-downtime deployments and self-healing.

Probe	Failure Action	Use For
livenessProbe	Kills and restarts the container	Detect deadlocks. If app is stuck but not crashed.
readinessProbe	Removes pod from Service endpoint (stops traffic)	Is the app ready to receive requests? DB connected?
startupProbe	Kills container if it doesn't start in time	Slow-starting apps (Java, .NET). Disables liveness until startup.

yaml · All three probe types with all check methods

# HTTP probe (most common — checks /health endpoint)
livenessProbe:
  httpGet:
    path: /health/live
    port: 8080
    httpHeaders:
    - name: X-Health-Check
      value: kubernetes
  initialDelaySeconds: 20    # Wait 20s before first probe
  periodSeconds: 10          # Probe every 10s
  timeoutSeconds: 5          # Fail after 5s no response
  failureThreshold: 3        # Kill after 3 consecutive failures
  successThreshold: 1

readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 2        # Remove from LB after 2 failures (fast)

startupProbe:
  httpGet:
    path: /health/live
    port: 8080
  failureThreshold: 30       # 30 × 10s = 5min to start
  periodSeconds: 10

---
# TCP probe (for databases, non-HTTP services)
livenessProbe:
  tcpSocket:
    port: 5432
  initialDelaySeconds: 30
  periodSeconds: 10

# Exec probe (run command inside container)
livenessProbe:
  exec:
    command:
    - /bin/sh
    - -c
    - "redis-cli ping | grep PONG"
  initialDelaySeconds: 5
  periodSeconds: 10

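# gRPC probe (GA in Kubernetes 1.27; the app must implement the gRPC
# Health Checking Protocol, for example .NET 8+ with gRPC health checks)
livenessProbe:
  grpc:
    port: 8080
    # service: <service-name>   # Optional: service name sent in the HealthCheckRequest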
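
The probe timings multiply out to concrete windows, which is worth checking before tuning them. The sketch below computes failureThreshold × periodSeconds for the HTTP probes in this section. `probe_window` is a hypothetical helper, and the result is approximate because it ignores initialDelaySeconds and per-probe timeoutSeconds.

```shell
#!/usr/bin/env bash
# Hypothetical helper: seconds of consecutive probe failures tolerated
# before the failure action fires (failureThreshold x periodSeconds).
# Approximate: ignores initialDelaySeconds and per-probe timeoutSeconds.
probe_window() {
  local failure_threshold=$1 period_seconds=$2
  echo $(( failure_threshold * period_seconds ))
}

echo "startupProbe budget:      $(probe_window 30 10)s"
echo "livenessProbe restart:   ~$(probe_window 3 10)s"
echo "readinessProbe removal:  ~$(probe_window 2 5)s"
```

With these values a hung container is restarted after roughly 30 seconds, while an unready pod stops receiving traffic after roughly 10 seconds.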

kubectl Reference

Common kubectl Commands

GET — inspect resources

bash

# Get resources — basic
kubectl get pods
kubectl get pods -n production
kubectl get pods -A                          # All namespaces
kubectl get pods -o wide                     # Shows node, IP
kubectl get pods -l app=api                  # Label selector
kubectl get pods --field-selector status.phase=Running

# Get with output formats
kubectl get deployment api-deployment -o yaml    # Full YAML spec
kubectl get deployment api-deployment -o json    # Full JSON
kubectl get pods -o jsonpath='{.items[*].metadata.name}'  # Specific field
kubectl get pods -o custom-columns=NAME:.metadata.name,STATUS:.status.phase

# Get multiple resource types
kubectl get pods,services,deployments -n production

# Watch (live updates)
kubectl get pods -w
kubectl get pods -n production -w

DESCRIBE — detailed info + events

bash

kubectl describe pod api-pod-abc123
kubectl describe deployment api-deployment -n production
kubectl describe node my-node
kubectl describe service api-service
kubectl describe pvc postgres-pvc

LOGS — container output

bash

# Basic logs
kubectl logs pod-name
kubectl logs pod-name -n production
kubectl logs pod-name -c container-name      # Specific container in multi-container pod

# Stream live logs
kubectl logs -f pod-name
kubectl logs -f deployment/api-deployment    # From deployment (picks one pod)

# Previous container instance (after crash)
kubectl logs pod-name --previous

# Tail last N lines
kubectl logs pod-name --tail=100
kubectl logs pod-name --since=1h             # Last 1 hour
kubectl logs pod-name --since-time=2025-01-01T00:00:00Z

# Logs from all pods with a label (follows at most 5 pods by default; raise with --max-log-requests)
kubectl logs -l app=api --prefix=true --all-containers=true -f

EXEC — run commands in containers

bash

# Open interactive shell
kubectl exec -it pod-name -- /bin/bash
kubectl exec -it pod-name -- /bin/sh         # If bash not available
kubectl exec -it pod-name -c container-name -- /bin/bash

# Run single command
kubectl exec pod-name -- env
kubectl exec pod-name -- cat /etc/config/app.conf
kubectl exec pod-name -- curl http://localhost:8080/health

# Copy files to/from pod
kubectl cp pod-name:/app/logs/error.log ./error.log
kubectl cp ./config.json pod-name:/app/config.json

PORT-FORWARD — local access to cluster resources

bash

# Forward localhost:8080 → pod:8080
kubectl port-forward pod/api-pod-abc123 8080:8080

# Forward to service (recommended — picks a healthy pod)
kubectl port-forward service/api-service 8080:80 -n production

# Forward to deployment
kubectl port-forward deployment/api-deployment 8080:8080

# Multiple ports
kubectl port-forward service/api-service 8080:80 8443:443

# Access Kubernetes Dashboard
kubectl port-forward -n kubernetes-dashboard service/kubernetes-dashboard 8443:443

APPLY, CREATE, EDIT, PATCH

bash

# Declarative (preferred — idempotent)
kubectl apply -f manifest.yaml
kubectl apply -f k8s/ --recursive

# Imperative creates (good for quick testing)
kubectl create deployment nginx --image=nginx:latest --replicas=2
kubectl create service clusterip my-svc --tcp=80:8080
kubectl create configmap app-config --from-literal=ENV=prod --from-file=config.properties
kubectl create secret generic db-creds --from-literal=pw=secret123

# Edit live object in editor
kubectl edit deployment api-deployment -n production

# Patch — surgical update without full YAML
kubectl patch deployment api-deployment \
  --patch '{"spec":{"replicas":5}}'

kubectl patch deployment api-deployment \
  --type=json \
  -p='[{"op":"replace","path":"/spec/template/spec/containers/0/image","value":"myapp:v3"}]'

# Label / Annotate
kubectl label pod my-pod environment=production
kubectl annotate deployment api-deployment version="2.1.0"

kubectl Reference

Debug Commands

bash · Debugging tools

# Debug pod with ephemeral container (K8s 1.23+)
kubectl debug -it pod-name --image=busybox:latest --target=app

# Debug with a copy of the pod (preserves original)
kubectl debug pod-name --copy-to=pod-name-debug --image=ubuntu

# Describe events for a pod (most useful for scheduling issues)
kubectl describe pod pod-name | grep -A 20 Events

# Get all events in namespace sorted by time
kubectl get events -n production --sort-by=.metadata.creationTimestamp

# Get only Warning events
kubectl get events -n production --field-selector type=Warning

# Check resource usage
kubectl top pods -n production
kubectl top pods -n production --sort-by=memory
kubectl top nodes

# Check why pod is pending
kubectl describe pod pending-pod | grep -A 10 "Events:"
# Common causes: Insufficient CPU/Memory, No nodes match selector,
#                PVC not bound, Image pull error

# Check node conditions
kubectl describe node my-node | grep -A 10 "Conditions:"

# List recent Warning events across all namespaces (these are events, not API server audit logs)
kubectl get events --all-namespaces | grep Warning | head -20

# Validate YAML manifest
kubectl apply -f manifest.yaml --dry-run=server --validate=true

kubectl Reference

Cleanup Commands

bash · Delete resources

# Delete by name
kubectl delete pod my-pod
kubectl delete deployment api-deployment -n production
kubectl delete service api-service -n production

# Delete by file (opposite of apply)
kubectl delete -f deployment.yaml
kubectl delete -f k8s/ --recursive

# Delete by label selector
kubectl delete pods -l app=api -n production
kubectl delete all -l app=my-app -n production

# Delete all pods in namespace (they will restart if managed by Deployment)
kubectl delete pods --all -n production

# Delete all resources in namespace (use with caution!)
kubectl delete all --all -n staging
# Note: "all" doesn't include PVCs, Secrets, ConfigMaps

# Delete EVERYTHING in namespace including PVCs, ConfigMaps, Secrets
kubectl delete namespace staging          # ⚠️ DESTRUCTIVE - deletes everything inside

# Force delete stuck pod (last resort — may cause split-brain)
kubectl delete pod stuck-pod --force --grace-period=0

# Delete completed/failed jobs
kubectl delete jobs --field-selector status.successful=1
kubectl delete pods --field-selector status.phase=Succeeded
kubectl delete pods --field-selector status.phase=Failed

# Delete evicted pods (-L1 runs one delete per pod so each gets its own namespace)
kubectl get pods -A | grep Evicted | awk '{print $2 " -n " $1}' | xargs -L1 kubectl delete pod

# Drain a node (for maintenance)
kubectl drain my-node --ignore-daemonsets --delete-emptydir-data
kubectl cordon my-node        # Mark unschedulable but don't evict
kubectl uncordon my-node      # Return to schedulable

# Remove node from cluster
kubectl drain my-node --ignore-daemonsets
kubectl delete node my-node

bash · Context & cluster management

# View all contexts (clusters you can connect to)
kubectl config get-contexts

# Switch context
kubectl config use-context my-prod-cluster
kubectl config use-context gke_project_zone_cluster

# View current context
kubectl config current-context

# Set default namespace for context
kubectl config set-context --current --namespace=production

# Rename a context
kubectl config rename-context old-name new-name

# Delete a context
kubectl config delete-context old-cluster

⚠️ Cleanup Safety Checklist

Before deleting: (1) Confirm the current context and namespace with kubectl config view --minify. (2) Run the delete with --dry-run=client first. (3) Never force-delete pods in a database StatefulSet; doing so can corrupt data. (4) Deleting a namespace removes its PVCs, which permanently deletes the backing cloud disks when the volume's reclaim policy is Delete.

Reference

RBAC — Role-Based Access Control

yaml · Role + RoleBinding + ServiceAccount

# Role — permissions within a namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: production
rules:
- apiGroups: [""]               # "" = core group
  resources: ["pods", "pods/log"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list"]

---
# ClusterRole — cluster-wide permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-reader
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "list", "watch"]

---
# RoleBinding — bind Role to user/group/serviceaccount
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods-binding
  namespace: production
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-reader
subjects:
- kind: User
  name: alice@company.com    # GCP IAM user (on GKE)
  apiGroup: rbac.authorization.k8s.io
- kind: ServiceAccount
  name: ci-service-account
  namespace: production
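
The ClusterRole above grants nothing until it is bound to a subject. As a minimal sketch, the script below writes a ClusterRoleBinding that grants `node-reader` to the same service account. The binding name `read-nodes-binding` and the file name are hypothetical.

```shell
#!/usr/bin/env bash
# Write a ClusterRoleBinding for the node-reader ClusterRole.
# The binding name and file name are hypothetical examples.
cat > node-reader-binding.yaml <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: read-nodes-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: node-reader
subjects:
- kind: ServiceAccount
  name: ci-service-account
  namespace: production
EOF

echo "Wrote node-reader-binding.yaml"
# After applying, permissions can be checked with impersonation:
#   kubectl auth can-i list nodes \
#     --as=system:serviceaccount:production:ci-service-account
```

A ClusterRoleBinding has no namespace of its own, but a ServiceAccount subject still needs one.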

Reference

Troubleshooting Guide

bash · Pod status decode

# Pending — pod not scheduled to a node
# → Insufficient CPU/memory on nodes
# → No nodes match nodeSelector/affinity
# → PVC not bound
kubectl describe pod <pod> | grep -A 15 Events

# CrashLoopBackOff — container starting and crashing repeatedly
# → Application error on startup
# → Bad config / missing env var
# → Memory limit too low (OOMKilled)
kubectl logs <pod> --previous      # Logs from previous crash
kubectl describe pod <pod> | grep "Last State"

# ImagePullBackOff / ErrImagePull
# → Wrong image name/tag
# → Private registry — imagePullSecret not configured
kubectl describe pod <pod> | grep -A 5 "Failed to pull"

# OOMKilled — container exceeded memory limit
kubectl describe pod <pod> | grep -A 3 "OOM"
# Fix: increase memory limit, or investigate memory leak

# Terminating (stuck)
# → Finalizers preventing deletion
kubectl patch pod <pod> -p '{"metadata":{"finalizers":[]}}' --type=merge
kubectl delete pod <pod> --force --grace-period=0

# Service not routing traffic
# → Selector doesn't match pod labels
kubectl get endpoints <service-name>   # Should show pod IPs — if empty, selector is wrong
kubectl get pods -l app=my-app          # Check labels match service selector

# DNS resolution failing inside pod
kubectl exec -it <pod> -- nslookup api-service
kubectl exec -it <pod> -- curl http://api-service.production.svc.cluster.local/health

Pod Status	Meaning	First Action
`Pending`	Waiting to be scheduled	`kubectl describe pod` → check Events
`ContainerCreating`	Image being pulled or volume being mounted	Wait, then check Events
`Running`	Container executing normally	Check readinessProbe if not receiving traffic
`CrashLoopBackOff`	Container crashing on start	`kubectl logs --previous`
`OOMKilled`	Exceeded memory limit	Raise memory limit or fix leak
`ImagePullBackOff`	Can't pull container image	Check image name, registry credentials
`Evicted`	Removed due to resource pressure	Check node disk/memory usage
`Terminating`	Being deleted gracefully	Wait; if stuck, check finalizers

Reference

Quick Cheat Sheet

Essential kubectl Aliases

bash · ~/.bashrc or ~/.zshrc

# kubectl shorthand
alias k='kubectl'
alias kgp='kubectl get pods'
alias kgpa='kubectl get pods -A'
alias kgs='kubectl get services'
alias kgd='kubectl get deployments'
alias kgn='kubectl get nodes'
alias kdp='kubectl describe pod'
alias kdd='kubectl describe deployment'
alias kl='kubectl logs'
alias klf='kubectl logs -f'
alias kaf='kubectl apply -f'
alias kdf='kubectl delete -f'
alias kns='kubectl config set-context --current --namespace'
alias kctx='kubectl config use-context'
alias ktc='kubectl top pods'

# Switch namespace quickly (the ';' before '}' is required in bash)
kn() { kubectl config set-context --current --namespace="$1"; }

# Watch pods
alias kwatch='kubectl get pods -w'

Complete Quick-Reference Table

Action	Command
List all pods	`kubectl get pods -A`
Watch pods	`kubectl get pods -w -n production`
Describe pod	`kubectl describe pod <name>`
Stream logs	`kubectl logs -f <pod>`
Shell into pod	`kubectl exec -it <pod> -- /bin/sh`
Apply manifests	`kubectl apply -f k8s/`
Update image	`kubectl set image deployment/<name> <container>=<image>:tag`
Scale	`kubectl scale deployment <name> --replicas=5`
Rollback	`kubectl rollout undo deployment/<name>`
Rollout status	`kubectl rollout status deployment/<name>`
Port forward	`kubectl port-forward svc/<name> 8080:80`
Resource usage	`kubectl top pods -n production`
Get events	`kubectl get events -n production --sort-by=.metadata.creationTimestamp`
Delete pod	`kubectl delete pod <name>`
Force delete	`kubectl delete pod <name> --force --grace-period=0`
Delete by label	`kubectl delete pods -l app=api`
Drain node	`kubectl drain <node> --ignore-daemonsets`
Switch namespace	`kubectl config set-context --current --namespace=<ns>`
Switch cluster	`kubectl config use-context <context>`
Diff before apply	`kubectl diff -f deployment.yaml`

Key Concepts at a Glance

Pod = containers + shared net
Deployment = manages ReplicaSets
ReplicaSet = ensures N pods
Service = stable endpoint
Ingress = HTTP router (L7)
NetworkPolicy = firewall
HPA = more pods (horizontal)
VPA = bigger pods (vertical)
ConfigMap = non-secret config
Secret = sensitive config
StatefulSet = ordered, stable IDs
DaemonSet = one per node
Namespace = virtual cluster
PVC = persistent storage claim
Job = run to completion
CronJob = scheduled job
