FIELD MANUAL
Cloud Networking
DOC-NET-2024
Architecture Field Handbook

Cloud Networking

// "Every open port is an attack surface. Every route is a decision."

From IP fundamentals and CIDR masks to VPC architecture, Kubernetes traffic flows, and hardened access control. Covers bastion hosts, private subnets, security groups, and the principle of minimum exposure.

VPC & Subnets | CIDR / Masks | Kubernetes | Ingress / Egress | Access Hardening | Bastion Hosts
01

IP Addresses & Notation

// IPv4, BINARY, PRIVATE RANGES

An IPv4 address is a 32-bit number written as four octets (0–255) separated by dots. Every device on a network needs a unique IP to send and receive traffic. Cloud resources — VMs, containers, load balancers — all get IPs assigned from the address ranges you define.

An IP address in binary

The address 192.168.1.10 is actually four 8-bit numbers. Each bit position doubles in value: 128, 64, 32, 16, 8, 4, 2, 1.

192.168.1.10 — binary breakdown
192 = 11000000
168 = 10101000
1   = 00000001
10  = 00001010
Full address: 11000000.10101000.00000001.00001010
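
The conversion above can be reproduced in a few lines. This is a minimal sketch; the function name `ipv4_to_binary` is an illustrative choice, not something the handbook defines.

```python
def ipv4_to_binary(address: str) -> str:
    """Return the dotted-binary form of a dotted-decimal IPv4 address."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"not a valid IPv4 address: {address!r}")
    # format(o, "08b") renders each octet as exactly 8 binary digits
    return ".".join(format(o, "08b") for o in octets)


print(ipv4_to_binary("192.168.1.10"))
# 11000000.10101000.00000001.00001010
```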

Private vs Public IP Ranges

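RFC 1918 defines three blocks of IPv4 addresses reserved for private (non-routable) use, and cloud VPCs are almost always built from them. The table also lists the loopback and link-local ranges, which are reserved separately. A resource with only a private IP cannot reach the internet directly: its traffic must be translated, either by a NAT gateway or by an internet gateway that maps the private IP to an attached public IP.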

Range | CIDR | Addresses | Typical Use
Class A Private | 10.0.0.0/8 | 16,777,216 | Large VPCs, enterprise networks. Most common in cloud.
Class B Private | 172.16.0.0/12 | 1,048,576 | Docker default bridge network, mid-size VPCs.
Class C Private | 192.168.0.0/16 | 65,536 | Home/office routers, small networks, dev environments.
Loopback | 127.0.0.0/8 | 16,777,216 | Local machine only. 127.0.0.1 = "this device".
Link-local | 169.254.0.0/16 | 65,536 | Cloud instance metadata endpoint (e.g., AWS 169.254.169.254).
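Cloud metadata service: On the major clouds, a VM can reach 169.254.169.254 to fetch its own identity, IAM role credentials, and bootstrap data. This is how EC2 instances get AWS credentials without hardcoding them. It's also a famous SSRF attack target. Security groups do not filter traffic to this link-local address, so protect it on the instance itself: require IMDSv2 session tokens on AWS, keep the metadata response hop limit at 1, and block the address from containers that don't need it (for example with a NetworkPolicy or a host firewall rule).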
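
To check which of the ranges above an address falls into, Python's standard `ipaddress` module is enough. The sketch below is illustrative only; the `classify` helper and its return labels are assumptions made for this example.

```python
import ipaddress

# The three RFC 1918 private blocks from the table above
RFC1918 = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def classify(address: str) -> str:
    """Label an IPv4 address as loopback, link-local, private, or other."""
    ip = ipaddress.ip_address(address)
    if ip.is_loopback:        # 127.0.0.0/8
        return "loopback"
    if ip.is_link_local:      # 169.254.0.0/16
        return "link-local"
    if any(ip in net for net in RFC1918):
        return "private"
    return "other"


for addr in ("10.4.5.6", "172.31.0.9", "172.32.0.1", "169.254.169.254"):
    print(addr, "->", classify(addr))
```

Note that 172.32.0.1 sits just outside 172.16.0.0/12, which ends at 172.31.255.255.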
02

Subnets & CIDR Masks

// CLASSLESS INTER-DOMAIN ROUTING

CIDR (Classless Inter-Domain Routing) notation expresses an IP range as a base address plus a prefix length — the number of fixed bits in the network portion. The remaining bits define host addresses. 10.0.1.0/24 means the first 24 bits are fixed (the network), leaving 8 bits for hosts (256 addresses, 254 usable).

10.0.1.0/24 — network vs host bits
NETWORK (fixed, 24 bits): 00001010.00000000.00000001
HOST (variable, 8 bits):  ********
10.0.1.0 = network address (not assignable)
10.0.1.1 – 10.0.1.254 = 254 usable host addresses
10.0.1.255 = broadcast address (not assignable)
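
The same breakdown can be derived programmatically. The following is a small sketch using the standard `ipaddress` module; `subnet_summary` and its `reserved` parameter are names invented for this example. The default of 2 reserved addresses covers the network and broadcast addresses; pass 5 to model the AWS reservation.

```python
import ipaddress


def subnet_summary(cidr: str, reserved: int = 2) -> dict:
    """Summarize a CIDR block: boundaries, mask, and address counts."""
    net = ipaddress.ip_network(cidr)
    return {
        "network": str(net.network_address),
        "broadcast": str(net.broadcast_address),
        "netmask": str(net.netmask),
        "total": net.num_addresses,
        # Addresses left after subtracting the reserved ones (never negative)
        "usable": max(net.num_addresses - reserved, 0),
    }


print(subnet_summary("10.0.1.0/24"))
print(subnet_summary("10.0.1.0/24", reserved=5)["usable"])  # AWS-style count
```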

CIDR Reference Table

CIDR | Subnet Mask | Total IPs | Usable Hosts | Typical Use
/8 | 255.0.0.0 | 16,777,216 | 16,777,214 | Entire VPC allocation (very large)
/16 | 255.255.0.0 | 65,536 | 65,534 | Standard VPC size (AWS default)
/20 | 255.255.240.0 | 4,096 | 4,091* | Large subnet (AWS default per-AZ)
/24 | 255.255.255.0 | 256 | 251* | Standard subnet, most common
/27 | 255.255.255.224 | 32 | 27* | Small subnet for specific tiers
/28 | 255.255.255.240 | 16 | 11* | Tiny subnet: NAT GW, bastion
/32 | 255.255.255.255 | 1 | 1 | Single host: security group rules

* AWS reserves 5 IPs per subnet: network address, VPC router, DNS, future use, broadcast.

How to Read a Subnet Mask

A subnet mask has all 1s in the network portion and all 0s in the host portion. Bitwise AND of any IP with the mask gives the network address. This is how routers know whether a destination is local (same subnet) or remote (needs routing).

Subnet Math — worked example
# Is 10.0.1.42 in the subnet 10.0.1.0/24?
IP address:       10.0.1.42     = 00001010.00000000.00000001.00101010
Subnet mask: /24  255.255.255.0 = 11111111.11111111.11111111.00000000
AND operation:                    00001010.00000000.00000001.00000000 = 10.0.1.0  ← network address
Compare to subnet network: 10.0.1.0 → MATCH: same subnet, local delivery

# Is 10.0.2.5 in the same subnet?
IP address:       10.0.2.5      = 00001010.00000000.00000010.00000101
AND with mask:                  = 00001010.00000000.00000010.00000000 = 10.0.2.0
Compare to 10.0.1.0 → NO MATCH: different subnet, must route via gateway

# Quick mental math: /24 → last octet is host. /16 → last two octets are host.
# /24 gives 256 addresses. /23 gives 512. /22 gives 1024. Each -1 doubles.
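
The AND operation in the worked example can be written out directly. This is a sketch with hypothetical helper names (`network_of`, `same_subnet`); it builds the mask from the prefix length and applies it bit by bit, the way the example does by hand.

```python
import ipaddress


def network_of(address: str, prefix_len: int) -> str:
    """Return the network address obtained by ANDing the IP with its mask."""
    ip = int(ipaddress.ip_address(address))
    # prefix_len leading 1-bits followed by 0-bits, clipped to 32 bits
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    return str(ipaddress.ip_address(ip & mask))


def same_subnet(a: str, b: str, prefix_len: int) -> bool:
    """Two hosts are local to each other if they share a network address."""
    return network_of(a, prefix_len) == network_of(b, prefix_len)


print(network_of("10.0.1.42", 24))               # 10.0.1.0
print(same_subnet("10.0.1.42", "10.0.2.5", 24))  # False
```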
Subnet planning rule: Plan for growth. If you need 50 hosts, use /25 (128 IPs), not /26 (64). In AWS, a subnet's CIDR cannot be changed after creation; GCP lets you expand a subnet's range but not shrink it, so treat the choice as permanent on any provider. Avoid overlap with on-premises ranges if you'll need VPN or Direct Connect in the future.
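
One way to turn the planning rule into arithmetic is sketched below. The growth factor of 2 and the 5 reserved addresses are assumptions chosen to match the 50-host example and the AWS reservation; `smallest_prefix` is a hypothetical helper.

```python
import math


def smallest_prefix(hosts: int, growth: float = 2.0, reserved: int = 5) -> int:
    """Return the longest prefix (smallest subnet) that fits hosts * growth."""
    needed = math.ceil(hosts * growth) + reserved
    # Bits required so that 2**host_bits >= needed
    host_bits = (needed - 1).bit_length()
    if host_bits > 32:
        raise ValueError("requirement does not fit in IPv4")
    return 32 - host_bits


print(smallest_prefix(50))              # 25: room to double
print(smallest_prefix(50, growth=1.0))  # 26: fits today, no headroom
```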
03

How Packets Travel

// L2 vs L3, ARP, ROUTING, NAT

A network packet travels from source to destination through a layered decision process. At each hop, a device examines the destination IP and decides: deliver locally (Layer 2 / ARP) or forward to the next hop (Layer 3 routing). Understanding this is the foundation for cloud VPC design.

Application (sends to IP:port) → OS stack: same subnet?
  YES: ARP → get MAC, deliver direct
  NO: route → forward to gateway → Gateway: next-hop decision
Layer 2 — Same Subnet Local

When source and destination are in the same subnet, the OS uses ARP (Address Resolution Protocol) to discover the destination's MAC address, then sends the frame directly. No router involved. In cloud VPCs, the hypervisor handles this — virtual NICs communicate without leaving the host in many implementations.

Layer 3 — Different Subnet Routed

When subnets differ, the OS forwards the packet to its default gateway (the router's IP on that subnet, e.g. 10.0.1.1). The router reads the destination IP, looks up its routing table, and forwards to the next hop. This repeats until the packet reaches its destination or is dropped.

NAT — Private to Internet Outbound

Network Address Translation allows private IPs to reach the internet. The NAT gateway replaces the packet's private source IP with its own public IP, maintains a translation table, and rewrites the return traffic. The internet sees only the NAT gateway's IP — internal topology is hidden.
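
The translation table described above can be modeled with two dictionaries. This is a deliberately simplified sketch: the `NatGateway` class, the sequential port allocation, and the 203.0.113.10 address (a documentation range) are assumptions for illustration, and a real NAT also keys entries on protocol and destination and expires them.

```python
class NatGateway:
    """Toy source-NAT: maps (private_ip, private_port) to a public port."""

    def __init__(self, public_ip: str, first_port: int = 1024):
        self.public_ip = public_ip
        self._next_port = first_port
        self._out = {}   # (private_ip, private_port) -> public_port
        self._back = {}  # public_port -> (private_ip, private_port)

    def translate_outbound(self, src_ip: str, src_port: int):
        """Rewrite the source of an outgoing packet to the NAT's public IP."""
        key = (src_ip, src_port)
        if key not in self._out:
            self._out[key] = self._next_port
            self._back[self._next_port] = key
            self._next_port += 1
        return self.public_ip, self._out[key]

    def translate_inbound(self, dst_port: int):
        """Map return traffic back to the private host, or None if unknown."""
        return self._back.get(dst_port)


nat = NatGateway("203.0.113.10")
print(nat.translate_outbound("10.0.10.5", 40000))  # ('203.0.113.10', 1024)
print(nat.translate_inbound(1024))                 # ('10.0.10.5', 40000)
print(nat.translate_inbound(9999))                 # None: unsolicited traffic
```

The last call shows why NAT is outbound only: a packet that matches no table entry has nowhere to go.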

DNS — Names to IPs Essential

Before a packet can be sent to a hostname, the OS resolves the name to an IP via DNS. In an AWS VPC, the built-in resolver answers at 169.254.169.253 and at the VPC base address plus two (10.0.0.2 in a 10.0.0.0/16 VPC). Kubernetes has its own internal DNS (CoreDNS) that resolves service names to cluster IPs.

TCP Connection: The Three-Way Handshake

TCP Handshake & Stateful Tracking
Client (10.0.1.5)          Gateway / Firewall              Server (10.0.2.10:443)

── SYN ──────────────────────────────────────────────────────►   seq=1000
                           [NEW connection tracked]
◄────────────────────────────────────────────────── SYN-ACK ──   seq=5000, ack=1001
                           [ESTABLISHED in state table]
── ACK ──────────────────────────────────────────────────────►   ack=5001

# Connection now ESTABLISHED. Data flows.
# Stateful firewalls (Security Groups, iptables with conntrack) track this.
# Return traffic (SYN-ACK, data) is AUTO-ALLOWED because the connection state is known.
# You do NOT need a separate inbound rule for the return traffic.

# STATELESS firewalls (NACLs) do NOT track state.
# You MUST explicitly allow both directions: outbound rule + inbound rule for ephemeral ports.
# Ephemeral (return) ports: allow 1024-65535 to cover all clients
# (each client OS picks from its own sub-range, e.g. 32768-60999 on Linux)
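
The difference between the two firewall styles can be sketched in a few lines. Both `StatefulFirewall` and `stateless_inbound_allowed` are hypothetical names for this example, and the model ignores TCP flags and timeouts; it only shows where the state lives.

```python
class StatefulFirewall:
    """Tracks outbound connections and auto-allows their return traffic."""

    def __init__(self, allowed_outbound_ports):
        self.allowed_outbound_ports = set(allowed_outbound_ports)
        self.connections = set()  # tracked (src, sport, dst, dport) tuples

    def outbound(self, src, sport, dst, dport) -> bool:
        if dport not in self.allowed_outbound_ports:
            return False
        self.connections.add((src, sport, dst, dport))
        return True

    def inbound(self, src, sport, dst, dport) -> bool:
        # Return traffic is a tracked tuple with source and destination swapped
        return (dst, dport, src, sport) in self.connections


def stateless_inbound_allowed(dport: int, inbound_port_ranges) -> bool:
    """A stateless filter only sees the rule list, never the connection."""
    return any(lo <= dport <= hi for lo, hi in inbound_port_ranges)


fw = StatefulFirewall({443})
fw.outbound("10.0.1.5", 50000, "10.0.2.10", 443)
print(fw.inbound("10.0.2.10", 443, "10.0.1.5", 50000))  # True: known connection
print(stateless_inbound_allowed(50000, []))              # False: no rule for replies
print(stateless_inbound_allowed(50000, [(1024, 65535)])) # True: ephemeral range opened
```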
04

VPC Architecture

// VIRTUAL PRIVATE CLOUD — YOUR ISOLATED NETWORK

A Virtual Private Cloud (VPC) is a logically isolated network you define in the cloud. You choose the CIDR block, divide it into subnets across availability zones, and control all routing and access rules. Think of it as your private data center network, defined entirely in software.

Standard 3-tier VPC layout
┌─────────────────────────────────────────────────────────────────────┐
  VPC 10.0.0.0/16

  ┌──────────────────────────────────────────────────────────────┐
    PUBLIC SUBNETS (10.0.1.0/24, 10.0.2.0/24, one per AZ)
    Load Balancer | NAT Gateway | Bastion Host
    route: 0.0.0.0/0 → Internet Gateway
  └──────────────────────────────────────────────────────────────┘
                  ↕ (LB forwards traffic)
  ┌──────────────────────────────────────────────────────────────┐
    PRIVATE SUBNETS (10.0.10.0/24, 10.0.11.0/24, one per AZ)
    App Servers / Kubernetes Nodes / API Services
    route: 0.0.0.0/0 → NAT Gateway (outbound only)
  └──────────────────────────────────────────────────────────────┘
                  ↕ (app queries database)
  ┌──────────────────────────────────────────────────────────────┐
    ISOLATED SUBNETS (10.0.20.0/24, 10.0.21.0/24)
    Databases (RDS, Redis, Elasticsearch)
    NO route to internet: no NAT, no IGW
  └──────────────────────────────────────────────────────────────┘
└─────────────────────────────────────────────────────────────────────┘
                  │
           Internet Gateway
                  │
      ──────── Internet ────────
Public Subnet Internet-Facing

Has a route to the Internet Gateway. Resources here can have public IPs. Use for: load balancers, NAT gateways, bastion hosts. Never put databases or app servers here.

Private Subnet Outbound Only

Routes outbound traffic through NAT gateway (can reach internet, but internet cannot reach in). Use for: app servers, EKS/GKE nodes, microservices. The right home for compute.

Isolated Subnet No Internet

No route to internet at all — not even NAT. Only reachable from within the VPC. Use for: databases, secrets stores, HSMs. If a DB doesn't need internet, remove the route.

Multi-AZ is not optional: Always spread public and private subnets across at least two Availability Zones. A single-AZ architecture fails entirely when that AZ has an outage. Your load balancers, NAT gateways, and EKS nodes should all exist in ≥2 AZs.
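
A tier-by-AZ subnet plan like the one in the layout can be generated rather than hand-assigned. The sketch below is an assumption-laden example: `plan_subnets`, the tier names, and the AZ labels are hypothetical, and it hands out /24 blocks sequentially instead of reproducing the 10.0.1.x / 10.0.10.x / 10.0.20.x numbering shown in the diagram.

```python
import ipaddress


def plan_subnets(vpc_cidr: str, tiers, azs, new_prefix: int = 24) -> dict:
    """Assign one non-overlapping subnet per (tier, AZ) pair."""
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = vpc.subnets(new_prefix=new_prefix)  # generator of child networks
    plan = {}
    for tier in tiers:
        for az in azs:
            plan[(tier, az)] = next(blocks)
    return plan


plan = plan_subnets("10.0.0.0/16", ["public", "private", "isolated"], ["az-a", "az-b"])
for (tier, az), net in plan.items():
    print(f"{tier:9} {az}  {net}")
```

Because every block comes from the same generator, the subnets cannot overlap, and each tier ends up in both zones.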
05

Ingress & Egress

// INBOUND VS OUTBOUND — WHO INITIATES

Ingress is traffic flowing into a resource. Egress is traffic flowing out. The distinction matters enormously for security rules: you control what can reach your resources (ingress) and what your resources can reach (egress). In cloud networking, these are configured at the security group, NACL, and routing layer.

← Ingress (Inbound)
  • Traffic arriving at your resource
  • Examples: user HTTP request, SSH connection, DB query from app
  • Security groups check this for ALLOW rules
  • Load balancer → app server is ingress to the app server
  • SYN packet that starts a TCP connection
  • Rate-limited by WAF, DDoS protection
→ Egress (Outbound)
  • Traffic leaving your resource
  • Examples: app calling external API, downloading packages, DNS queries
  • Often overlooked — attackers exploit unrestricted egress for C2/exfil
  • Goes through NAT gateway if resource is in private subnet
  • Filter via firewall, proxy, or DNS allowlisting
  • Lock down to known destinations (egress allowlist)

Internet Gateway vs NAT Gateway

Component | Direction | Who Has a Public IP | Use For
Internet Gateway (IGW) | Bidirectional | The resource itself must have a public IP | Public subnets: load balancers, bastion hosts that need inbound from internet
NAT Gateway | Outbound only | The NAT gateway has the public IP; private resources stay private | Private subnet resources that need to call the internet (e.g., download packages, call APIs)
VPC Endpoint (AWS) | Internal only | No public IP involved; traffic stays on AWS backbone | Access S3, DynamoDB, SSM, STS without internet routing. Cheaper + private.
PrivateLink | Internal only | No public IP; interface endpoint in your VPC | Access other VPCs or partner services privately across the AWS network.
Egress is an attack vector: Ransomware and malware need outbound connectivity to phone home to C2 servers, exfiltrate data, and pull in additional payloads. If your workloads only need to call specific APIs, restrict egress to exactly those domains/IPs. Use a NAT gateway + security group egress rules or a dedicated egress proxy with allowlisting.
06

Route Tables & Gateways

// HOW THE VPC DECIDES WHERE TO SEND PACKETS

Every subnet in a VPC is associated with a route table — a list of destination CIDR → target mappings. When a packet leaves a resource, the VPC looks up the most specific matching route. The "local" route (your VPC CIDR) is always present and non-deletable.

Route Table Examples — AWS
## Public Subnet Route Table
Destination          Target
───────────────────  ─────────────────────────────────────
10.0.0.0/16          local           # VPC-local traffic stays inside VPC
0.0.0.0/0            igw-xxxxxxxx    # Everything else → Internet Gateway
# The 0.0.0.0/0 → IGW route is what makes this subnet "public"
# Resources here can receive inbound if they have a public IP

## Private Subnet Route Table
Destination          Target
───────────────────  ─────────────────────────────────────
10.0.0.0/16          local
0.0.0.0/0            nat-xxxxxxxx    # Outbound → NAT Gateway (then to internet)
# No inbound from internet possible. Outbound works through NAT.

## Isolated Subnet Route Table (database)
Destination          Target
───────────────────  ─────────────────────────────────────
10.0.0.0/16          local           # ONLY within VPC. No internet route at all.

## Most specific route wins (longest prefix match)
# 10.1.0.0/16 (a peer VPC's range) → peering-connection takes precedence over 0.0.0.0/0 → igw
# This is how VPC peering and Transit Gateway routes work
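
Longest prefix match is short enough to sketch in full. The `lookup` function and the `pcx-xxxxxxxx` peering target are placeholders for this example, and the 10.1.0.0/16 peer range is an assumed value, not one taken from a real route table.

```python
import ipaddress


def lookup(route_table, destination: str):
    """Return the target of the most specific route matching destination."""
    dst = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in route_table
        if dst in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None  # no route: the packet is dropped
    # The longest prefix is the most specific match
    return max(matches, key=lambda m: m[0].prefixlen)[1]


public_rt = [
    ("10.0.0.0/16", "local"),
    ("10.1.0.0/16", "pcx-xxxxxxxx"),  # assumed peer VPC range
    ("0.0.0.0/0", "igw-xxxxxxxx"),
]
isolated_rt = [("10.0.0.0/16", "local")]

print(lookup(public_rt, "10.0.5.9"))   # local
print(lookup(public_rt, "10.1.2.3"))   # pcx-xxxxxxxx
print(lookup(public_rt, "8.8.8.8"))    # igw-xxxxxxxx
print(lookup(isolated_rt, "8.8.8.8"))  # None
```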
VPC Peering VPC-to-VPC

Direct routing between two VPCs (same or different accounts/regions). Add routes in both VPCs pointing to each other's CIDR via the peering connection. Non-transitive — A↔B and B↔C does not give A↔C. Use Transit Gateway for hub-and-spoke topologies.

Transit Gateway Hub-and-Spoke

Central routing hub connecting multiple VPCs, on-premises networks (via VPN/DX), and accounts. Transitive routing — any spoke can reach any other. Centralize egress, inspection, and routing policy. Far simpler than a mesh of peering connections.

07

Security Groups & NACLs

// TWO LAYERS OF FIREWALL — KNOW THE DIFFERENCE

AWS provides two firewall mechanisms at different layers, and Azure and GCP offer equivalents. Security Groups are stateful virtual firewalls attached to individual resources (ENIs). Network ACLs are stateless guards applied at the subnet boundary, so inbound traffic passes the NACL before it reaches a security group and outbound traffic passes it after. They complement each other.

Dimension | Security Group | Network ACL (NACL)
Level | Instance / ENI (virtual NIC) | Subnet perimeter
Statefulness | Stateful: return traffic auto-allowed | Stateless: must allow both directions explicitly
Default behavior | Deny all inbound, allow all outbound | Default NACL allows all inbound and outbound; a custom NACL denies all until rules are added
Rules | Allow rules only (no explicit deny) | Allow and Deny rules, evaluated in number order
Rule evaluation | All rules evaluated; most permissive wins | First matching rule wins (numbered, stop-on-match)
Source/Dest | IP CIDR or another Security Group ID | IP CIDR only
Primary use | Instance-level micro-segmentation, app port control | Subnet-to-subnet blocking, emergency deny
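
The two evaluation models in the table behave differently on the same traffic. The sketch below is a simplified illustration: `nacl_decision` and `sg_decision` are hypothetical helpers, rules match a single port only, and the addresses come from documentation ranges.

```python
import ipaddress


def nacl_decision(rules, src_ip: str, port: int) -> str:
    """NACL style: rules are (number, cidr, port, action); first match wins."""
    src = ipaddress.ip_address(src_ip)
    for _number, cidr, rule_port, action in sorted(rules, key=lambda r: r[0]):
        if src in ipaddress.ip_network(cidr) and rule_port == port:
            return action
    return "deny"  # implicit final deny


def sg_decision(rules, src_ip: str, port: int) -> str:
    """Security group style: rules are (cidr, port); any matching allow wins."""
    src = ipaddress.ip_address(src_ip)
    allowed = any(
        src in ipaddress.ip_network(cidr) and rule_port == port
        for cidr, rule_port in rules
    )
    return "allow" if allowed else "deny"


nacl = [
    (100, "203.0.113.0/24", 443, "deny"),  # emergency block, evaluated first
    (200, "0.0.0.0/0", 443, "allow"),
]
sg = [("0.0.0.0/0", 443)]

print(nacl_decision(nacl, "203.0.113.7", 443))  # deny
print(sg_decision(sg, "203.0.113.7", 443))      # allow
```

The security group has no way to express the block on 203.0.113.0/24, which is why the table lists emergency deny as a NACL use.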

Security Group Best Practices

Security Group Rules — Three-Tier App
## ALB Security Group (public-facing load balancer)
Inbound:
  HTTP   TCP 80     0.0.0.0/0        # Allow HTTP from anywhere (redirect to HTTPS)
  HTTPS  TCP 443    0.0.0.0/0        # Allow HTTPS from anywhere
Outbound:
  TCP 8080          sg-app-servers   # ONLY to app server security group on app port

## App Server Security Group
Inbound:
  TCP 8080          sg-alb           # ONLY from ALB security group, NOT 0.0.0.0/0
  TCP 22            sg-bastion       # SSH only from bastion security group
Outbound:
  TCP 5432          sg-database      # PostgreSQL to DB tier
  TCP 6379          sg-redis         # Redis cache
  TCP 443           0.0.0.0/0        # HTTPS egress for API calls (restrict further if possible)

## Database Security Group
Inbound:
  TCP 5432          sg-app-servers   # ONLY from app tier, never 0.0.0.0/0
  TCP 5432          sg-bastion       # Allow DBA access through bastion (optional, restrict by user)
Outbound:
  # Databases rarely need outbound. Restrict to nothing or specific monitoring agents.

## Key pattern: source/dest is a Security Group ID, not a CIDR range.
## sg-alb automatically includes all IPs assigned to instances in that group.
## This is dynamic: it scales automatically, with no manual IP updates.
Never use 0.0.0.0/0 for SSH or RDP: Port 22 and 3389 open to the internet are attacked within seconds of being exposed. Automated scanners run 24/7. Use a bastion host or VPN for SSH access. If you must debug, restrict to your specific IP (/32) and remove the rule immediately after.
08

Kubernetes Network Model

// FLAT POD NETWORK, CNI PLUGINS, NODE IPS

Kubernetes imposes a networking model with three requirements: every Pod gets its own unique IP, all Pods can communicate with all other Pods without NAT, and nodes can communicate with all Pods. This flat model is implemented by the CNI (Container Network Interface) plugin — Cilium, Calico, Flannel, or the cloud-native option (Amazon VPC CNI, GKE Dataplane V2).

Kubernetes network layers
┌─────────────────────────── Node A (10.0.10.5) ─────────────────────────────┐
  Pod: payment-api      Pod: auth-service     Pod: nginx-proxy
  IP: 10.0.100.5        IP: 10.0.100.6        IP: 10.0.100.7
        │                     │                     │
      veth0                 veth1                 veth2         ← virtual ethernet
        └─────────────────────┴─────────────────────┘
                        cbr0 / bridge                           ← CNI bridge
                        eth0 (node NIC)                         ← VPC ENI / vNIC
└─────────────────────────────────────────────────────────────────────────────┘
                              │
                         VPC Network
                              │
┌─────────────── Node B (10.0.11.5) ──────────────────┐
  Pod: database-proxy      Pod: worker
  IP: 10.0.101.3           IP: 10.0.101.4
└─────────────────────────────────────────────────────┘

Cross-node pod communication:
payment-api (10.0.100.5) → database-proxy (10.0.101.3)
Via CNI overlay (VXLAN/Geneve) or native VPC routing (AWS VPC CNI uses real VPC IPs)
AWS VPC CNI Native

Pods get real VPC IP addresses (from your subnet CIDR). No overlay network — pod-to-pod traffic uses VPC routing directly. Security Groups can be applied per-pod. Requires enough IPs in your subnets — plan for max_pods_per_node × node_count free IPs.
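
A rough capacity check for that planning rule is sketched below. It assumes 5 reserved addresses per subnet (the AWS reservation) and adds one primary IP per node on top of the pod count, which is an assumption of this example; `pod_ip_headroom` and the node and pod figures are illustrative only.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # network, router, DNS, future use, broadcast


def pod_ip_headroom(subnet_cidrs, node_count: int, max_pods_per_node: int) -> int:
    """Free IPs left after every node runs its maximum number of pods."""
    capacity = sum(
        ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET
        for cidr in subnet_cidrs
    )
    # Assumption: each node also consumes one address for its primary IP
    required = node_count * max_pods_per_node + node_count
    return capacity - required


subnets = ["10.0.10.0/24", "10.0.11.0/24"]
print(pod_ip_headroom(subnets, node_count=10, max_pods_per_node=29))  # 202
print(pod_ip_headroom(subnets, node_count=20, max_pods_per_node=29))  # -98
```

A negative result means the subnets would run out of addresses before the nodes fill up.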

Cilium / Calico Overlay / eBPF

CNI plugins that can run a virtual overlay network, in which pods get IPs from a pod CIDR separate from the VPC CIDR (both can also be configured for native routing). Cilium uses eBPF for kernel-level enforcement, which avoids long iptables rule chains. Both support Kubernetes NetworkPolicy and their own extended policy resources (for example CiliumNetworkPolicy).

kube-proxy vs eBPF: By default, Kubernetes uses kube-proxy (iptables rules) to handle Service IP routing. Cilium in eBPF mode can replace kube-proxy entirely with kernel-level load balancing, which scales better with large numbers of Services, is more observable, and enforces network policies at the same layer. Worth evaluating for production EKS clusters.
09

Kubernetes Services & DNS

// STABLE ENDPOINT FOR EPHEMERAL PODS

Pods are ephemeral — they die and respawn with new IPs. A Service provides a stable virtual IP (ClusterIP) and DNS name. Traffic to the service is load-balanced across healthy matching pods. CoreDNS resolves my-service.my-namespace.svc.cluster.local to the ClusterIP.

Service Type | Reachable From | How | Use For
ClusterIP | Inside cluster only | Virtual IP in cluster CIDR (e.g. 172.20.x.x) | Internal service-to-service communication. Default type.
NodePort | Node IP + port (30000-32767) | Forwards node's port to pod | Dev/testing only. Exposes a port on every node, so avoid it in prod.
LoadBalancer | Internet or VPC (depends on annotation) | Provisions a cloud load balancer (an NLB on AWS; ALBs are created through Ingress) | External access. Add the provider's internal-LB annotation (on AWS, service.beta.kubernetes.io/aws-load-balancer-scheme: internal) for VPC-only LBs.
ExternalName | Inside cluster only | CNAME alias to external DNS name | Abstract external dependencies (e.g., RDS endpoint) behind a service name.
Headless (ClusterIP with clusterIP: None) | Inside cluster only | DNS returns pod IPs directly, no ClusterIP | StatefulSets, databases with per-pod addressing (e.g., Kafka, Cassandra).
Service YAML — ClusterIP + DNS resolution
apiVersion: v1
kind: Service
metadata:
  name: payment-api
  namespace: production
spec:
  type: ClusterIP          # default: only reachable inside cluster
  selector:
    app: payment-api       # matches pods with this label
  ports:
    - port: 80             # port clients connect to
      targetPort: 8080     # port the pod actually listens on
      protocol: TCP
---
# DNS names automatically created by CoreDNS:
#   payment-api                                (within same namespace)
#   payment-api.production                     (cross-namespace short)
#   payment-api.production.svc
#   payment-api.production.svc.cluster.local   (FQDN)
#
# A pod in any namespace can call: http://payment-api.production/charge
# kube-proxy (or Cilium) rewrites ClusterIP → pod IP + load-balances
10

Kubernetes Ingress & External Access

// GETTING TRAFFIC IN FROM OUTSIDE THE CLUSTER

An Ingress resource defines HTTP/HTTPS routing rules — which hostname and path routes to which Service. The Ingress Controller (nginx, AWS ALB Controller, Traefik, Istio Gateway) watches Ingress resources and programs the actual load balancer or reverse proxy.

Internet (user browser) → DNS (api.example.com) → Cloud LB (ALB / NLB, public) → Ingress controller (nginx / ALB controller) → Service (ClusterIP → Pods)
Ingress Resource — TLS + Path Routing
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: main-ingress
  namespace: production
  annotations:
    # AWS ALB Ingress Controller annotations
    # (ingress.class is the legacy form; newer setups use spec.ingressClassName)
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing   # or "internal"
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/ssl-redirect: "443"
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:...
    # Security: restrict source IPs at LB level
    # (0.0.0.0/0 means unrestricted; narrow it for anything that is not a public API)
    alb.ingress.kubernetes.io/inbound-cidrs: 0.0.0.0/0
    # WAF association
    alb.ingress.kubernetes.io/wafv2-acl-arn: arn:aws:wafv2:...
spec:
  tls:
    - hosts: [api.example.com, app.example.com]
      secretName: tls-cert-secret   # cert-manager or ACM
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /v1/payments
            pathType: Prefix
            backend:
              service:
                name: payment-api
                port: { number: 80 }
          - path: /v1/auth
            pathType: Prefix
            backend:
              service:
                name: auth-service
                port: { number: 80 }
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: frontend
                port: { number: 80 }
Internal Ingress VPC Only

Set scheme: internal on the ALB annotation. The load balancer gets a private IP in your VPC subnets. Only reachable from within the VPC, VPN, or Direct Connect. Used for inter-service APIs or admin tools that should never be internet-accessible.

Service Mesh (Istio/Linkerd) Advanced

Adds a sidecar proxy to every pod for mTLS, traffic shaping, retries, and observability. Use Gateway and VirtualService resources instead of Ingress. Provides deep L7 control: canary deployments, circuit breaking, and identity-aware authorization across all service traffic.

11

Kubernetes Network Policies

// MICRO-SEGMENTATION INSIDE THE CLUSTER

By default, all pods in a Kubernetes cluster can communicate with all other pods. NetworkPolicy resources restrict this — defining exactly which pods can talk to which other pods, on which ports. Without network policies, a compromised pod has unrestricted access to every other pod in the cluster.

Default-allow is insecure: A cluster with no NetworkPolicy applied is flat — any pod can reach any port on any other pod. Start by applying a default-deny policy to every namespace, then explicitly allow only the required flows. Many real-world breaches involved lateral movement between pods that should have been isolated.
NetworkPolicy — Default Deny + Selective Allow
# Step 1: Apply default-deny to a namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}                  # matches ALL pods in namespace
  policyTypes: [Ingress, Egress]
  # No ingress or egress rules = deny everything
---
# Step 2: Allow specific flows
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-payment-api-ingress
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: payment-api             # applies to payment-api pods
  policyTypes: [Ingress, Egress]
  ingress:
    - from:
        # Both selectors in ONE list item = AND.
        # Two separate list items would be OR and admit every pod in the namespace.
        - podSelector:
            matchLabels:
              app: api-gateway     # ONLY from api-gateway pods
          namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: production   # AND only within production namespace
      ports:
        - protocol: TCP
          port: 8080
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: postgres        # to database pods
      ports:
        - protocol: TCP
          port: 5432
    - to:                          # allow DNS resolution
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
12

Least-Port Principle

// ONLY OPEN WHAT IS EXPLICITLY REQUIRED

Every open port and permitted IP range is a potential attack surface. The principle of least access applied to networking: open only the exact ports needed for the exact source ranges that need them. Anything else is denied by default. Document every exception.

Port Allowlisting Framework

Port | Protocol | Allowed From | Justification Required
443 | HTTPS | 0.0.0.0/0, load balancer only | Public web traffic: yes, restricted to LB SG
80 | HTTP | 0.0.0.0/0, load balancer only | Redirect to HTTPS: yes, restricted to LB SG
22 | SSH | Bastion SG only | Admin access: via bastion, never from internet
5432 | PostgreSQL | App server SG only | DB queries: restricted to app tier SG
6379 | Redis | App server SG only | Cache: restricted to app tier SG
3306 | MySQL | App server SG only | DB queries: never to internet
8080/8443 | HTTP/S internal | LB SG or specific SG | App port: never directly internet-exposed
3389 | RDP | BLOCKED, use SSM | Never expose RDP to internet or VPC without PAM
0-65535 | ALL | NEVER 0.0.0.0/0 | Never allow all ports from anywhere

IP Allowlisting

Source IP Restriction Do This
  • Admin portals: restrict to office IP ranges + VPN (/32 or /24)
  • CI/CD pipelines: restrict to known runner IPs
  • Monitoring agents: restrict to monitoring CIDR
  • Cross-account: restrict to specific account's VPC CIDR
  • Webhook receivers: restrict to vendor's published CIDR list (GitHub, Stripe, etc.)
Common Mistakes Avoid
  • 0.0.0.0/0 on non-HTTP ports — instant scanning target
  • Dev rules left in prod — audit SGs quarterly
  • Wide CIDR for "internal" — use SG references instead of subnets
  • No egress rules — restrict outbound, not just inbound
  • Shared SGs across tiers — each tier gets its own SG
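
A periodic audit for the first mistake in the list can be automated. The sketch below works on plain dictionaries rather than a cloud API; the rule shape (`cidr`, `from_port`, `to_port`) and the `risky_rules` helper are assumptions for illustration, and it treats 80 and 443 as the only ports allowed to be world-open.

```python
PUBLIC_OK_PORTS = {80, 443}


def risky_rules(rules):
    """Return rules that open any port other than 80/443 to 0.0.0.0/0."""
    findings = []
    for rule in rules:
        if rule["cidr"] != "0.0.0.0/0":
            continue  # scoped source: out of scope for this check
        ports = range(rule["from_port"], rule["to_port"] + 1)
        if any(port not in PUBLIC_OK_PORTS for port in ports):
            findings.append(rule)
    return findings


rules = [
    {"cidr": "0.0.0.0/0", "from_port": 443, "to_port": 443},
    {"cidr": "0.0.0.0/0", "from_port": 22, "to_port": 22},
    {"cidr": "10.0.0.0/16", "from_port": 5432, "to_port": 5432},
    {"cidr": "0.0.0.0/0", "from_port": 0, "to_port": 65535},
]
for finding in risky_rules(rules):
    print("world-open:", finding["from_port"], "-", finding["to_port"])
```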
Terraform — Minimal Security Group Example
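# Security groups are declared without inline rules. The rules live in
# separate aws_security_group_rule resources, because inline blocks that
# reference each other (alb → app → alb, app → db → app) form a dependency cycle.
resource "aws_security_group" "alb" {
  name   = "alb-sg"
  vpc_id = aws_vpc.main.id
}

resource "aws_security_group" "app" {
  name   = "app-sg"
  vpc_id = aws_vpc.main.id
}

resource "aws_security_group" "db" {
  name   = "db-sg"
  vpc_id = aws_vpc.main.id
  # No egress rules = no outbound traffic at all
}

# Web tier: only HTTPS/HTTP from internet (via ALB)
resource "aws_security_group_rule" "alb_in_https" {
  type              = "ingress"
  security_group_id = aws_security_group.alb.id
  from_port         = 443
  to_port           = 443
  protocol          = "tcp"
  cidr_blocks       = ["0.0.0.0/0"] # HTTPS public
}

resource "aws_security_group_rule" "alb_in_http" {
  type              = "ingress"
  security_group_id = aws_security_group.alb.id
  from_port         = 80
  to_port           = 80
  protocol          = "tcp"
  cidr_blocks       = ["0.0.0.0/0"] # HTTP → redirect to HTTPS
}

resource "aws_security_group_rule" "alb_out_app" {
  type                     = "egress"
  security_group_id        = aws_security_group.alb.id
  from_port                = 8080
  to_port                  = 8080
  protocol                 = "tcp"
  source_security_group_id = aws_security_group.app.id # only to app tier
}

# App tier: only from ALB
resource "aws_security_group_rule" "app_in_alb" {
  type                     = "ingress"
  security_group_id        = aws_security_group.app.id
  from_port                = 8080
  to_port                  = 8080
  protocol                 = "tcp"
  source_security_group_id = aws_security_group.alb.id # only from ALB SG
}

resource "aws_security_group_rule" "app_out_db" {
  type                     = "egress"
  security_group_id        = aws_security_group.app.id
  from_port                = 5432
  to_port                  = 5432
  protocol                 = "tcp"
  source_security_group_id = aws_security_group.db.id # only to DB
}

resource "aws_security_group_rule" "app_out_https" {
  type              = "egress"
  security_group_id = aws_security_group.app.id
  from_port         = 443
  to_port           = 443
  protocol          = "tcp"
  cidr_blocks       = ["0.0.0.0/0"] # HTTPS egress for external APIs
}

# DB tier: only from app tier
resource "aws_security_group_rule" "db_in_app" {
  type                     = "ingress"
  security_group_id        = aws_security_group.db.id
  from_port                = 5432
  to_port                  = 5432
  protocol                 = "tcp"
  source_security_group_id = aws_security_group.app.id # only from app
}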
13

Bastion Hosts

// SECURE JUMP SERVER FOR ADMIN ACCESS

A bastion host (jump server) is a hardened, internet-facing instance whose sole purpose is to be the single entry point for administrative SSH/RDP access into your private network. Instead of exposing all your servers to SSH from the internet, you expose only the bastion — a small, auditable attack surface. All admin access flows through it.

Bastion host access pattern
Admin (your laptop)
        │  SSH to bastion public IP
┌──────────────── Public Subnet ────────────────────────────────────┐
  Bastion Host
  Public IP: 1.2.3.4    Private IP: 10.0.1.10
  SG: allow SSH from YOUR_IP/32 only
  Hardened OS, no other services, session logging enabled
└──────────────────────────────────────────────────────────────────┘
        │  SSH hop (ProxyJump / -J flag)
┌──────────────── Private Subnet ──────────────────────────────────┐
  App Server        DB Server        K8s Node
  SG: allow SSH only from bastion SG
  No public IPs. Not reachable from internet.
└──────────────────────────────────────────────────────────────────┘

Bastion Hardening Checklist

Bastion Configuration Required
  • Minimal OS — no unnecessary packages or services
  • SSH key-based auth only — disable password authentication
  • MFA on SSH login (e.g., google-authenticator PAM module)
  • Restrict inbound SSH to known IPs only (/32 of your office/VPN)
  • No internet egress — bastion doesn't need outbound internet
  • Automatic session logging (audit trail of all commands)
  • No persistent local credentials — use IAM roles or certificate auth
  • Auto-patch via SSM or unattended-upgrades
Modern Alternatives Preferred
  • AWS SSM Session Manager — browser/CLI shell into private instances with no SSH port open at all. Uses IAM auth, full audit log, no bastion needed.
  • Teleport — open-source bastion replacement with SSO, RBAC, session recording, and K8s access proxy.
  • Boundary (HashiCorp) — dynamic credential injection, just-in-time access, identity-based tunneling.
  • ZTNA / Client VPN — give engineers network-level access to the VPC, then connect directly. No jump host needed.
SSH Config — ProxyJump through Bastion
# ~/.ssh/config: client-side configuration

# Bastion host definition (HostName is the bastion public IP)
Host bastion-prod
    HostName 1.2.3.4
    User ec2-user
    IdentityFile ~/.ssh/prod_key.pem
    ServerAliveInterval 60

# Private instances: jump through bastion automatically
# (ProxyJump transparently tunnels through the bastion)
Host 10.0.*.*
    User ec2-user
    IdentityFile ~/.ssh/prod_key.pem
    ProxyJump bastion-prod

# Usage: just ssh directly to private IP
# $ ssh 10.0.10.25    ← works! tunnels through bastion automatically

# Port forwarding for database access
# $ ssh -L 5432:my-rds.cluster.us-east-1.rds.amazonaws.com:5432 bastion-prod
# Then: psql -h 127.0.0.1 -p 5432 -U postgres    ← connects via bastion tunnel

# ---
# AWS SSM alternative (no open ports required)
# $ aws ssm start-session --target i-0abc1234def56789

# Tunnel a port via SSM (no SSH port 22 needed at all):
# $ aws ssm start-session --target i-0abc123 \
#     --document-name AWS-StartPortForwardingSessionToRemoteHost \
#     --parameters '{"host":["my-rds.cluster.rds.amazonaws.com"],"portNumber":["5432"],"localPortNumber":["5432"]}'
14

Private-Only Architecture

// MAXIMUM ISOLATION — ZERO PUBLIC EXPOSURE

The gold standard for sensitive workloads: no resources have public IPs, no inbound traffic from the internet, all access is through VPN or private endpoints. Traffic between resources and AWS services (S3, STS, ECR) goes through VPC Endpoints — never touching the internet.

Private-only EKS cluster — zero public exposure
Internet
    ▼ (developers only, through VPN)
┌───────────── VPC 10.0.0.0/16 ────────────────────────────────────────────┐
  Client VPN Endpoint ← devs connect here and get an IP from a separate
                        client CIDR that must not overlap the VPC (e.g. 10.200.0.0/22)

  ┌──────────────────────────────────────────────────────────────────┐
    Private Subnet A (10.0.10.0/24)     Private Subnet B (10.0.11.0/24)
    EKS Nodes + Pods                    EKS Nodes + Pods
    Internal ALB only                   Internal ALB only
  └──────────────────────────────────────────────────────────────────┘
                  │
  VPC Endpoints (no internet needed)
  S3 Endpoint │ ECR Endpoint │ SSM Endpoint │ STS Endpoint

  ┌──────────────────────────────────────────────────────────────────┐
    Isolated DB Subnet (10.0.20.0/24)
    RDS │ ElastiCache │ OpenSearch
    NO internet route
  └──────────────────────────────────────────────────────────────────┘
└───────────────────────────────────────────────────────────────────────────┘

No IGW attached. No NAT Gateway. No public IPs on any resource.
EKS API server endpoint: private only (no public endpoint).
VPC Endpoints to Know AWS
  • com.amazonaws.region.s3 — S3 Gateway endpoint (free)
  • com.amazonaws.region.ecr.api — pull container images privately
  • com.amazonaws.region.ecr.dkr — ECR Docker registry
  • com.amazonaws.region.ssm — SSM agent communication
  • com.amazonaws.region.ec2messages — SSM run command
  • com.amazonaws.region.sts — assume IAM roles privately
  • com.amazonaws.region.secretsmanager — fetch secrets
  • com.amazonaws.region.logs — CloudWatch Logs
Private EKS API Server Hardened
  • Set endpointPrivateAccess: true, endpointPublicAccess: false
  • kubectl from within VPC only (or VPN)
  • CI/CD pipelines must be inside VPC or use VPN/transit gateway
  • Enable audit logs to CloudWatch — every API call logged
  • Restrict which sources can reach the private endpoint (on EKS, through the cluster security group rules)
  • Use IRSA (IAM Roles for Service Accounts) — no node-level access keys
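The first item in the list can be checked mechanically. A minimal sketch, assuming you already have the cluster's VPC configuration as a dictionary containing the `endpointPrivateAccess` and `endpointPublicAccess` keys named above; how that dictionary is fetched is left out, and `endpoint_findings` is a hypothetical name.

```python
def endpoint_findings(vpc_config):
    """Return findings for an EKS endpoint configuration (empty list if hardened)."""
    findings = []
    # Missing keys are treated as the unsafe value so gaps are reported.
    if not vpc_config.get("endpointPrivateAccess", False):
        findings.append("private endpoint access is disabled")
    if vpc_config.get("endpointPublicAccess", True):
        findings.append("public endpoint access is enabled")
    return findings


hardened = {"endpointPrivateAccess": True, "endpointPublicAccess": False}
print(endpoint_findings(hardened))
```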
15

Quick Reference

// DECISION TREES, COMMON PATTERNS, PORTS

Architecture Decision Tree

Should this resource have a public IP?
Does it need to RECEIVE traffic from the internet?
├── YES → Does it need to handle requests directly?
│   ├── YES (e.g., you're building a CDN edge node)
│   │     → Public subnet + public IP + security group restricting to port 443/80 only
│   └── NO (most apps)
│         → Put a Load Balancer in front. LB gets public IP, app does NOT.
│           App server: private subnet, no public IP, SG only allows LB SG.
│
└── NO → Private subnet
    ├── Needs to call the internet (npm install, API calls)?
    │   └── YES → NAT Gateway in public subnet + private subnet route 0.0.0.0/0 → NAT
    │
    ├── Only talks to other VPC resources?
    │   └── YES → Isolated subnet. No internet route at all.
    │             Use VPC Endpoints for AWS services (S3, SSM, etc.)
    │
    └── Database / cache / secrets?
        └── ALWAYS isolated subnet. Never internet-accessible. Ever.
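The same tree can be written as a small function, which is handy for review tooling or for teaching. This is one possible encoding, not a canonical one: the function name and flag names are assumptions, and the data-store check is evaluated first so that the "ALWAYS isolated" rule wins over every other answer.

```python
def placement(receives_internet_traffic,
              handles_requests_directly=False,
              needs_outbound_internet=False,
              is_data_store=False):
    """Return a placement recommendation following the decision tree above."""
    # Databases, caches, and secrets stores are always isolated.
    if is_data_store:
        return "isolated subnet, never internet-accessible"
    if receives_internet_traffic:
        if handles_requests_directly:
            return "public subnet + public IP, SG restricted to 443/80"
        return "private subnet behind a load balancer; only the LB has a public IP"
    if needs_outbound_internet:
        return "private subnet with 0.0.0.0/0 routed to a NAT Gateway"
    return "isolated subnet, VPC Endpoints for AWS services"


print(placement(receives_internet_traffic=True))
print(placement(receives_internet_traffic=False, needs_outbound_internet=True))
```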

Essential Ports Reference

Port | Service | Open to Internet? | Notes
22 | SSH | NEVER | Use bastion/SSM. Source: bastion SG only.
80 | HTTP | LB ONLY | Only on load balancer. Redirect to 443.
443 | HTTPS | LB ONLY | Only on load balancer. TLS terminates here.
3306 | MySQL | NEVER | App SG only. Isolated subnet.
5432 | PostgreSQL | NEVER | App SG only. Isolated subnet.
6379 | Redis | NEVER | App SG only. Isolated subnet.
27017 | MongoDB | NEVER | App SG only. Isolated subnet.
8080 | HTTP Alt | NEVER | Internal app port. LB SG → app SG only.
2379/2380 | etcd (K8s) | NEVER | Control plane internal only.
6443 | K8s API Server | PRIVATE | VPN/bastion access only. Never public.
3389 | RDP | NEVER | Use SSM Fleet Manager for Windows.
53 | DNS (UDP/TCP) | INTERNAL | VPC resolver. Block external DNS on nodes; force through VPC DNS.
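The table lends itself to an automated check over exported security group rules. The sketch below flags any port marked NEVER or PRIVATE that is reachable from an open CIDR. The rule shape (dictionaries with `from_port`, `to_port`, and `cidr`) is a simplified assumption, not the format of any particular cloud API.

```python
# Ports the table marks as NEVER or PRIVATE.
NEVER_PUBLIC_PORTS = {22, 3306, 5432, 6379, 27017, 8080, 2379, 2380, 6443, 3389}
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def exposed_ports(rules):
    """Return the sorted never-public ports that the given rules open to the world."""
    exposed = set()
    for rule in rules:
        if rule["cidr"] not in OPEN_CIDRS:
            continue
        # A rule covers a port range, so test every sensitive port against it.
        for port in NEVER_PUBLIC_PORTS:
            if rule["from_port"] <= port <= rule["to_port"]:
                exposed.add(port)
    return sorted(exposed)


rules = [
    {"from_port": 443, "to_port": 443, "cidr": "0.0.0.0/0"},
    {"from_port": 22, "to_port": 22, "cidr": "0.0.0.0/0"},
    {"from_port": 5432, "to_port": 5432, "cidr": "10.0.10.0/24"},
]
print(exposed_ports(rules))
```

Checking ranges rather than exact matches matters because a single rule such as 0-65535 exposes every port in the table at once.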

CIDR Quick Math

CIDR Reference
# IPs per CIDR: 2^(32-prefix)
/32 = 1                                  single host or IP allow rule
/30 = 4          (2 usable)              point-to-point links
/28 = 16         (11 usable in AWS)      bastion / NAT GW subnet
/27 = 32         (27 usable in AWS)      very small service subnet
/26 = 64         (59 usable in AWS)
/25 = 128        (123 usable in AWS)
/24 = 256        (251 usable in AWS)     ← standard application subnet
/23 = 512        (507 usable in AWS)
/22 = 1,024      (1,019 usable in AWS)
/21 = 2,048
/20 = 4,096                              ← AWS default per-AZ subnet
/18 = 16,384
/16 = 65,536                             ← standard VPC size (AWS default)
/15 = 131,072
/8  = 16,777,216                         ← entire 10.x.x.x space

# Cloud-specific IP reservations per subnet (cannot use):
# AWS:   5 IPs (network, router, DNS, future use, broadcast)
# Azure: 5 IPs (network, gateway, DNS×2, broadcast)
# GCP:   4 IPs (network, gateway, broadcast + 1 reserved)
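The arithmetic behind the reference is short enough to script. A minimal sketch, with the per-cloud reservation counts taken from the notes above; the function names are illustrative.

```python
# Addresses each provider reserves in every subnet, per the notes above.
RESERVED_PER_SUBNET = {"aws": 5, "azure": 5, "gcp": 4}


def total_ips(prefix):
    """Total addresses in an IPv4 block: 2^(32 - prefix)."""
    return 2 ** (32 - prefix)


def usable_ips(prefix, cloud="aws"):
    """Addresses left after the provider's reservations (never below zero)."""
    return max(total_ips(prefix) - RESERVED_PER_SUBNET[cloud], 0)


for prefix in (28, 24, 16):
    print(f"/{prefix}: {total_ips(prefix)} total, {usable_ips(prefix)} usable in AWS")
```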
Starting point VPC design: Use 10.x.0.0/16 for the VPC (65k IPs). Allocate /24 subnets for app and DB tiers. Allocate /28 subnets for NAT gateways and bastion hosts (they only need a few IPs). Reserve 10.x.128.0/17 for future growth. Leave adjacent /16 blocks free if you'll ever need VPC peering — overlapping CIDRs cannot peer.
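The starting-point design can be sketched with Python's standard `ipaddress` module. This is one way to carve the blocks, under the assumption that the lower /17 is used now and the upper /17 is held back; which /24s and /28s go to each tier is an arbitrary illustrative choice, and `plan_vpc` and `can_peer` are hypothetical names.

```python
import ipaddress


def plan_vpc(vpc_cidr):
    """Split a /16 into a working half and a reserved half, then carve tiers."""
    vpc = ipaddress.ip_network(vpc_cidr)
    in_use, reserved = vpc.subnets(prefixlen_diff=1)  # two /17 halves
    tiers = list(in_use.subnets(new_prefix=24))
    # Take the small /28 subnets from the last /24 of the working half.
    small = list(tiers[-1].subnets(new_prefix=28))
    return {
        "app": tiers[0:2],
        "db": tiers[2:4],
        "nat_bastion": small[0:2],
        "reserved": reserved,
    }


def can_peer(cidr_a, cidr_b):
    """Two VPCs can only peer if their CIDR blocks do not overlap."""
    return not ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


plan = plan_vpc("10.0.0.0/16")
for tier, blocks in plan.items():
    print(tier, blocks)
print(can_peer("10.0.0.0/16", "10.1.0.0/16"))
```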

Networking Checklist — Pre-Production

Pre-Production Networking Audit
## VPC & Subnet
VPC CIDR doesn't overlap with on-prem or other VPCs (for future peering)
Public / private / isolated subnets in at least 2 AZs each
Isolated subnets have NO route to internet (no NAT, no IGW)
Route tables explicitly reviewed: no accidental 0.0.0.0/0 on isolated subnet

## Security Groups
No 0.0.0.0/0 on port 22, 3389, or any database port
SG sources use SG references (not CIDRs) where possible
Each tier has its own SG (ALB, app, DB; separate)
Egress rules are restricted (not "allow all outbound")
No unused SGs with broad rules

## Kubernetes
Nodes in private subnets only
API server endpoint: private only
default-deny NetworkPolicy applied to all namespaces
No NodePort services exposed in production
Ingress controller uses internal or public LB (intentionally chosen)
Pod security groups configured (if using VPC CNI)

## Access
Admin SSH access only via bastion or SSM (no direct internet SSH)
Bastion SG restricted to specific IP ranges
Database access via bastion/SSM tunnel only
VPC Endpoints configured for required AWS services
No hard-coded credentials: IAM roles / IRSA for all workloads
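Some checklist items are easy to automate. As one example, the sketch below flags isolated subnets whose route table contains a default route. The input shape (a mapping of subnet ID to destination CIDRs) and the subnet IDs are hypothetical; in practice the data would be exported from your cloud provider's API.

```python
DEFAULT_ROUTES = {"0.0.0.0/0", "::/0"}


def subnets_with_default_route(route_tables, isolated_subnets):
    """Return isolated subnets that have a route to the internet.

    route_tables maps a subnet ID to the destination CIDRs in its route table.
    """
    return sorted(
        subnet
        for subnet in isolated_subnets
        if DEFAULT_ROUTES & set(route_tables.get(subnet, []))
    )


route_tables = {
    "subnet-db-a": ["10.0.0.0/16"],
    "subnet-db-b": ["10.0.0.0/16", "0.0.0.0/0"],  # accidental default route
}
print(subnets_with_default_route(route_tables, ["subnet-db-a", "subnet-db-b"]))
```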