Multi-Cloud Architecture & Scaling Standards
Production-ready reference guidance for scalable, secure systems across AWS, Azure, and Google Cloud Platform — written for engineers and technical leads who need consistent, defensible architecture decisions.
Core Principles
These principles apply universally, regardless of which pattern you choose. They represent the minimum standard for any production system and should be enforced at the architecture review stage, not retrofitted later.
Assume every component can fail. Build retry logic, circuit breakers, graceful degradation, and health probes into every layer — not as an afterthought.
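Below is a minimal Python sketch of the retry half of this principle. It uses capped exponential backoff with full jitter, so that many clients retrying at once do not hit a recovering dependency in lockstep. `TransientError`, the attempt count, and the delay values are illustrative assumptions. Retries like this are only safe for idempotent operations.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are safe to retry (timeouts, 503s)."""


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying TransientError with capped exponential backoff
    and full jitter (a random wait between 0 and the current backoff cap)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # retry budget exhausted: surface the failure to the caller
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
    raise RuntimeError("unreachable")  # the loop always returns or raises


# Usage: a dependency that times out twice, then recovers.
calls = {"count": 0}


def flaky_lookup() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("simulated timeout")
    return "ok"


print(retry_with_backoff(flaky_lookup, sleep=lambda _: None), calls["count"])  # ok 3
```

Non-transient exceptions are not caught, so they propagate immediately. The `sleep` parameter is injectable so that tests can skip real waiting.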
Managed services reduce operational surface area and patch burden. Choose them when they materially reduce toil without introducing unacceptable vendor lock-in.
Horizontal scaling is generally more resilient and cost-effective than vertical scaling. Reserve vertical scaling for transactional databases where single-node performance is the bottleneck.
Internet traffic must never reach compute or data systems directly; it enters only through edge components. Public subnets are for edge components only, such as load balancers and approved bastion hosts.
Platform-native identities, short-lived credentials, TLS everywhere, and centralized secrets management are non-negotiable baselines across all cloud patterns.
Choose the simplest architecture the team can operate safely under incident conditions. Fashionable complexity that cannot be debugged at 3am is an operational liability.
The Classic 3-Tier Web Architecture
The 3-tier model separates presentation, application logic, and persistence into distinct layers with independent scaling, security boundaries, and deployment controls. It remains the most reliable foundation for enterprise systems — proven, predictable, and operationally straightforward.
Use this pattern when:
- Deploying a monolithic or near-monolithic codebase
- Performing lift-and-shift from on-premises environments
- The team prefers VM-level control over containers/serverless
- Stateful application behaviour requires OS-level configuration
- Operations team is comfortable with traditional server management
Consider another pattern when:
- Rapid independent service deployment is the primary requirement
- Multiple teams need to own separate release lifecycles
- The codebase is already cleanly decomposed into bounded contexts
- Traffic is highly spiky and elastic scaling must reach zero
Scaling Strategy
Place web and app tiers in separate auto-scaling pools. Scale independently on CPU, memory, request count, queue depth, or P99 latency.
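The sketch below gives a rough illustration of scaling each tier on its own signal. It sizes the web tier from request rate and the app tier from queue backlog, each against a per-instance target. The targets, tier bounds, and the `TierPolicy` type are hypothetical. Managed target-tracking policies perform a similar calculation for you.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TierPolicy:
    """Scaling policy for one tier; names and targets are illustrative."""
    name: str
    target_per_instance: float  # e.g. requests/s per web instance, backlog per app worker
    min_size: int
    max_size: int


def desired_capacity(policy: TierPolicy, total_load: float) -> int:
    """Instances needed to keep per-instance load at or below the target,
    clamped to the tier's min/max bounds."""
    needed = math.ceil(total_load / policy.target_per_instance)
    return max(policy.min_size, min(policy.max_size, needed))


web = TierPolicy("web", target_per_instance=200.0, min_size=2, max_size=20)  # requests/s per instance
app = TierPolicy("app", target_per_instance=100.0, min_size=2, max_size=30)  # queued messages per worker

print(desired_capacity(web, total_load=1_850))  # 10
print(desired_capacity(app, total_load=120))    # 2 (held at the tier minimum)
```

Because each tier has its own policy, a request surge on the web tier does not force the app tier to scale unless its backlog actually grows.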
Route read-heavy traffic to read replicas. Direct all writes to the primary. Scale reads horizontally before scaling the primary vertically.
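The following is a simplified sketch of application-side read/write splitting. It assumes the application chooses a connection per statement. Endpoint names are hypothetical, and classifying statements by their first keyword is a deliberate simplification. Many teams use separate reader and writer connection pools instead; Aurora, for example, exposes separate reader and writer endpoints.

```python
import itertools
from typing import Sequence


class ReadWriteRouter:
    """Send writes (and anything inside a transaction) to the primary,
    and spread plain reads across read replicas round-robin.

    Endpoint names are hypothetical; in practice each would be a connection pool.
    """

    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "MERGE", "CREATE", "ALTER", "DROP", "TRUNCATE"}

    def __init__(self, primary: str, replicas: Sequence[str]) -> None:
        self.primary = primary
        # With no replicas configured, reads fall back to the primary.
        self._replicas = itertools.cycle(list(replicas) or [primary])

    def route(self, sql: str, in_transaction: bool = False) -> str:
        words = sql.upper().split()
        verb = words[0] if words else ""
        locking_read = "FOR UPDATE" in " ".join(words)  # SELECT ... FOR UPDATE takes row locks
        if in_transaction or verb in self.WRITE_VERBS or locking_read:
            return self.primary
        return next(self._replicas)


router = ReadWriteRouter("db-primary", ["db-replica-1", "db-replica-2"])
print(router.route("SELECT * FROM orders WHERE id = 42"))  # db-replica-1
print(router.route("SELECT * FROM orders WHERE id = 43"))  # db-replica-2
print(router.route("UPDATE orders SET status = 'paid'"))   # db-primary
```

Because of replica lag, a read issued immediately after a write may not see that write. Sending reads that need read-your-writes consistency to the primary (here via `in_transaction=True`) is one way to handle this.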
Externalise session state to Redis or Memcached. Stateless instances scale freely without session affinity problems.
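The sketch below shows why externalised sessions let instances scale freely. Session state lives in a shared store keyed by session ID, so any instance can serve any request. `InMemorySessionStore` is a hypothetical dict-backed stand-in for Redis or Memcached that imitates key expiry. A production version would call a real client and rely on the store's own TTLs.

```python
import json
import time
import uuid
from typing import Callable, Dict, Optional, Tuple


class InMemorySessionStore:
    """Illustrative stand-in for a shared Redis/Memcached session store."""

    def __init__(self, ttl_seconds: int = 1800, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: Dict[str, Tuple[float, str]] = {}  # session_id -> (expires_at, JSON payload)
        self._ttl = ttl_seconds
        self._clock = clock

    def create(self, payload: dict) -> str:
        session_id = uuid.uuid4().hex
        self._data[session_id] = (self._clock() + self._ttl, json.dumps(payload))
        return session_id

    def get(self, session_id: str) -> Optional[dict]:
        entry = self._data.get(session_id)
        if entry is None:
            return None
        expires_at, raw = entry
        if self._clock() >= expires_at:
            del self._data[session_id]  # expire lazily, like a key with a TTL
            return None
        return json.loads(raw)


shared_store = InMemorySessionStore(ttl_seconds=1800)
session_id = shared_store.create({"user": "u-123", "cart_items": 2})  # written by instance A
print(shared_store.get(session_id))  # instance B reads: {'user': 'u-123', 'cart_items': 2}
```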
Define predictive and reactive triggers. Pre-scale for known events. Use scheduled scaling for known daily traffic curves.
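Here is one way to combine predictive and reactive triggers, sketched with assumed numbers. A schedule and an event calendar set a capacity floor. Reactive scaling may push capacity above that floor but never below it. The hour-to-floor table, the event date, and the function names are illustrative, not recommended values.

```python
from datetime import datetime
from typing import Dict

# Hypothetical daily curve (hour of day -> minimum instances): 2 overnight, 8 in business hours.
SCHEDULED_MIN: Dict[int, int] = {**{h: 2 for h in range(24)}, **{h: 8 for h in range(8, 18)}}

# Hypothetical pre-scale floors for known events, e.g. a Black Friday sale.
EVENT_MIN: Dict[str, int] = {"2024-11-29": 20}


def capacity_floor(now: datetime) -> int:
    """Predictive floor: the larger of the daily schedule and any event pre-scale."""
    daily = SCHEDULED_MIN[now.hour]
    event = EVENT_MIN.get(now.date().isoformat(), 0)
    return max(daily, event)


def target_capacity(now: datetime, reactive_desired: int, max_size: int) -> int:
    """Reactive scaling may add capacity above the floor but never drops below it."""
    return min(max_size, max(capacity_floor(now), reactive_desired))


print(target_capacity(datetime(2024, 11, 5, 3), reactive_desired=1, max_size=40))   # 2
print(target_capacity(datetime(2024, 11, 5, 12), reactive_desired=15, max_size=40))  # 15
print(target_capacity(datetime(2024, 11, 29, 3), reactive_desired=4, max_size=40))  # 20
```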
Cloud-Specific Service Mappings
| Layer | AWS | Azure | GCP |
|---|---|---|---|
| Public ingress | Application Load Balancer | Application Gateway | Cloud Load Balancing |
| Web tier compute | EC2 Auto Scaling Group | VM Scale Sets | Managed Instance Group |
| App tier compute | EC2 ASG | VM Scale Sets | Managed Instance Group |
| Managed SQL | RDS / Aurora | Azure SQL / SQL Managed Instance | Cloud SQL |
| Read replicas | RDS Read Replica / Aurora Replica | SQL geo-replica | Cloud SQL replica |
| Session cache | ElastiCache (Redis) | Azure Cache for Redis | Memorystore |
| Secrets & keys | KMS / Secrets Manager | Key Vault | Cloud KMS / Secret Manager |
Containerized Microservices on Kubernetes
Microservices on Kubernetes give independent teams the ability to release, scale, and own their services without coordinating monolith deployments. Each service has its own scaling policy, runtime, and often its own data boundary, connected through a shared ingress and an internal service mesh.
Use this pattern when:
- Multiple teams need independent release cadences
- Domain boundaries are well understood and stable
- Polyglot runtimes (Go, Python, JVM, Node) are required
- Team has operational maturity for distributed systems
- Service-level ownership and SLA accountability matter
Consider another pattern when:
- Small team without Kubernetes operations experience
- Domain boundaries are poorly defined or contested
- Distributed tracing and service contract discipline are absent
- Simplicity is more important than independent deployability
Scaling Strategy
The Horizontal Pod Autoscaler (HPA) scales pod counts per service on CPU, memory, or custom metrics (RPS, queue lag, P95 latency). Configure it per deployment, not globally.
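The Kubernetes documentation describes the HPA's core calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). Changes are skipped when the ratio is within a tolerance (0.1 by default), and when several metrics are configured the largest proposal wins. The sketch below reproduces only that core arithmetic. It is not the controller's full algorithm.

```python
import math
from typing import Sequence, Tuple

TOLERANCE = 0.1  # documented default: ignore ratios within ±10% of the target


def hpa_desired_replicas(
    current_replicas: int,
    metrics: Sequence[Tuple[float, float]],  # (current value, target value) per metric
    min_replicas: int,
    max_replicas: int,
) -> int:
    """Simplified HPA calculation: one proposal per metric, take the max, clamp.

    Stabilisation windows, pod readiness, and missing-metric handling, which the
    real controller also applies, are intentionally left out.
    """
    proposals = []
    for current, target in metrics:
        ratio = current / target
        if abs(ratio - 1.0) <= TOLERANCE:
            proposals.append(current_replicas)  # close enough to target: no change
        else:
            proposals.append(math.ceil(current_replicas * ratio))
    desired = max(proposals, default=current_replicas)
    return max(min_replicas, min(max_replicas, desired))


print(hpa_desired_replicas(4, [(90, 60), (100, 100)], 2, 10))  # 6: CPU at 150% of target
print(hpa_desired_replicas(10, [(20, 60)], 2, 10))             # 4: scale in
print(hpa_desired_replicas(4, [(63, 60)], 2, 10))              # 4: within tolerance
```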
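The Cluster Autoscaler adds worker nodes when the scheduler cannot place pods, and removes underutilised nodes whose pods can run elsewhere. It works alongside the HPA: the HPA creates pods, and the Cluster Autoscaler supplies the nodes to run them. New nodes take time to join, so latency-sensitive services may need deliberate spare capacity for rapid scale-out.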
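A PodDisruptionBudget defines the minimum number of available replicas. This ensures that voluntary disruptions, such as node drains by the autoscaler or during cluster upgrades, never take a service below safe capacity. Rolling deployments are governed separately, by the Deployment's maxUnavailable and maxSurge settings.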
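Below is a small sketch of a PodDisruptionBudget and the arithmetic behind it, assuming an integer `minAvailable`. The `checkout` names are hypothetical. The manifest is built as a Python dict and printed as JSON, which `kubectl apply -f` accepts as well as YAML. The helper function calculates how many voluntary evictions the budget permits at a given moment.

```python
import json


def pdb_manifest(name: str, app_label: str, min_available: int) -> dict:
    """Build a policy/v1 PodDisruptionBudget selecting pods labelled app=<app_label>."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name},
        "spec": {
            "minAvailable": min_available,
            "selector": {"matchLabels": {"app": app_label}},
        },
    }


def disruptions_allowed(current_healthy: int, min_available: int) -> int:
    """Voluntary evictions (for example, node-drain evictions) the budget permits right now."""
    return max(0, current_healthy - min_available)


print(json.dumps(pdb_manifest("checkout-pdb", "checkout", min_available=2), indent=2))
print(disruptions_allowed(current_healthy=3, min_available=2))  # 1: a drain evicts one pod, then waits
print(disruptions_allowed(current_healthy=2, min_available=2))  # 0: the drain blocks until capacity returns
```

Setting `minAvailable` equal to the replica count blocks node drains entirely, which can stall cluster upgrades.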
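Set CPU and memory requests accurately: the scheduler uses requests for placement decisions, and the HPA measures utilisation against them. Set limits to contain noisy-neighbour impact. A memory limit set too low causes OOMKills; a CPU limit set too low causes throttling.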
Cloud-Specific Service Mappings
| Capability | AWS | Azure | GCP |
|---|---|---|---|
| Managed Kubernetes | Amazon EKS | AKS | GKE (Autopilot) |
| Ingress / API entry | AWS Load Balancer Controller / API Gateway | Application Gateway Ingress Controller (AGIC) / API Management | GKE Ingress / API Gateway |
| Container registry | ECR | Azure Container Registry | Artifact Registry |
| NoSQL data store | DynamoDB | Cosmos DB | Firestore / Bigtable |
| Distributed cache | ElastiCache | Azure Cache for Redis | Memorystore |
| Metrics / autoscale | CloudWatch / Prometheus | Azure Monitor | Cloud Monitoring |
| Service mesh | Istio on EKS (App Mesh is being discontinued) | Istio-based service mesh add-on for AKS | Cloud Service Mesh |
Serverless & Event-Driven Architecture
Serverless event-driven systems decouple producers from consumers through a durable event bus or messaging layer. Producers emit events without knowing who processes them. Consumers scale independently according to event volume, with zero idle cost when quiet.
Use this pattern when:
- Traffic is spiky, bursty, or highly unpredictable
- Workflows are naturally asynchronous and tolerate delay
- Minimising idle infrastructure cost is a primary concern
- Integration glue, automation, and back-office processing
- Rapid prototyping with low operational ownership
Consider another pattern when:
- Workloads are long-running (>15 min without checkpointing)
- Ultra-low latency is a hard requirement (cold starts matter)
- Complex distributed transactions require strong consistency
- Debugging and tracing experience is weak on the team
Scaling Strategy
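No traffic → no running instances → no idle cost. Functions spin up on demand. For latency-critical paths, Provisioned Concurrency (AWS Lambda), pre-warmed instances (Azure Functions Premium plan), or minimum instances (Cloud Run) keep instances initialised to avoid cold starts. The trade-off is paying for that capacity while it sits idle.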
Queues and event buses absorb traffic spikes, and downstream functions process at their own rate. Because each consumer scales to its own subscription backlog, overload in one consumer stays contained rather than cascading to others.
Set reserved or maximum concurrency limits to protect downstream services and databases from thundering-herd effects during burst events.
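Platform concurrency settings, such as Lambda reserved concurrency, enforce this limit outside your code. The sketch below shows the same idea inside a consumer, using `asyncio.Semaphore` to cap in-flight database calls during a burst. `query_database` is a hypothetical stand-in for a real client call.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class ConcurrencyLimiter:
    """Caps how many calls to a downstream dependency are in flight at once."""

    def __init__(self, limit: int) -> None:
        self._semaphore = asyncio.Semaphore(limit)
        self.in_flight = 0
        self.peak = 0  # highest concurrency observed, for illustration

    async def run(self, fn: Callable[..., Awaitable[T]], *args: object) -> T:
        async with self._semaphore:  # excess callers wait here instead of hitting the database
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
            try:
                return await fn(*args)
            finally:
                self.in_flight -= 1


async def query_database(event_id: int) -> int:
    await asyncio.sleep(0.01)  # hypothetical stand-in for a real database call
    return event_id


async def main() -> None:
    limiter = ConcurrencyLimiter(limit=5)
    # A burst of 100 events arrives at once; at most 5 reach the database concurrently.
    results = await asyncio.gather(*(limiter.run(query_database, i) for i in range(100)))
    print(len(results), limiter.peak)  # 100 5


asyncio.run(main())
```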
Failed events go to a dead-letter queue (DLQ) for inspection and replay. Never silently discard events. Monitor DLQ depth as a key operational metric.
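The following toy model of redrive behaviour assumes a maximum receive count of three, similar in spirit to the `maxReceiveCount` setting on an SQS redrive policy. `QueueWithDLQ` is hypothetical and in-memory. It shows that a poison message ends up parked in the dead-letter queue, where it can be inspected and replayed, rather than being dropped or retried forever.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List


@dataclass
class Message:
    body: str
    receive_count: int = 0


@dataclass
class QueueWithDLQ:
    """In-memory toy: after `max_receives` failed deliveries a message moves to
    the dead-letter queue instead of being retried forever or discarded."""
    max_receives: int = 3
    main: Deque[Message] = field(default_factory=deque)
    dlq: List[Message] = field(default_factory=list)

    def publish(self, body: str) -> None:
        self.main.append(Message(body))

    def drain(self, handler: Callable[[str], None]) -> None:
        while self.main:
            msg = self.main.popleft()
            msg.receive_count += 1
            try:
                handler(msg.body)
            except Exception:
                if msg.receive_count >= self.max_receives:
                    self.dlq.append(msg)  # park for inspection and replay
                else:
                    self.main.append(msg)  # redeliver later

    def replay_dlq(self) -> None:
        """Once the handler bug is fixed, return dead-lettered messages to the main queue."""
        while self.dlq:
            msg = self.dlq.pop(0)
            msg.receive_count = 0
            self.main.append(msg)


def handle_order(body: str) -> None:
    if body == "poison":
        raise ValueError("cannot parse event")  # simulated bad payload


queue = QueueWithDLQ(max_receives=3)
for body in ["order-1", "poison", "order-2"]:
    queue.publish(body)
queue.drain(handle_order)
print(len(queue.dlq), queue.dlq[0].receive_count)  # 1 3
```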
Cloud-Specific Service Mappings
| Capability | AWS | Azure | GCP |
|---|---|---|---|
| API entry | API Gateway | API Management | API Gateway |
| Serverless compute | Lambda | Azure Functions | Cloud Functions Cloud Run |
| Event router / topic | EventBridge SNS SQS | Event Grid Service Bus | Pub/Sub |
| Workflow orchestration | Step Functions | Durable Functions Logic Apps | Workflows |
| Dead-letter / retry | SQS DLQ | Service Bus DLQ | Pub/Sub dead-letter topic |
Global High Availability — Multi-Region
Multi-region architecture protects against catastrophic regional failure and reduces latency for globally distributed users. It requires strong discipline around failover automation, replication lag tolerance, consistency models, and incident runbooks — the most complex pattern in this handbook.
Use this pattern when:
- Application is Tier 0 — regional downtime is unacceptable
- Users are globally distributed and latency matters materially
- Compliance or resilience objectives require demonstrable DR
- Business continuity requires RTO < 15 min and RPO < 1 min
- You have the team to operate and practice failover regularly
Consider another pattern when:
- Single-region zonal redundancy is sufficient for the SLA
- The team has never tested failover under real incident conditions
- Replication lag and eventual consistency are unacceptable
- Budget cannot support duplicate infrastructure running continuously
Cloud-Specific Service Mappings
| Capability | AWS | Azure | GCP |
|---|---|---|---|
| Global traffic routing | Route 53 / Global Accelerator | Front Door / Traffic Manager | Global Cloud Load Balancing / Cloud DNS |
| Regional compute | EC2 / ECS / EKS | VMSS / AKS / App Service | MIG / GKE / Cloud Run |
| Multi-region database | DynamoDB Global Tables / Aurora Global Database | Cosmos DB / Azure SQL geo-replication | Cloud Spanner / AlloyDB |
| Edge WAF & DDoS | WAF + Shield | WAF + DDoS Protection | Cloud Armor |
| Observability | CloudWatch / X-Ray | Monitor + App Insights | Cloud Monitoring + Trace |
Enterprise Network & Security Standards
These standards apply to every architecture above. They represent the minimum acceptable security posture for any production system. Gaps here are not technical debt — they are active risk that must be tracked and remediated on a fixed timeline.
Security Controls by Cloud
| Control Area | AWS | Azure | GCP |
|---|---|---|---|
| Network boundary | VPC | VNet | VPC |
| WAF | AWS WAF | Azure WAF | Cloud Armor |
| DDoS protection | Shield Standard / Advanced | Azure DDoS Protection (default infrastructure protection / Network Protection) | Cloud Armor + Google Edge |
| Identity standard | IAM Roles | Managed Identities | Service Accounts / Workload Identity |
| Key management | KMS | Key Vault | Cloud KMS |
| Secrets management | Secrets Manager | Key Vault Secrets | Secret Manager |
| Threat detection | GuardDuty | Microsoft Defender for Cloud | Security Command Center |
| Policy as code | AWS Config + SCP | Azure Policy | Org Policy Service |
Observability Standards
A system you cannot observe cannot be operated safely under incident conditions. Observability is not a "nice to have" — it is a first-class architecture concern applied at design time, not retrofitted after incidents.
Instrument all services with RED metrics: Request rate, Error rate, Duration (latency). Expose them in Prometheus format or as cloud-native metrics. Set alert thresholds on SLOs, not averages.
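To make the three RED signals concrete, the sketch below computes them from raw request records for a single window. In production these values usually come from counters and histograms in Prometheus or a cloud metrics service rather than from raw samples. The `Request` type, the nearest-rank percentile method, and treating only 5xx responses as errors are assumptions.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Request:
    duration_ms: float
    status: int


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile for pct in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def red_summary(requests: List[Request], window_seconds: float) -> Dict[str, float]:
    """Rate, Errors and Duration for one service over one time window."""
    if not requests:
        return {"rate_rps": 0.0, "error_ratio": 0.0, "p50_ms": 0.0, "p99_ms": 0.0}
    durations = [r.duration_ms for r in requests]
    errors = sum(1 for r in requests if r.status >= 500)  # assumption: only 5xx count as errors
    return {
        "rate_rps": len(requests) / window_seconds,
        "error_ratio": errors / len(requests),
        "p50_ms": percentile(durations, 50),
        "p99_ms": percentile(durations, 99),
    }


# 100 requests in a 10-second window, latencies 1-100 ms, two of them 503s.
sample = [Request(float(ms), 503 if ms in (10, 20) else 200) for ms in range(1, 101)]
print(red_summary(sample, window_seconds=10))
# {'rate_rps': 10.0, 'error_ratio': 0.02, 'p50_ms': 50.0, 'p99_ms': 99.0}
```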
All logs must be structured (JSON). Include: correlation ID, service name, environment, trace ID, user context (anonymised). Never log secrets or PII.
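Below is a minimal structured-logging setup that uses only Python's standard `logging` module. Each record becomes one JSON line carrying the fields listed above. The service name, environment, field names, and key deny-list are illustrative assumptions. A deny-list is a safety net; it does not replace the rule of never passing secrets or PII to the logger.

```python
import json
import logging
import sys
from datetime import datetime, timezone

SERVICE_NAME = "checkout-api"  # hypothetical
ENVIRONMENT = "prod"           # hypothetical
DENY_KEYS = {"password", "token", "authorization", "card_number"}  # illustrative safety net


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "service": SERVICE_NAME,
            "environment": ENVIRONMENT,
            "message": record.getMessage(),
            "correlation_id": getattr(record, "correlation_id", None),
            "trace_id": getattr(record, "trace_id", None),
        }
        context = getattr(record, "context", None) or {}
        # Drop known-sensitive keys and never let context overwrite the core fields.
        entry.update({k: v for k, v in context.items()
                      if k.lower() not in DENY_KEYS and k not in entry})
        return json.dumps(entry)


logger = logging.getLogger("checkout")
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info(
    "payment authorised",
    extra={
        "correlation_id": "req-7f3a",
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
        "context": {"user_ref": "u_9c1e", "amount_cents": 1299, "token": "never-logged"},
    },
)
```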
Propagate trace context (W3C Trace Context) across all service boundaries. Required for microservices and serverless. Essential for diagnosing latency in call chains.
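Here is a sketch of W3C Trace Context propagation for the `traceparent` header. Its version-00 form is `00-<32-hex trace-id>-<16-hex parent-id>-<2-hex flags>`. The code parses an incoming header, then builds the header for an outgoing call, keeping the trace ID and minting a new span ID. Only version 00 is handled, `tracestate` is ignored, and sampling new traces by default is an assumption. In practice an OpenTelemetry SDK does this work for you.

```python
import re
import secrets
from dataclasses import dataclass
from typing import Optional

_TRACEPARENT = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


@dataclass(frozen=True)
class TraceContext:
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceContext]:
    """Parse a version-00 traceparent header; return None if it is invalid."""
    match = _TRACEPARENT.fullmatch(header.strip())
    if not match:
        return None
    version, trace_id, parent_id, flags = match.groups()
    if version != "00" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None  # all-zero IDs are invalid per the spec
    return TraceContext(trace_id, parent_id, sampled=bool(int(flags, 16) & 0x01))


def child_traceparent(incoming: Optional[TraceContext]) -> str:
    """Header for an outgoing call: same trace ID, new span ID.
    Without a valid incoming context, start a new (sampled) trace."""
    trace_id = incoming.trace_id if incoming else secrets.token_hex(16)
    sampled = incoming.sampled if incoming else True
    span_id = secrets.token_hex(8)
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


incoming = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
print(incoming)  # TraceContext(trace_id='4bf92f3577b34da6a3ce929d0e0e4736', parent_id='00f067aa0ba902b7', sampled=True)
print(child_traceparent(incoming))  # 00-4bf92f3577b34da6a3ce929d0e0e4736-<new span id>-01
```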
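Expose /health/live (process alive) and /health/ready (dependencies healthy). Orchestrators use liveness to decide when to restart an instance. Load balancers and orchestrators use readiness to decide whether to route traffic to it. Keep dependency checks out of the liveness probe so that a downstream outage does not trigger restart loops.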
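The following is a framework-agnostic sketch of the two probe endpoints. `make_health_handler` and the dependency-check callables are hypothetical. Wiring `handle` into an actual HTTP server, and giving each check a short timeout, are left out for brevity.

```python
from http import HTTPStatus
from typing import Callable, Dict, Tuple


def make_health_handler(
    dependency_checks: Dict[str, Callable[[], bool]],
) -> Callable[[str], Tuple[int, dict]]:
    """Build a handler for /health/live and /health/ready."""

    def handle(path: str) -> Tuple[int, dict]:
        if path == "/health/live":
            # Liveness: the process can serve a request. Dependencies are deliberately
            # not checked, so a database outage does not cause restart loops.
            return int(HTTPStatus.OK), {"status": "alive"}
        if path == "/health/ready":
            results = {}
            for name, check in dependency_checks.items():
                try:
                    results[name] = bool(check())
                except Exception:
                    results[name] = False  # an erroring check counts as not ready
            ready = all(results.values())
            status = HTTPStatus.OK if ready else HTTPStatus.SERVICE_UNAVAILABLE
            # Readiness: 503 tells load balancers and orchestrators to stop routing here.
            return int(status), {"status": "ready" if ready else "not_ready", "checks": results}
        return int(HTTPStatus.NOT_FOUND), {"status": "not_found"}

    return handle


handle = make_health_handler({"database": lambda: True, "cache": lambda: False})
print(handle("/health/live"))   # (200, {'status': 'alive'})
print(handle("/health/ready"))  # (503, {'status': 'not_ready', 'checks': {'database': True, 'cache': False}})
```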
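Define SLOs for error rate and latency percentiles. Alert on error-budget burn rate, not raw counts. Page when the budget is burning fast enough to be exhausted well before the SLO window ends, and raise a ticket for slower burns. A brief spike that barely dents the budget should not page; an SLO breach always pages.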
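Below is a sketch of multiwindow burn-rate alerting, following the pattern described in Google's SRE Workbook. Burn rate is the observed error ratio divided by the error budget (1 − SLO). An alert fires only when both a long and a short window exceed the threshold. The 99.9% SLO and the window/threshold pairs are illustrative and assume a 30-day SLO window.

```python
from typing import Dict

SLO = 0.999  # illustrative: 99.9% of requests succeed over 30 days
ERROR_BUDGET = 1 - SLO

# (long window, short window, burn-rate threshold, action)
POLICIES = [
    ("1h", "5m", 14.4, "page"),
    ("6h", "30m", 6.0, "page"),
    ("3d", "6h", 1.0, "ticket"),
]


def burn_rate(error_ratio: float) -> float:
    """How many times faster than 'exactly on budget' the service is failing."""
    return error_ratio / ERROR_BUDGET


def evaluate(error_ratios: Dict[str, float]) -> str:
    """error_ratios maps window name -> observed error ratio in that window.
    Requiring both windows to exceed the threshold suppresses short spikes
    and lets the alert clear quickly once the short window recovers."""
    for long_w, short_w, threshold, action in POLICIES:
        if (burn_rate(error_ratios[long_w]) >= threshold
                and burn_rate(error_ratios[short_w]) >= threshold):
            return action
    return "ok"


print(evaluate({"5m": 0.03, "30m": 0.02, "1h": 0.02, "6h": 0.004, "3d": 0.001}))     # page
print(evaluate({"5m": 0.05, "30m": 0.01, "1h": 0.002, "6h": 0.0015, "3d": 0.0008}))  # ok
print(evaluate({"5m": 0.002, "30m": 0.002, "1h": 0.002, "6h": 0.002, "3d": 0.0015}))  # ticket
```

With these numbers, a burn rate of 14.4 sustained for one hour consumes 2% of a 30-day budget (14.4 × 1 h / 720 h).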
Maintain a service dashboard per architecture tier showing request volume, error rate, P50/P95/P99 latency, saturation, and dependency health, visible to the on-call team without digging.
| Capability | AWS | Azure | GCP |
|---|---|---|---|
| Metrics & dashboards | CloudWatch | Azure Monitor | Cloud Monitoring |
| Distributed tracing | X-Ray | Application Insights | Cloud Trace |
| Log aggregation | CloudWatch Logs | Log Analytics | Cloud Logging |
| Managed Prometheus | Amazon Managed Service for Prometheus | Azure Monitor managed service for Prometheus | Google Cloud Managed Service for Prometheus |
Common Anti-Patterns
These are the most common architecture mistakes that reach production. Each one has a recognisable failure signature. Identifying them early in design review avoids incidents and costly refactors.
Pinball Service Chains
One user request synchronously traverses five or six microservices before returning a response. Each hop adds latency, retry pressure, and a new failure point.
Why it hurts:
- Latency compounds on every hop: 6 hops × 50 ms = 300 ms minimum
- Retry storms amplify downstream instability under load
- One degraded service cascades failure across the chain
- Distributed ownership makes debugging extremely hard
- Timeouts must be tuned for every pair, not just the edge
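The first two impacts are simple arithmetic. The sketch below assumes 50 ms per hop and, for the retry case, three attempts per call (one try plus two retries) at each of five layers. It shows how latency adds up across hops and how retries multiply.

```python
def minimum_latency_ms(per_hop_ms: float, hops: int) -> float:
    """Latency floor for a synchronous chain: hop latencies add up, they never overlap."""
    return per_hop_ms * hops


def worst_case_attempts(attempts_per_hop: int, depth: int) -> int:
    """Calls reaching the deepest service for ONE user request when every layer
    independently retries a failing downstream call."""
    return attempts_per_hop ** depth


print(minimum_latency_ms(50, 6))  # 300
print(worst_case_attempts(3, 5))  # 243
```

When a single failing user request can produce 243 calls at the deepest service, the usual remedy is to retry at one layer only and to cap retries with a budget.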
How to fix it:
- Collapse overly chatty service boundaries into fewer, cohesive services
- Use async events where real-time coupling is unnecessary
- Apply circuit breakers, bulkheads, and timeouts at every boundary
- Use the Backend-for-Frontend pattern to aggregate at the edge
- Instrument every hop so latency attribution is visible
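Here is a minimal circuit breaker sketch for a single dependency. The thresholds are assumptions, and the class is not thread-safe: in the half-open state it admits every caller rather than exactly one trial request. Production services usually rely on a resilience library or the service mesh for this.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised instead of calling the dependency while the breaker is open."""


class CircuitBreaker:
    """Closed -> open after `failure_threshold` consecutive failures; fail fast for
    `reset_timeout` seconds; then half-open: a success closes the breaker and a
    failure re-opens it."""

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def call(self, operation: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("dependency marked unhealthy; failing fast")
        try:
            result = operation()
        except Exception:
            self._failures += 1
            if self.state == "half-open" or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # (re)open and restart the cool-down
            raise
        self._failures = 0
        self._opened_at = None  # any success closes the breaker
        return result


breaker = CircuitBreaker(failure_threshold=2, reset_timeout=30.0)


def unavailable() -> str:
    raise ConnectionError("pricing service unavailable")  # simulated outage


for _ in range(3):
    try:
        breaker.call(unavailable)
    except (ConnectionError, CircuitOpenError) as exc:
        print(type(exc).__name__)  # ConnectionError, ConnectionError, then CircuitOpenError
print(breaker.state)  # open
```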
Public Databases
A database is reachable from the public internet, either intentionally through a public IP or accidentally through a misconfigured firewall or permissive security group.
Why it hurts:
- Dramatically expands the attack surface
- Credential leakage is immediately exploitable from anywhere
- Bypasses layered network security assumptions
- Creates audit, compliance, and regulatory exposure
- Breach notification requirements are triggered on compromise
How to fix it:
- Keep all databases in private subnets with no public IP addresses
- Access only through application services or an approved bastion host
- Use private endpoints / VPC peering for cross-service access
- Enforce with SCPs, Azure Policy, or Org Policy constraints
- Run automated scanning to detect public resources
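The sketch below shows the kind of check an automated scan performs. It assumes firewall rules have already been exported into a simple cloud-agnostic shape. `IngressRule`, the resource names, and the port list are hypothetical. This version only flags rules open to any source (`0.0.0.0/0` or `::/0`); a real scanner would also consider public IPs, load balancers, and narrower public ranges.

```python
import ipaddress
from dataclasses import dataclass
from typing import List

# Illustrative database and broker ports; extend for your estate.
SENSITIVE_PORTS = {1433: "SQL Server", 3306: "MySQL", 5432: "PostgreSQL",
                   6379: "Redis", 9042: "Cassandra", 27017: "MongoDB"}


@dataclass(frozen=True)
class IngressRule:
    """Hypothetical cloud-agnostic view of one inbound firewall or security-group rule."""
    resource: str
    source_cidr: str
    from_port: int
    to_port: int


def public_exposures(rules: List[IngressRule]) -> List[str]:
    """Flag rules that open a sensitive port to any source address."""
    findings = []
    for rule in rules:
        network = ipaddress.ip_network(rule.source_cidr, strict=False)
        if network.prefixlen != 0:  # only 0.0.0.0/0 and ::/0 in this sketch
            continue
        for port, engine in SENSITIVE_PORTS.items():
            if rule.from_port <= port <= rule.to_port:
                findings.append(f"{rule.resource}: {engine} port {port} open to {rule.source_cidr}")
    return findings


print(public_exposures([
    IngressRule("sg-orders-db", "0.0.0.0/0", 5432, 5432),
    IngressRule("sg-orders-app", "10.20.0.0/16", 5432, 5432),
]))
# ['sg-orders-db: PostgreSQL port 5432 open to 0.0.0.0/0']
```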
Over-Provisioning for Peak
Infrastructure is sized for Black Friday every day. Spikes that occur a few times a year drive permanent baseline capacity decisions.
Why it hurts:
- Constant idle spend on underutilised capacity
- Delays adoption of elastic design patterns
- Often reflects missing load tests or poor traffic forecasting
- Teams accept over-provisioning rather than solve the hard problem
How to fix it:
- Instrument actual utilisation before sizing baseline capacity
- Use auto-scaling groups, the HPA, or managed instance groups
- Absorb bursts on ingest paths with queues
- Reserve steady-state baseline only; let burst expand elastically
- Run load tests and define data-driven scaling triggers
Premature Service Extraction
Services are split before domain boundaries are understood. The result is a distributed monolith: all the operational complexity of microservices with none of the autonomy.
Why it hurts:
- Tight coupling between services creates shared deployment dependencies
- Teams cannot release independently — the core problem remains unsolved
- Operational complexity increases with no autonomy benefit
- Data boundaries are unclear — shared database persists across "services"
How to fix it:
- Start with a modular monolith built from well-structured internal modules
- Extract services only when a domain boundary is stable and well-understood
- Each extracted service owns its own data store exclusively
- Validate that teams can deploy the service independently before extraction
RTO / RPO Target Matrix
Recovery Time Objective (RTO) and Recovery Point Objective (RPO) drive the architecture pattern and database replication strategy. Define these at design time with the business — not during an incident.
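Here is a rough sketch of checking a failover design against agreed targets, using the RTO < 15 min and RPO < 1 min figures from the multi-region section. Two simplifying assumptions apply: recovery time is modelled as detection + failover + DNS TTL, and P99 replication lag is treated as worst-case data loss for asynchronous replication. Every input should come from an exercised failover rather than an estimate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrDesign:
    """Measured characteristics of a failover design (all values illustrative)."""
    replication: str             # "sync" or "async"
    p99_replication_lag_s: float
    detection_s: float           # time to declare the primary region unhealthy
    failover_s: float            # promote the replica, scale up standby compute
    dns_ttl_s: float             # clients may keep stale records this long


def meets_targets(d: DrDesign, rto_s: float, rpo_s: float) -> dict:
    # Async replication loses writes not yet replicated: worst case ≈ replication lag.
    data_loss_s = 0.0 if d.replication == "sync" else d.p99_replication_lag_s
    recovery_s = d.detection_s + d.failover_s + d.dns_ttl_s
    return {
        "rpo_ok": data_loss_s <= rpo_s,
        "rto_ok": recovery_s <= rto_s,
        "estimated_recovery_s": recovery_s,
        "estimated_data_loss_s": data_loss_s,
    }


design = DrDesign("async", p99_replication_lag_s=5, detection_s=180, failover_s=300, dns_ttl_s=60)
print(meets_targets(design, rto_s=15 * 60, rpo_s=60))
# {'rpo_ok': True, 'rto_ok': True, 'estimated_recovery_s': 540, 'estimated_data_loss_s': 5}
```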
Architecture Checklist
- Simplest viable architecture selected: chosen because of workload characteristics and team maturity, not trend pressure.
- Data plane is private: no databases, message brokers, or internal services have public IP addresses.
- WAF and DDoS protection: applied at every internet-facing endpoint, not only production.
- TLS 1.2+ enforced everywhere: client-to-service and service-to-service. TLS termination only at approved ingress points.
- Platform-native identity in use: IAM roles, managed identities, or service accounts. No static access keys in code, images, or configs.
- All secrets in a secrets manager: AWS Secrets Manager, Azure Key Vault, or GCP Secret Manager. None in environment files or repos.
- Horizontal elasticity in every tier: web, app, and (where possible) data layers scale on measured load, not assumptions.
- Failure domain design validated: the system tolerates loss of a single instance, a zone, and (for Tier 0/1) a full region.
- RTO and RPO targets defined with business stakeholders, and the architecture demonstrably meets them.
- DR / failover tested, not assumed. Recovery procedures are documented and have been exercised in the last 90 days (Tier 0/1).
- Observability is complete: metrics (RED), structured logs, distributed traces, health probes, and alert policies are live in all environments.
- Anti-pattern review done: pinball chains, public databases, over-provisioning, and premature service extraction checked and clear.