Patterns, Anti-Patterns & the Art of Choice
A practitioner's guide to the major architectural styles — when to reach for each, the traps each one hides, and the evolving industry consensus on what actually works at scale.
Overview
Architecture decisions are the ones that are hardest to reverse. Choosing how to decompose your system — how it communicates, how data flows, how teams are organised around it — shapes every engineering decision that follows. Get it wrong and you spend years fighting your own infrastructure instead of building product.
There is no universally correct architecture. Every pattern is a set of tradeoffs. The goal of this handbook is to make those tradeoffs explicit so your team can make deliberate, informed choices instead of defaulting to the industry fashion of the moment.
Every architectural decision spends from your team's complexity budget. Distributed systems, eventual consistency, and service meshes are expensive. Spend wisely.
Your architecture will mirror your team structure — and vice versa. You can't deploy a microservices architecture with a 5-person team and expect it to work.
Prefer architectures you can migrate out of. A monolith that's well-structured can be extracted into services later. A distributed mess cannot easily be collapsed.
How to Choose an Architecture
| Question | Lean Simpler | Lean Distributed |
|---|---|---|
| Team size | 1–25 engineers | 25–100+ engineers |
| Domain clarity | New / exploratory product | Well-understood, stable domain |
| Deployment frequency | Weekly or less | Multiple times per day per team |
| Scale requirements | <10k RPS, single region | Global, massive, independent scale |
| Regulatory isolation | Shared data model is fine | Hard data boundaries required |
| Technology diversity | Single language/stack is fine | Legitimately need different runtimes |
| Operational maturity | Small or growing ops team | Mature platform / SRE capability |
| Time to market | Speed is #1 priority now | Long-term autonomy matters more |
The Monolith
All application logic (UI, business logic, data access) lives in a single deployable unit: one codebase, one process, one deployment, one shared database. Despite its reputation as "legacy," a well-written monolith outperforms a poorly written distributed system in almost every practical dimension.
- Small team (under 15 engineers)
- Exploring product-market fit
- Domain is unclear or evolving
- Limited operational resources
- Need fast iteration cycles
- Vertical scaling is sufficient
- Teams stepping on each other daily
- Parts need wildly different scale
- Releases take hours to deploy
- Multiple teams own same codebase
- Technology diversity is required
- The Big Ball of Mud anti-pattern
- Shared database becoming a bottleneck
- Test suite getting slower over time
- Long release cycles start to hurt
- Cross-cutting concerns tangled
Monolith Anti-Patterns
No discernible structure. Everything imports everything. Business logic in controllers, database calls in view templates, validation spread across the codebase. Common result of adding features rapidly without maintaining layer discipline.
Multiple applications or services talk to the same database directly, including updating each other's tables. Schema changes become impossible without coordinating every consumer.
Domain objects are just data holders with no behaviour. All business logic lives in service classes that orchestrate dumb models. Results in scattered, duplicated business rules that are hard to test and reason about.
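The contrast with a rich domain model can be sketched briefly. This is a minimal illustration, assuming a hypothetical `Order` made of `OrderLine` items; the class names and the specific rules are invented for the example and are not taken from any particular codebase.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass(frozen=True)
class OrderLine:
    sku: str
    unit_price: Decimal
    quantity: int


@dataclass
class Order:
    """Hypothetical rich domain object: it owns its rules instead of a service class."""
    lines: list[OrderLine] = field(default_factory=list)
    submitted: bool = False

    def add_line(self, line: OrderLine) -> None:
        # Invariants are enforced where the data lives.
        if self.submitted:
            raise ValueError("cannot modify a submitted order")
        if line.quantity <= 0:
            raise ValueError("quantity must be positive")
        self.lines.append(line)

    def total(self) -> Decimal:
        return sum((l.unit_price * l.quantity for l in self.lines), Decimal("0"))

    def submit(self) -> None:
        if not self.lines:
            raise ValueError("cannot submit an empty order")
        self.submitted = True
```

Because the rules sit on the object, they can be unit-tested without constructing any service or database.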
Order should know how to calculate its total and validate its own state, not delegate that to a 500-line OrderService.
Modular Monolith
A single deployable unit — like a monolith — but with strong, enforced internal module boundaries. Each module owns its domain, its data, and its public API surface. Modules communicate through well-defined interfaces, not by reaching into each other's internals. Deployed as one unit; structured like services. You get the simplicity of a monolith with the logical isolation of services — and the ability to extract true services when you genuinely need to.
- Team of 5–50 engineers
- Domain reasonably well understood
- Want service-like isolation without ops overhead
- May need to extract services later
- Single-region deployment is fine
- Growth stage: 1M–50M users
- Module boundaries enforced at build time
- Each module has its own DB schema/tables
- Cross-module calls via public interfaces only
- No reaching into another module's internals
- Modules can be tested in isolation
- Use ArchUnit / Dependency Cruiser to enforce
- Start with well-defined module boundaries
- Add messaging abstraction inside the monolith
- When a module needs to scale independently — extract
- The interface stays the same, delivery changes
- You'll know exactly where boundaries belong
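The idea of a messaging abstraction inside the monolith can be sketched as follows. `EventBus`, `OrderPlaced`, and `BillingModule` are hypothetical names chosen for illustration; the point is only that modules depend on the bus interface and event types, so delivery could later move to an external broker without changing module code.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class OrderPlaced:
    """Hypothetical event published by an 'orders' module."""
    order_id: str
    amount_cents: int


class EventBus:
    """In-process publish/subscribe. A broker-backed class could expose the same methods."""

    def __init__(self) -> None:
        self._handlers: dict[type, list[Callable]] = defaultdict(list)

    def subscribe(self, event_type: type, handler: Callable) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event: object) -> None:
        # Synchronous delivery here; only this method changes if delivery moves to a broker.
        for handler in self._handlers[type(event)]:
            handler(event)


class BillingModule:
    """Hypothetical 'billing' module: reacts to events and exposes a narrow public API."""

    def __init__(self, bus: EventBus) -> None:
        self._invoices: dict[str, int] = {}
        bus.subscribe(OrderPlaced, self._on_order_placed)

    def _on_order_placed(self, event: OrderPlaced) -> None:
        self._invoices[event.order_id] = event.amount_cents

    def invoice_amount(self, order_id: str) -> int:
        return self._invoices[order_id]
```

The orders module never imports billing; it only publishes `OrderPlaced`, which is what keeps a later extraction cheap.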
Microservices
The system is decomposed into small, independently deployable services — each owned by a small team, each with its own database, each communicating over a network (HTTP/gRPC/messaging). Enables independent scaling and deployment per service. Requires mature platform engineering, service discovery, distributed tracing, and organisational alignment to actually succeed.
- 50+ engineers with clear team ownership
- Parts truly need independent scale (e.g. payments vs. search)
- Regulatory isolation requirements
- Different technology stacks legitimately needed
- Mature DevOps/Platform Engineering org
- Operating at Netflix, Amazon, or Uber scale
- True team independence & autonomy
- Independent deployment — no coordination
- Independent scaling per service
- Fault isolation (one service fails, others don't)
- Technology freedom per service
- Distributed transactions are hard
- Network latency is now everywhere
- Observability requires investment
- Service discovery and mesh overhead
- Testing integration paths is complex
- Operational overhead multiplied by N
Microservice Anti-Patterns
Services that are physically separate but logically coupled — they call each other synchronously in a chain, share a database, or must be deployed together. You get all the costs of distribution with none of the benefits. The worst of both worlds.
Services so small they have no business logic — just CRUD wrappers around a table. 50 services for a 10-person team. Operational burden is massive; every feature requires coordinating 5 service deployments.
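Service A calls B, which calls C, which calls D. Latency compounds and failures cascade: one slow service makes everything slow. End-to-end latency is the sum of every hop's latency, and the tail gets worse with each hop, because a request only has to hit one slow service to be slow overall.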
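A rough way to see how tail latency compounds along a chain is to ask how often a request touches at least one hop that is having a bad moment. The sketch below assumes hop latencies are independent and that each hop exceeds its own P99 exactly 1% of the time; both are simplifications, and `chance_of_slow_hop` is a name invented for this example.

```python
def chance_of_slow_hop(hops: int, tail_fraction: float = 0.01) -> float:
    """Probability that at least one hop in a sequential chain lands in its own tail.

    Assumes hop latencies are independent, which real systems only approximate.
    """
    return 1.0 - (1.0 - tail_fraction) ** hops


for n in (1, 4, 10):
    print(f"{n} hops: {chance_of_slow_hop(n):.1%} of requests hit a P99-or-worse hop")
```

Under these assumptions a four-hop chain exposes roughly 4% of requests to at least one tail-latency hop, and a ten-hop chain a little under 10%, which is why what is rare for one service becomes routine for the chain.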
A giant shared library ("commons" / "core") that all services depend on. Updating the library requires coordinating and re-deploying all services. The services are no longer independently deployable.
Event-Driven Architecture
Components communicate by producing and consuming events through a message broker (Kafka, RabbitMQ, AWS EventBridge, NATS). Producers don't know who consumes their events. Consumers subscribe to events they care about. Enables temporal decoupling, high throughput, and extensibility — add a new consumer without touching the producer. Works beautifully within a modular monolith (in-process events) or across microservices (external broker).
- High-throughput pipelines (analytics, logs)
- Fan-out: one event, many consumers
- Workflows with long-running steps
- Audit logs / event sourcing
- Real-time data processing
- Decoupling legacy systems
- Domain Events — something happened
- Commands — requests to do something
- Sagas — distributed long-running transactions
- Outbox Pattern — reliably publish events
- Consumer Groups — competing consumers
- Dead Letter Queues — failed message handling
- Event schema evolution is hard
- Debugging async flows is complex
- Eventual consistency surprises users
- Message ordering assumptions
- At-least-once delivery = idempotency required
- "Event spaghetti" — undocumented flows
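Of the patterns listed above, the Outbox Pattern is the one most easily shown in code. The sketch below uses an in-memory SQLite database as a stand-in for a service's own database, and a plain `publish` callable as a stand-in for a broker client; the table and function names are hypothetical.

```python
import json
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, total_cents INTEGER)")
    conn.execute(
        "CREATE TABLE outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "payload TEXT NOT NULL, "
        "published INTEGER NOT NULL DEFAULT 0)"
    )


def place_order(conn: sqlite3.Connection, order_id: str, total_cents: int) -> None:
    # One local transaction: the state change and the event commit or roll back together.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, total_cents))
        event = {"type": "OrderPlaced", "order_id": order_id, "total_cents": total_cents}
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),))


def relay_outbox(conn: sqlite3.Connection, publish) -> int:
    """Publish pending rows, then mark them as published.

    A crash between the two steps causes a re-publish on the next run, so
    delivery is at-least-once and consumers still need to deduplicate.
    """
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(json.loads(payload))  # stand-in for a real broker client call
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

The design choice is that the service never writes to the database and the broker in the same step; the relay runs separately (for example on a timer) and is the only component that talks to the broker.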
Event-Driven Anti-Patterns
Applying Event Sourcing to every entity in the system because it sounds elegant. Event Sourcing has real operational complexity: rebuilding state from events, projections, schema migration of historical events. Most CRUD entities don't benefit.
Events that contain the entire entity payload — every field every time. Consumers become dependent on the event schema in the same way as a shared database. Any field addition or removal breaks consumers.
Most message brokers deliver at least once, so duplicates will happen. A consumer must be idempotent, meaning that processing the same message twice leaves the same result as processing it once. If it isn't, you will double-charge customers, double-send emails, or corrupt state.
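One common way to get idempotency is to deduplicate on a message ID. The sketch below assumes the producer attaches a unique `message_id` to every message; `PaymentConsumer` and the message shape are hypothetical, and the in-memory set stands in for durable storage.

```python
class PaymentConsumer:
    """Hypothetical consumer that deduplicates on a producer-supplied message ID."""

    def __init__(self) -> None:
        self._seen: set[str] = set()  # in production this would be durable storage
        self.charged_cents = 0

    def handle(self, message: dict) -> bool:
        """Return True if the message was applied, False if it was a duplicate."""
        message_id = message["message_id"]
        if message_id in self._seen:
            return False  # redelivery: acknowledge without repeating the side effect
        self.charged_cents += message["amount_cents"]
        self._seen.add(message_id)
        return True
```

In a real system the side effect and the record of the message ID need to be committed atomically (for example in one database transaction); otherwise a crash between the two reintroduces the duplicate.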
Serverless Architecture
Functions-as-a-Service (Lambda, Cloud Functions, Azure Functions) triggered by events. No servers to manage; auto-scales from zero to millions; pay only for actual execution time. Ideal for irregular workloads, webhooks, and event pipelines. Cold starts, execution time limits, and vendor lock-in are real considerations.
- Webhooks & event processors
- Scheduled/cron jobs
- Image/video processing pipelines
- Low-volume APIs with spiky traffic
- Glue code between services
- Background job processing
- Long-running computations (>15min)
- Latency-sensitive APIs (cold starts)
- Stateful workloads
- High sustained throughput (cost)
- Complex local development
- When vendor lock-in is unacceptable
- Monolithic Lambda — 1 function handles everything
- Lambda calling Lambda synchronously
- Keeping database connections open in Lambda
- Not handling cold starts for user-facing paths
- Missing Dead Letter Queues for failures
Backend for Frontend (BFF)
The BFF pattern, coined by Sam Newman, solves a specific problem: a general-purpose API designed for all clients optimises for none of them. A mobile app needs different data shapes, payload sizes, and endpoints than a web app or a third-party partner API. The BFF introduces a dedicated API layer per client type.
- Mobile BFF returns compact payloads; Web BFF returns full data
- Each BFF is owned by the client team it serves, so there is no upstream negotiation
- Aggregates/transforms data from multiple backend services
- Handles client-specific auth flows and session management
- Can evolve independently of backend services
- One BFF for all clients — defeats the purpose entirely
- Business logic in BFF — BFFs should aggregate, not implement rules
- BFF calling BFF — creates coupling between client layers
- Too many BFFs — one per team is a smell; one per client type is right
- BFF becomes a God Service — pulls too much logic from downstream
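The difference between per-client payloads can be sketched with two shaping functions over the same backend record. The `PRODUCT` record and its field names are hypothetical; in a real BFF the data would come from calls to backend services.

```python
# Hypothetical record as a backend product service might return it.
PRODUCT = {
    "id": "p-42",
    "name": "Trail Shoe",
    "description": "Long marketing copy ...",
    "price_cents": 12900,
    "images": ["hero.jpg", "side.jpg", "sole.jpg"],
    "reviews": [{"stars": 5}, {"stars": 4}, {"stars": 3}],
}


def average_stars(product: dict) -> float:
    reviews = product["reviews"]
    return sum(r["stars"] for r in reviews) / len(reviews) if reviews else 0.0


def mobile_bff_product(product: dict) -> dict:
    """Compact payload: one image, a precomputed rating, no long text."""
    return {
        "id": product["id"],
        "name": product["name"],
        "price_cents": product["price_cents"],
        "thumbnail": product["images"][0] if product["images"] else None,
        "rating": round(average_stars(product), 1),
    }


def web_bff_product(product: dict) -> dict:
    """Fuller payload for a desktop page, still shaped for that one client."""
    return {**product, "rating": round(average_stars(product), 1)}
```

Both functions only select and reshape data. Pricing rules, stock checks, and similar decisions stay in the backend services.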
CQRS & Event Sourcing
Separate the model used for writes (Commands) from the model used for reads (Queries). Commands mutate state; Queries return data. Read models are optimised for their specific query pattern — denormalised, pre-computed, fast. Write models enforce business invariants. Often combined with Event Sourcing, where the event log is the source of truth and read models are projections.
- Read/write ratio is very asymmetric
- Queries need data in complex shapes
- Need full audit history (Event Sourcing)
- Complex domain with many invariants
- Scaling reads independently from writes
- Simple CRUD domains
- Small teams — overhead is high
- Users expect immediate consistency
- Domain logic is not complex enough to warrant it
- Sharing a write model with reads
- Using command handlers to answer queries
- Applying CQRS to every entity, not just complex ones
- Event schema is too fine-grained, breaking consumers
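A minimal sketch of the write/read split, combined with an event log, might look like the following. The banking example, `DepositMoney`, `AccountWriteModel`, and `BalanceReadModel` are all hypothetical names used only to show the shape.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DepositMoney:
    """Hypothetical command: a request to change state."""
    account_id: str
    amount_cents: int


class AccountWriteModel:
    """Write side: enforces invariants and emits events. It returns no query data."""

    def __init__(self) -> None:
        self.events: list[dict] = []

    def handle(self, command: DepositMoney) -> None:
        if command.amount_cents <= 0:
            raise ValueError("deposit must be positive")
        self.events.append(
            {"type": "MoneyDeposited", "account_id": command.account_id,
             "amount_cents": command.amount_cents}
        )


class BalanceReadModel:
    """Read side: a denormalised projection built by folding over the events."""

    def __init__(self) -> None:
        self._balances: dict[str, int] = {}

    def apply(self, event: dict) -> None:
        if event["type"] == "MoneyDeposited":
            account = event["account_id"]
            self._balances[account] = self._balances.get(account, 0) + event["amount_cents"]

    def balance(self, account_id: str) -> int:
        return self._balances.get(account_id, 0)
```

In this sketch the projection is updated by replaying `events` by hand. In a deployed system the read model would usually be updated asynchronously, which is where the eventual-consistency cost listed above comes from.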
Layered & Hexagonal Architecture
Classic horizontal layers: Presentation → Application → Domain → Infrastructure. Dependencies flow downward only. Simple, familiar, and effective when discipline is maintained.
- Presentation: HTTP controllers, GraphQL resolvers
- Application: Use cases, orchestration, no business rules
- Domain: Entities, value objects, domain services
- Infrastructure: DB, external APIs, email, file storage
Controllers calling the repository directly, bypassing the domain. One layer should never know about layers more than one step away.
The domain is at the centre with zero dependencies on infrastructure. External systems (databases, APIs, message brokers) are adapters that plug into defined ports. The domain is fully testable in isolation — no database required.
- Ports: Interfaces defined by the domain
- Adapters: Implementations (SQL, REST, Kafka)
- Domain: No framework, no ORM, no HTTP imports
- Swap database, don't touch domain code
Domain entities that import ORM annotations, HTTP status codes, or logging frameworks. The domain must be framework-agnostic.
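The ports-and-adapters idea can be sketched with a structural interface. `UserRepository`, `change_email`, and `InMemoryUserRepository` are hypothetical names; the example uses Python's `typing.Protocol` as one possible way to express a port.

```python
from typing import Optional, Protocol


class UserRepository(Protocol):
    """Port: an interface the domain defines in its own terms."""

    def find_email(self, user_id: str) -> Optional[str]: ...
    def save_email(self, user_id: str, email: str) -> None: ...


def change_email(repo: UserRepository, user_id: str, new_email: str) -> None:
    """Domain logic: depends on the port only, with no ORM, HTTP, or driver imports."""
    if "@" not in new_email:
        raise ValueError("invalid email address")
    if repo.find_email(user_id) is None:
        raise KeyError(f"unknown user {user_id}")
    repo.save_email(user_id, new_email)


class InMemoryUserRepository:
    """Adapter used in tests. A SQL-backed adapter would implement the same two methods."""

    def __init__(self, initial: dict[str, str]) -> None:
        self._emails = dict(initial)

    def find_email(self, user_id: str) -> Optional[str]:
        return self._emails.get(user_id)

    def save_email(self, user_id: str, email: str) -> None:
        self._emails[user_id] = email
```

Swapping the database then means writing a new adapter class; `change_email` and its tests stay untouched.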
General Architectural Anti-Patterns
Building for 10 million users when you have 100. Designing for failure modes you haven't encountered. Choosing Kafka because it's impressive, not because you need it. Engineering time spent on infrastructure that doesn't move the product forward.
Choosing technologies to pad CVs rather than to solve actual problems. Using Kubernetes, Kafka, GraphQL, and gRPC because they look good — when a simple REST API on a single VM would serve the actual product perfectly.
Using the same architectural pattern for every problem because it worked before. Applying microservices to a 3-person internal tool because "we used it at my last job." Every tool has a problem it's optimised for.
Components that should be independent are secretly entangled through shared global state, shared database tables, implicit ordering assumptions, or hard-coded service URLs. Changes ripple unexpectedly.
Multiple services or modules writing to the same entity. No single authoritative source of truth for customer data, product data, or order state. Synchronisation logic becomes the most complex part of the system.
Every architectural decision requires sign-off from every team. Meetings to decide how to name HTTP routes. No individual is accountable for outcomes. Results in paralysis, bland compromises, and inconsistent implementation.
The Microservice Complexity Tax
Microservices don't eliminate complexity — they redistribute it from code to infrastructure and organisational processes. Before adopting them, your team must be ready to pay the following taxes:
| Problem | Monolith | Microservices | Required Solution |
|---|---|---|---|
| Transaction | DB transaction | Distributed | Saga pattern, 2PC, eventual consistency |
| Debugging | Stack trace | Across services | Distributed tracing (OpenTelemetry, Jaeger) |
| Testing | Unit + integration | Contract + E2E | Consumer-driven contract tests (Pact) |
| Deployment | One pipeline | N pipelines | Platform engineering, GitOps, ArgoCD |
| Service Discovery | Function call | Runtime lookup | Service mesh (Istio, Linkerd), DNS |
| Auth | Session / in-process | Per-service | JWT propagation, service-to-service auth |
| Data consistency | ACID | BASE | Explicit design for eventual consistency |
| Local dev | One process | Dozens of containers | Docker Compose, service stubs, Telepresence |
The Great Architecture Shift
What the Evidence Shows
Moved from distributed microservices architecture to a monolith. Result: 90% reduction in infrastructure cost, reduced operational complexity, simpler debugging. Published as a case study by their own engineers — creating significant industry discussion.
One of the world's largest Rails monoliths — serving billions in commerce. Instead of splitting into services, they invested in deep module boundaries, component architecture, and tooling to enforce boundaries. Team autonomy achieved without distributed systems overhead.
Runs on a monolith serving hundreds of millions of monthly visitors with a small engineering team. Has written extensively about why they don't see the need for microservices at their scale given their team size and architecture quality.
Loud proponents of the "majestic monolith" — moving away from cloud microservices to a small number of well-structured server processes. Documented significant cost savings and engineering simplicity improvements.
Why the Shift is Happening
| The Promise | The Reality |
|---|---|
| "Independent deployment per service" | In practice: coordinating releases across 30 services, managing schema migrations across teams |
| "Teams can choose their own tech stack" | In practice: 8 languages, nobody can debug each other's services, hiring becomes impossible |
| "Scale individual services independently" | In practice: most services have similar load profiles; only 2–3 genuinely need different scaling |
| "Fault isolation — one service fails, others don't" | In practice: synchronous call chains mean a downstream failure propagates to every caller anyway |
| "Small, focused teams with clear ownership" | In practice: 40 services owned by 8 engineers, everyone is on-call for everything |
Timeless Guiding Principles
The best architecture is the one you can change safely and quickly. Favour loose coupling, explicit interfaces, and clear ownership over any particular structural pattern.
Don't make irreversible decisions before you have to. A good architecture delays infrastructure and framework decisions, keeping options open until the cost of deferral exceeds the cost of deciding.
Every architectural change should have measurable success criteria. "More scalable" is not a metric. "P99 <200ms at 10k RPS" is. Measure before and after.
Conway's Law is real. If your team isn't structured to own a service independently, the service will become a coordination nightmare. Architecture and team design must evolve together.
Every external call will fail, every database will go down, every third-party API will have an outage. Circuit breakers, retries with backoff, fallbacks, and graceful degradation are not optional.
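As one illustration of "retries with backoff", here is a small helper using exponential backoff with full jitter. The function name, defaults, and the injectable `sleep` parameter are choices made for this sketch, not a reference to any particular library.

```python
import random
import time


def call_with_retries(operation, attempts=4, base_delay=0.1, max_delay=2.0, sleep=time.sleep):
    """Retry `operation` on exception with exponential backoff and full jitter.

    `sleep` is injectable so tests can run without waiting.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, ceiling))  # jitter avoids synchronised retry storms
```

Retries are only safe for operations that are idempotent, and a production version would normally retry only on errors known to be transient rather than on every exception.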
The best distributed systems code is the code you didn't write. Network calls are typically orders of magnitude slower than in-process calls. Every service boundary is a potential failure point, a serialisation cost, and a debugging challenge.
Architecture Design Checklist
Architecture is not a single meeting — it's an iterative process. This checklist provides a structured order of operations for designing or evaluating a system's architecture, along with the most common mistakes at each step.
- Document core user journeys — what workflows must work, always
- Define non-functional requirements: availability (99.9% vs 99.999%), latency targets (P50/P95/P99), throughput (RPS), data volume
- Identify compliance and regulatory constraints (GDPR, HIPAA, SOC2)
- Clarify consistency requirements — strong consistency vs. eventual consistency, per workflow
- Establish team size and operational maturity — this constrains your pattern options
- Identify time-to-market constraints — complexity costs delivery time
Designing for imaginary scale. "We might get 10M users" is not a requirement. Ask: what does day-one traffic look like? What's the 12-month realistic projection? Architect for that, with a clear path to the next order of magnitude.
- Run an Event Storming session to map domain events and commands
- Identify aggregates — clusters of entities that change together, owned by a single service/module
- Find where concepts mean different things in different contexts (a "Customer" in billing vs. shipping)
- Map team ownership to domain concepts — each bounded context should have one owning team
- Document the context map: relationships between bounded contexts (upstream/downstream, partnership, anti-corruption layer)
Drawing service boundaries around technical concerns (AuthService, DatabaseService) rather than domain concepts. Technical boundaries couple teams; domain boundaries enable autonomy. A "UserService" that owns all user data is usually a monolith with an API in front of it.
- Apply the decision matrix: team size, scale requirements, domain clarity, operational maturity
- Default to the simplest pattern that meets current requirements — simpler is almost always better
- If choosing microservices: can each service be independently deployed, scaled, and developed by one team?
- If choosing a monolith: enforce module boundaries — don't let it become a ball of mud
- Document the reasoning for the decision — future team members need to understand why
- Explicitly note what would cause you to revisit this decision (scaling signals, team growth milestones)
Choosing microservices because a senior engineer read about them or "that's how Netflix does it." Netflix also has thousands of engineers and a decade of platform engineering investment. Your context is different.
- Assign data ownership: every table/collection has exactly one authoritative writer
- Decide on consistency model per workflow: ACID (same service) vs. eventual (cross-service via events)
- Design for data sharing: query APIs, read models, event subscriptions — never shared tables across modules
- Plan schema evolution strategy: how will you add fields, rename columns, and remove data without downtime
- Choose your database types intentionally: relational, document, time-series, search — don't default to one for everything
- Design the backup, recovery, and data retention policy — architecture decision, not an afterthought
The shared database anti-pattern in disguise: microservices with separate deployment pipelines but a shared schema. Any team can modify any table. Schema migrations require coordinating every service. This is a monolith with network calls added.
- Map each interaction: is it request/response (synchronous) or fire-and-forget / publish-subscribe (async)?
- Use synchronous calls for user-facing reads and transactional writes within a bounded context
- Use async messaging for cross-context workflows, notifications, and eventual propagation
- Define your API contract strategy: REST, gRPC, GraphQL — and how schemas are versioned
- Design for failure: every synchronous call needs a timeout, retry policy, and fallback
- If using messaging: define message schema ownership, schema registry, and evolution policy
Synchronous call chains: Service A → B → C → D to handle one user request. End-to-end latency is the sum of every hop, and the chance of hitting at least one slow hop grows with chain length. One slow or failed service brings down the whole chain. Design for autonomy: each service should be able to serve its core function even if downstream services are unavailable.
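Serving a core function while a downstream dependency is unavailable usually means pairing a circuit breaker with a fallback. The class below is a deliberately small sketch with hypothetical names and thresholds; production breakers typically add per-call timeouts, metrics, and a more careful half-open state.

```python
import time


class CircuitBreaker:
    """Minimal breaker: opens after `threshold` consecutive failures,
    then allows a trial call once `reset_after` seconds have passed."""

    def __init__(self, threshold=3, reset_after=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    def call(self, operation, fallback):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                return fallback()  # open: skip the downstream call entirely
            self._opened_at = None  # half-open: let one trial call through
        try:
            result = operation()
        except Exception:
            self._failures += 1
            if self._failures >= self.threshold:
                self._opened_at = self._clock()
            return fallback()
        self._failures = 0
        return result
```

A typical fallback returns cached or default data, so the caller degrades gracefully instead of queueing requests behind a dependency that is already struggling.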
- Define the three pillars: Metrics (Prometheus/Datadog), Logs (structured, correlation IDs), Traces (OpenTelemetry)
- Propagate correlation/trace IDs across every service boundary — from HTTP header to database query
- Define SLIs (what you measure), SLOs (the targets), and SLAs (the commitments)
- Design alerting strategy: alert on symptoms (SLO burn rate), not causes (CPU > 80%)
- Build runbooks alongside the system — not after incidents
- Implement health checks, readiness probes, and circuit breakers as standard
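Alerting on SLO burn rate comes down to simple arithmetic: compare the observed error rate with the error budget the SLO allows. The function below is a sketch with a hypothetical name; the window over which `failed` and `total` are counted, and the burn rate that should page someone, are policy choices it does not make.

```python
def burn_rate(failed: int, total: int, slo: float) -> float:
    """How fast the error budget is being consumed relative to plan.

    1.0 means the budget lasts exactly the SLO window; higher means it runs out early.
    """
    if total == 0:
        return 0.0
    error_budget = 1.0 - slo          # e.g. a 99.9% SLO leaves a 0.1% budget
    observed_error_rate = failed / total
    return observed_error_rate / error_budget
```

With a 99.9% SLO, 50 failures in 10,000 requests is a burn rate of about 5: the budget is being spent five times faster than it can be sustained, regardless of what CPU or memory graphs show.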
"We'll add monitoring later." In a distributed system, "later" means "after the first major outage, while under pressure, with customers watching." The cost of instrumenting a service at build time is tiny compared to debugging a production issue with no telemetry.
- Define authentication: who are the actors (users, services, third-parties) and how do they prove identity
- Define authorisation model: RBAC, ABAC, or per-resource ACLs — and where enforcement happens
- Design service-to-service auth: mutual TLS, JWT, or API keys — never unauthenticated internal traffic
- Plan secrets management: environment variables at minimum; HashiCorp Vault / AWS Secrets Manager for production
- Network segmentation: what can talk to what — apply least privilege at the network level
- Data classification: identify PII, payment data, health data — apply appropriate encryption and access controls
Implicit trust inside the network perimeter. "Internal services don't need auth because nothing bad can get in." One compromised service, one misconfigured S3 bucket, or one insider threat breaks this assumption. Zero-trust networking is now the standard baseline.
- Define deployment targets: containers (Kubernetes, ECS), VMs, PaaS, serverless — match to team operational capability
- Design CI/CD pipeline: how code moves from commit to production, including automated tests and approvals
- Define release strategy: blue/green, canary, feature flags — how do you roll back quickly?
- Plan infrastructure as code from day one: Terraform/Pulumi, not ClickOps
- Define environment strategy: how many environments, what runs in each, how are they provisioned
- Plan for disaster recovery: RTO (how long to recover) and RPO (how much data can be lost)
Kubernetes by default. K8s is a powerful operations platform with real complexity. If your team doesn't have platform engineering capability, a managed PaaS (Railway, Render, Fly.io, ECS Fargate) delivers 90% of the benefit with 10% of the operational overhead.
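Whatever the deployment target, the canary and feature-flag release strategies in this step depend on placing a stable subset of users in the new code path. One way to sketch that is deterministic hashing; `in_rollout` and the flag name are hypothetical.

```python
import hashlib


def in_rollout(flag: str, user_id: str, percent: int) -> bool:
    """Deterministically place a user in or out of a percentage rollout.

    Hashing flag + user keeps each user's assignment stable across requests
    and independent between flags.
    """
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent
```

Rolling back is then a configuration change (set `percent` to 0) rather than a redeploy, which is what makes the rollback fast.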
- Write an ADR for every significant architectural decision (template: Context → Decision → Consequences)
- Store ADRs alongside code in version control — they evolve with the system
- Mark superseded ADRs as deprecated, not deleted — the history matters
- Document the system's C4 model: Context, Containers, Components, Code diagrams
- Define and publish API contracts (OpenAPI, AsyncAPI, Protobuf schemas)
- Review and update documentation as the system evolves — stale docs are worse than no docs
Architecture diagrams that don't match reality. Systems drift from their documented state immediately. Build living documentation: generate diagrams from code where possible (Structurizr, C4 DSL), run architecture fitness functions in CI to catch drift automatically.
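An architecture fitness function can be as simple as a test that fails when an import crosses a module boundary. The sketch below uses Python's standard `ast` module and assumes a hypothetical convention in which each module keeps its private code under an `internal` subpackage; dedicated tools cover far more cases.

```python
import ast


def boundary_violations(module_name: str, source: str) -> list[str]:
    """Return imports in `source` that reach into another module's `internal` package.

    `module_name` is the dotted name of the file being checked, e.g. "billing.invoices".
    """
    own_package = module_name.split(".")[0]
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            targets = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            targets = [node.module]
        else:
            continue
        for target in targets:
            parts = target.split(".")
            # Importing your own internals is fine; importing someone else's is not.
            if parts[0] != own_package and "internal" in parts[1:]:
                violations.append(target)
    return violations
```

Run over every source file in CI, a check like this turns a documented boundary into one that breaks the build when it is crossed.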
- Define explicit scaling triggers: "When we reach X RPS, we'll extract Y module as a service"
- Plan for the Strangler Fig pattern: incrementally replace parts of the system without a big rewrite
- Identify the riskiest architectural assumptions and build in monitoring to detect when they break
- Establish regular architecture reviews — quarterly at minimum — with the engineering team
- Budget for technical debt paydown: not all debt is bad, but unmanaged debt compounds
The Big Rewrite. "Our codebase is a mess — let's start over with the right architecture." Rewrites take 2–3× longer than estimated, the new system acquires its own technical debt, and the business continues running on the old system the whole time. The Strangler Fig almost always beats the rewrite.
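The core of the Strangler Fig is a routing facade that sends migrated paths to the new system and everything else to the old one. The sketch below models both systems as plain functions; `StranglerFacade` and the path prefixes are hypothetical, and in practice this role is often played by a reverse proxy or API gateway.

```python
from typing import Callable

Handler = Callable[[str], str]


class StranglerFacade:
    """Routes each request to the new system if its path prefix has been migrated."""

    def __init__(self, legacy: Handler, modern: Handler) -> None:
        self._legacy = legacy
        self._modern = modern
        self._migrated: set[str] = set()

    def migrate(self, prefix: str) -> None:
        self._migrated.add(prefix)

    def rollback(self, prefix: str) -> None:
        self._migrated.discard(prefix)  # routing is the only thing that changes

    def handle(self, path: str) -> str:
        if any(path.startswith(prefix) for prefix in self._migrated):
            return self._modern(path)
        return self._legacy(path)
```

Each migration step is small and reversible, and the legacy system keeps serving every path that has not yet been moved.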