
Enterprise Architecture — Beginner's Guide

A practical, opinionated handbook for engineers stepping into enterprise architecture. Covers domains, technology selection, governance, common pitfalls, standards, documentation, and real-world examples.

Enterprise Architecture · Solution Architecture · Governance · Standards · Documentation · April 2026
Who is this for? Software engineers, tech leads, and new solution architects who are transitioning into enterprise architecture roles — or who want to collaborate more effectively with enterprise architects. No prior EA background required.

What is Enterprise Architecture?

Enterprise Architecture (EA) is the discipline of aligning an organisation's technology landscape with its business strategy. It is not just about software — it encompasses processes, people, data, infrastructure, and governance. EA provides a blueprint for how the enterprise operates today and how it should evolve.

💡
Simple definition: EA answers the question — "Given what the business needs to achieve, how should our technology, data, people, and processes be organised and connected to support it — now and in the future?"

Common EA Frameworks

| Framework | Origin | Key Strength | When to Use |
|-----------|--------|--------------|-------------|
| TOGAF | The Open Group | Comprehensive ADM lifecycle, widely recognised | Large regulated enterprises needing formal structure |
| Zachman | John Zachman | Classification matrix: who, what, when, where, why, how | Taxonomy and classification of artefacts |
| SAFe (Agile) | Scaled Agile | EA embedded into agile at scale | Agile enterprises, PI planning |
| Gartner EA | Gartner | Business-outcome-driven, pragmatic | Consultancy-led transformation |
| Informal / Lightweight | Org-specific | Speed, pragmatism | Startups scaling to enterprise, product companies |

Framework ≠ Output. Adopting TOGAF or Zachman does not make you an enterprise architect. Frameworks are process guides, not substitutes for deep technical judgment and business understanding.

EA vs Solution Architecture

| Dimension | Enterprise Architect | Solution Architect |
|-----------|----------------------|--------------------|
| Scope | Entire organisation or domain landscape | Single project or product |
| Time horizon | 3–5+ years (strategic) | 6–18 months (tactical) |
| Primary stakeholder | CTO, CIO, business leadership | Product owner, dev team, project manager |
| Output | Technology roadmaps, principles, standards | Solution design, component diagrams, ADRs |
| Level of detail | High-level patterns and constraints | Detailed enough to build from |

The Six Domains of Enterprise Architecture

Enterprise architecture spans six interconnected domains. Gaps in any one domain create risk, technical debt, or business misalignment. All six must be considered even if different architects own different layers.

🏢
Business Architecture
Capabilities, value streams, org structure, processes, and how technology enables business outcomes.
🧩
Application Architecture
System landscape, application portfolio, microservices, APIs, integration topology.
🗄️
Data Architecture
Data models, master data, lineage, governance, storage patterns, and analytics pipelines.
⚙️
Technology Architecture
Infrastructure, cloud platforms, networking, compute, containers, and the technology stack.
🔐
Security Architecture
Identity, access control, threat modelling, compliance, encryption, and zero-trust posture.
🔗
Integration Architecture
APIs, messaging, event streaming, ESB/iPaaS patterns, and how systems communicate.

Business Architecture

Business architecture maps the capabilities and processes of the organisation to technology. It ensures that every technical decision is grounded in a real business need.

Core Concepts

Business Capability Map — Example

// Level 1 Capability Map (simplified e-commerce platform)

Customer Domain
├── Customer Acquisition          (Marketing, SEO, Paid Ads)
├── Customer Management           (Profile, Preferences, CRM)
└── Customer Support              (Helpdesk, Returns, Escalation)

Order Domain
├── Product Catalogue             (Listing, Search, Inventory)
├── Order Processing              (Cart, Checkout, Payment)
└── Order Fulfilment              (Warehouse, Shipping, Tracking)

Finance Domain
├── Revenue Management            (Invoicing, Reconciliation)
├── Payments & Settlements        (PSP Integration, Refunds)
└── Financial Reporting           (P&L, Dashboards, Audit)
💡
EA golden rule: Map technology investments to capabilities, not projects. Projects come and go; capabilities evolve over decades. Mapping to capabilities helps prevent siloed systems and redundant builds of the same capability.
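
One way to make the golden rule concrete is to record which systems serve each capability, then look for capabilities served by more than one system. The following is a minimal sketch; the system names (`ShopCart`, `LegacyCheckout`, and so on) are hypothetical and not taken from any real inventory.

```python
from collections import defaultdict

# Hypothetical inventory: each system is tagged with the capability it serves.
SYSTEMS = [
    {"name": "ShopCart", "capability": "Order Processing"},
    {"name": "LegacyCheckout", "capability": "Order Processing"},
    {"name": "HelpdeskPro", "capability": "Customer Support"},
    {"name": "LedgerOne", "capability": "Financial Reporting"},
]


def systems_by_capability(systems):
    """Group system names under the capability they support."""
    grouped = defaultdict(list)
    for system in systems:
        grouped[system["capability"]].append(system["name"])
    return dict(grouped)


def redundant_capabilities(systems):
    """Return capabilities served by more than one system (candidate redundancy)."""
    return {
        capability: names
        for capability, names in systems_by_capability(systems).items()
        if len(names) > 1
    }


overlaps = redundant_capabilities(SYSTEMS)
print(overlaps)
```

An overlap is only a prompt for investigation; two systems may legitimately serve the same capability during a migration.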

Application Architecture

Application architecture defines the system portfolio — what software systems exist, how they are structured internally, and how they relate to each other.

Common Patterns

| Pattern | Description | Best For |
|---------|-------------|----------|
| Modular Monolith | Single deployable unit with internal module boundaries | Small-to-medium teams, early-stage products |
| Microservices | Independent, bounded-context services | Large orgs with independent teams and domains |
| Event-Driven | Services communicate via events/messages | Decoupled workflows, async processing, audit trails |
| Serverless / FaaS | Functions triggered by events, no server management | Sporadic workloads, automation, glue code |
| CQRS + Event Sourcing | Separate read/write models; state rebuilt from events | Complex domains, audit, temporal queries |
| BFF (Backend for Frontend) | Dedicated API layer per frontend client type | Mobile + web with different data needs |
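
The "state rebuilt from events" idea in the CQRS + Event Sourcing row can be sketched as a fold over an ordered event list. The event kinds and fields below are illustrative assumptions, not a prescribed schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str      # hypothetical event types: "ItemAdded", "ItemRemoved"
    sku: str
    quantity: int


def rebuild_cart(events):
    """Replay events in order to derive the current cart state."""
    cart = {}
    for event in events:
        if event.kind == "ItemAdded":
            cart[event.sku] = cart.get(event.sku, 0) + event.quantity
        elif event.kind == "ItemRemoved":
            remaining = cart.get(event.sku, 0) - event.quantity
            if remaining > 0:
                cart[event.sku] = remaining
            else:
                cart.pop(event.sku, None)
    return cart


history = [
    Event("ItemAdded", "SKU-1", 2),
    Event("ItemAdded", "SKU-2", 1),
    Event("ItemRemoved", "SKU-1", 1),
]
current = rebuild_cart(history)
past = rebuild_cart(history[:2])  # temporal query: state after the first two events
```

Because the event list is the source of truth, replaying a prefix of it answers "what did the cart look like at that point?", which is the temporal-query benefit named in the table.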

Application Portfolio Principles

Data Architecture

Data architecture defines how data is created, stored, transformed, moved, and consumed across the enterprise. Poor data architecture is the single most common root cause of system complexity.

Key Concepts

| Concept | Description |
|---------|-------------|
| Master Data | The authoritative, shared definition of key entities (Customer, Product, Account). One system of record per entity. |
| Data Lineage | Traceability of data from origin to consumption. Critical for compliance and debugging. |
| Data Mesh | Domain-oriented data ownership; teams own and publish data products rather than relying on a central team. |
| Data Lake | Central store for raw data in any format (structured, semi-structured, or unstructured). Enables analytics and ML without pre-modelling. |
| Data Warehouse | Structured, curated store for reporting and BI. Schema-on-write, high query performance. |
| Lakehouse | Combines data lake storage with warehouse query semantics (e.g., Databricks Delta Lake, Apache Iceberg). |
| Data Governance | Policies, ownership, quality rules, classification, and lifecycle management of data. |

Master data anti-pattern: Multiple systems each maintaining their own Customer record without synchronisation. This results in divergent customer IDs, duplicated records, and impossible-to-reconcile reporting. Establish a Master Data Management (MDM) strategy early.
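
A minimal sketch of detecting this anti-pattern: compare the Customer records held by two systems, match them on a shared key (email is assumed here purely for illustration), and flag entries whose IDs diverge. The system names and records are hypothetical.

```python
def divergent_ids(crm_records, billing_records):
    """Return emails whose customer ID differs between the two systems."""
    billing_by_email = {
        record["email"].lower(): record["customer_id"] for record in billing_records
    }
    mismatches = {}
    for record in crm_records:
        email = record["email"].lower()
        other_id = billing_by_email.get(email)
        if other_id is not None and other_id != record["customer_id"]:
            mismatches[email] = (record["customer_id"], other_id)
    return mismatches


crm = [
    {"customer_id": "C-100", "email": "ana@example.com"},
    {"customer_id": "C-101", "email": "Ben@example.com"},
]
billing = [
    {"customer_id": "C-100", "email": "ana@example.com"},
    {"customer_id": "B-977", "email": "ben@example.com"},
]
conflicts = divergent_ids(crm, billing)
```

A check like this finds the symptom; the MDM strategy itself still has to decide which system is the system of record.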

Data Architecture Checklist

Technology Architecture

Technology architecture covers the infrastructure and platform layer — cloud, on-premises, networking, compute, storage, and the runtimes that applications run on.

Cloud Strategy Decisions

| Strategy | Description | Trade-off |
|----------|-------------|-----------|
| Cloud-native | Fully managed services, PaaS/FaaS first | Speed + agility vs potential lock-in |
| Cloud-agnostic | Containers + Kubernetes, portable workloads | Portability vs complexity and overhead |
| Hybrid cloud | On-prem + cloud, connected via VPN/ExpressRoute | Compliance & latency control vs operational complexity |
| Multi-cloud | AWS + Azure + GCP, best-of-breed per service | Resilience vs significantly higher operational burden |
| On-premises | Own data centres | Full control vs CapEx, slow to provision |

Platform Standards Checklist

Security Architecture

Security architecture must be built in, not bolted on. Every design decision has a security implication. Enterprise architects and solution architects share joint responsibility for security outcomes.

Principles

Zero Trust
Never trust, always verify. No implicit trust based on network location. Every request is authenticated and authorised.
Least Privilege
Every identity (human or machine) has only the permissions it needs — nothing more. Regular access reviews are mandatory.
Defence in Depth
Multiple overlapping security controls. Assume any single layer can be breached; layers combine to reduce blast radius.
Shift Left Security
Security checks embedded in CI/CD — SAST, DAST, dependency scanning, container image scanning — not post-deployment audits.
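
Zero trust and least privilege both reduce to a deny-by-default check evaluated on every request. The sketch below uses a hypothetical in-memory grants table; a real system would delegate this to an identity provider or policy engine.

```python
# Hypothetical grants: identity -> set of (resource, action) pairs it may perform.
GRANTS = {
    "order-service": {("orders", "read"), ("orders", "write")},
    "reporting-job": {("orders", "read")},
}


def is_allowed(identity, resource, action, authenticated):
    """Deny by default: allow only authenticated identities with an explicit grant."""
    if not authenticated:
        return False
    return (resource, action) in GRANTS.get(identity, set())
```

Note that network location never appears as an input: an unauthenticated caller is refused regardless of where the request originates.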

Threat Modelling (STRIDE)

| Threat | Description | Mitigation Example |
|--------|-------------|--------------------|
| Spoofing | Impersonating another identity | Strong authentication (MFA, mTLS) |
| Tampering | Modifying data in transit or at rest | HTTPS, data integrity checks, signed tokens |
| Repudiation | Denying an action was taken | Audit logs, non-repudiation via signed events |
| Information Disclosure | Exposing sensitive data | Encryption, role-based field masking, DLP |
| Denial of Service | Making a system unavailable | Rate limiting, WAF, auto-scaling, circuit breakers |
| Elevation of Privilege | Gaining unauthorised elevated access | RBAC, just-in-time access, privilege boundaries |

Integration Architecture

Integration architecture defines how systems communicate, share data, and collaborate. Poor integration design is the leading cause of brittle, hard-to-change systems.

Integration Patterns

| Pattern | When to Use | Examples |
|---------|-------------|----------|
| REST API | Synchronous request-response, standard CRUD | JSON over HTTPS, OpenAPI spec |
| GraphQL | Flexible querying for multiple clients with different data needs | Apollo, Hasura |
| gRPC | High-throughput internal service-to-service, typed contracts | Protobuf, internal microservices |
| Message Queue | Async, decoupled, at-least-once delivery | RabbitMQ, Azure Service Bus, SQS |
| Event Streaming | High-volume, ordered, replayable event log | Apache Kafka, Azure Event Hubs, Kinesis |
| Webhooks | Push notifications for state changes to external parties | Stripe, GitHub, Salesforce callbacks |
| CDC (Change Data Capture) | Replicating data changes from a database to downstream systems | Debezium, AWS DMS |
| API Gateway | Single entry point; routing, auth, rate limiting, observability | Azure APIM, AWS API Gateway, Kong |

💡
Coupling is the enemy. Prefer async, event-driven communication between domains. Only use synchronous calls when a real-time response is a genuine business requirement — not out of convenience.
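
To illustrate the decoupling, here is a small in-process publish/subscribe sketch. It stands in for a real broker, which would add durability and asynchronous delivery; the `EventBus` class and the topic name are hypothetical.

```python
from collections import defaultdict


class EventBus:
    """Toy in-process bus: publishers never reference subscribers directly."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._handlers[topic].append(handler)

    def publish(self, topic, payload):
        # Deliver to every handler registered for the topic, in subscription order.
        for handler in self._handlers[topic]:
            handler(payload)


bus = EventBus()
reserved, notified = [], []

# Inventory and Notifications subscribe without the Order domain knowing about them.
bus.subscribe("order.placed", lambda event: reserved.append(event["order_id"]))
bus.subscribe("order.placed", lambda event: notified.append(event["order_id"]))

bus.publish("order.placed", {"order_id": "ORD-1"})
```

Adding a third consumer requires no change to the publisher, which is the property that keeps domains independently changeable.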

Technology Selection Framework

Selecting technology is one of the most consequential decisions in enterprise architecture. The wrong choice creates years of pain; a rushed choice creates the same. Use a structured evaluation framework every time.

Selection Criteria

| Criterion | Weight | Questions to Ask |
|-----------|--------|------------------|
| Fit for purpose | High | Does it actually solve the problem? Is the use-case in its sweet spot? |
| Operational maturity | High | Is it production-proven at scale? What are the known failure modes? |
| Team capability | High | Does the team have the skills? What is the learning curve and time-to-competency? |
| Vendor / OSS health | High | Is the vendor stable? Is the project actively maintained? What is the bus factor? |
| Total Cost of Ownership | High | Licensing + ops + training + migration. Not just purchase price. |
| Integration complexity | Medium | How does it connect to existing systems? What standards does it support? |
| Security & compliance | High | Does it meet our compliance posture (SOC2, GDPR, ISO 27001)? |
| Scalability | Medium | Will it handle 10x growth without re-architecting? |
| Lock-in risk | Medium | What does exit look like? Are there open standards? Migration cost? |
| Community & ecosystem | Medium | Forums, documentation quality, third-party integrations, hiring market. |
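
The criteria table lends itself to a weighted scoring matrix. The sketch below assumes numeric weights of High = 3 and Medium = 2 and a 1-5 score per criterion; both scales, the subset of criteria, and the candidate scores are illustrative assumptions rather than part of the framework.

```python
WEIGHTS = {"High": 3, "Medium": 2}

# Subset of the criteria above, with the weight level taken from the table.
CRITERIA = {
    "Fit for purpose": "High",
    "Operational maturity": "High",
    "Team capability": "High",
    "Integration complexity": "Medium",
    "Lock-in risk": "Medium",
}


def weighted_score(scores):
    """Return the weighted average (on the 1-5 scale) of per-criterion scores."""
    total = sum(WEIGHTS[CRITERIA[name]] * score for name, score in scores.items())
    total_weight = sum(WEIGHTS[CRITERIA[name]] for name in scores)
    return total / total_weight


candidate_a = {"Fit for purpose": 5, "Operational maturity": 4, "Team capability": 2,
               "Integration complexity": 3, "Lock-in risk": 4}
candidate_b = {"Fit for purpose": 4, "Operational maturity": 4, "Team capability": 5,
               "Integration complexity": 4, "Lock-in risk": 2}

score_a = weighted_score(candidate_a)
score_b = weighted_score(candidate_b)
print(f"A: {score_a:.2f}  B: {score_b:.2f}")
```

A score is a way to structure the discussion, not a substitute for it: a candidate that fails a hard constraint (for example, compliance) should be excluded before scoring.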

Evaluation Process

Build vs Buy vs Integrate

This is one of the most repeated questions in enterprise architecture — and the answer is never obvious. Each option has different trade-offs across cost, control, speed, and risk.

Build

Custom software developed in-house or contracted.

  • Full control over features and data
  • Differentiating IP stays in-house
  • Highest cost and longest time-to-value
  • You own the operational burden forever
✓ Use when: core competitive advantage or no existing solution
Buy (SaaS/COTS)

Commercial off-the-shelf or SaaS product.

  • Fastest time-to-value
  • Vendor manages ops, updates, compliance
  • Risk of lock-in and feature gaps
  • Often requires process change to fit the tool
✓ Use when: commodity capability, low differentiation
Integrate (OSS / Platform)

Open-source or platform component, self-hosted or managed.

  • No licensing cost; OSS community support
  • You own hosting, patching, upgrades
  • Higher flexibility than SaaS
  • Requires internal expertise
✓ Use when: strong OSS ecosystem, standard tooling
Default to buy for commodity capabilities. HR, payroll, email, CRM, and ERP are not sources of competitive advantage. Building your own is a distraction that typically costs far more than the SaaS alternative once maintenance and operations are counted. Reserve "build" for the 20% that truly differentiates your business.

Technology Radar

A Technology Radar is a living document that categorises technologies by their maturity and adoption stance within your organisation. It gives teams a clear signal on what to adopt, experiment with, hold, or avoid.

The Four Rings

| Ring | Meaning | Action |
|------|---------|--------|
| Adopt | Proven, recommended for most use-cases | Default choice. No escalation required. |
| Trial | Worth pursuing; requires evidence collection | Use on non-critical workloads with architect oversight |
| Assess | Interesting; needs investigation before commitment | PoC only. Do not use in production yet. |
| Hold | Not recommended for new work; legacy only | No new projects. Migrate off existing usage over time. |

Example Radar Entries

// Example Technology Radar — Platform Domain (2026)

ADOPT
  Kubernetes (container orchestration)
  Terraform (infrastructure as code)
  OpenTelemetry (observability instrumentation)
  PostgreSQL (relational data)
  React / Angular 18+ (frontend)

TRIAL
  Temporal (workflow orchestration)
  Apache Iceberg (open table format)
  HTMX (lightweight frontend interactivity)

ASSESS
  Wasm (WebAssembly for compute-intensive tasks)
  Deno (JavaScript runtime)

HOLD
  AngularJS (EOL)
  REST XML / SOAP (legacy integrations only)
  On-premises Oracle DB (for new projects)
💡
Publish your radar openly within the organisation and treat it as a communication tool. Update it quarterly via the ARB. Thoughtworks publishes a well-known public radar at radar.thoughtworks.com, which is a useful reference.

Proof of Concept (PoC)

A PoC is a time-boxed, low-fidelity experiment to reduce uncertainty before committing to a technology or approach. It is not a prototype, not an MVP, and not the start of production code.

PoC Rules

🚫
PoC-to-production drift: The most dangerous path in software is PoC code that slips into production "temporarily". It invariably stays permanently. Enforce a hard rule: if code was written as a PoC, it is rewritten before going live.

Review Processes

Governance without review processes is decoration. Effective review processes catch architectural drift, ensure standards compliance, and build shared understanding — without becoming bureaucratic blockers.

Review Types and When to Trigger

| Review Type | Trigger | Participants | Output |
|-------------|---------|--------------|--------|
| Architecture Review | New system, major feature, or cross-team integration | EA, Solution Architects, Tech Leads | Approved / conditional / rejected design |
| Design Review | Start of any non-trivial feature (estimated at more than 2 sprints) | Solution Architect, Senior Engineers | Reviewed design doc, identified risks |
| Code Review | Every pull request | Peers + at least one senior engineer | Approved PR or change requests |
| Security Review | Before any public-facing feature or data-touching change | Security Architect or designated reviewer | Threat model sign-off, pen test scheduling |
| Post-Incident Review | After every P1/P2 incident | Incident owner, SRE, Architecture | Action items, RCA document |
| Quarterly Architecture Review | Calendar-driven, every quarter | ARB + Domain Architects | Updated roadmap, radar changes, debt register |

Architecture Review Board (ARB)

The ARB is the governance body responsible for cross-cutting architectural decisions. It reviews proposed changes that affect multiple systems, introduce new technology, or deviate from established standards.

ARB Charter — Key Elements

What Must Go to the ARB

✅ Must Submit
  • Introducing a new technology not on the Adopt list
  • Cross-domain integrations between bounded contexts
  • New data stores or databases
  • Changes to authentication or authorisation models
  • Architectural patterns deviating from standards
  • Third-party vendor onboarding (data access)
🚫 Does NOT Need ARB
  • Feature development within an existing service
  • Library/dependency upgrades within the same major version
  • Internal refactoring with no external interface changes
  • Bug fixes and operational improvements
  • Infrastructure scaling (within approved patterns)
  • UI/UX changes
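
The submission rules can be encoded as a simple triage helper so that teams get a consistent answer before raising a request. This sketch covers only some of the triggers listed above; the `Change` fields and the sample Adopt list are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical extract of the organisation's Adopt ring.
ADOPT_LIST = {"Kubernetes", "Terraform", "OpenTelemetry", "PostgreSQL"}


@dataclass
class Change:
    technologies: set = field(default_factory=set)
    adds_data_store: bool = False
    changes_auth_model: bool = False
    crosses_domains: bool = False


def arb_reasons(change):
    """Return the reasons a change must go to the ARB (empty list = no ARB needed)."""
    reasons = []
    new_tech = sorted(change.technologies - ADOPT_LIST)
    if new_tech:
        reasons.append("technology not on Adopt list: " + ", ".join(new_tech))
    if change.adds_data_store:
        reasons.append("new data store")
    if change.changes_auth_model:
        reasons.append("authentication/authorisation model change")
    if change.crosses_domains:
        reasons.append("cross-domain integration")
    return reasons
```

Returning the reasons rather than a bare yes/no gives the submitting team the wording for their ARB request.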

Design & Code Reviews

Architecture Design Review — What to Look For

Code Review Standards (Architecture Perspective)

Governance Gates

Governance gates are checkpoints in the delivery lifecycle where architecture and quality standards are validated before a team proceeds. They stop technical debt from accumulating unnoticed when teams move fast.

Idea / Discovery → Architecture Review → Design Review → Development → Security Review → Staging / UAT → Production

💡
Gates should be fast. A governance gate that takes more than 5 business days to clear will be bypassed by teams under pressure. Invest in making reviews efficient — templates, checklists, and pre-read documents rather than live discovery discussions.
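
Gate clearance time can be tracked with a small helper that counts business days between submission and decision. The sketch ignores public holidays and treats the 5-day limit from the note above as a constant; the function names are illustrative.

```python
from datetime import date, timedelta

MAX_GATE_BUSINESS_DAYS = 5


def business_days_between(start, end):
    """Count weekdays after `start` up to and including `end`."""
    days = 0
    current = start
    while current < end:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            days += 1
    return days


def gate_is_slow(submitted, decided):
    """True when a gate took longer than the agreed limit to clear."""
    return business_days_between(submitted, decided) > MAX_GATE_BUSINESS_DAYS
```

Reporting the share of slow gates per quarter gives the ARB a concrete measure of whether governance is becoming a bottleneck.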

Roles Overview

Enterprise architecture involves a hierarchy of architectural roles. Each has a distinct scope, accountability, and set of outputs.

| Role | Scope | Primary Outputs | Reports To |
|------|-------|-----------------|------------|
| Chief Architect / EA Lead | Entire organisation | Technology strategy, EA principles, ARB governance | CTO / CIO |
| Enterprise Architect | Multi-domain | Technology radar, standards, capability maps, roadmaps | Chief Architect / CTO |
| Domain Architect | Single business domain | Domain-level architecture, integration patterns, standards | Enterprise Architect |
| Solution Architect | Single project or product | Solution design, component diagrams, ADRs, technical risk log | Domain Architect / Product |
| Principal Engineer | Technical domain within a team | Code standards, technical spikes, architectural input on PRs | Engineering Manager |
| Tech Lead | Single team | Sprint-level technical decisions, code review, mentoring | Engineering Manager |

Guide for Solution Architects

The solution architect is the most hands-on architectural role. You bridge strategy and execution — translating EA principles into buildable solutions, and pushing back when delivery pressures threaten architectural integrity.

Core Responsibilities

Solution Design Document Template

## Solution Design: [Feature / System Name]
Version: 1.0  |  Author: [Name]  |  Date: [Date]
Status: Draft / Under Review / Approved

### Problem Statement
What business problem does this solve? (2–3 sentences, no jargon)

### Goals & Non-Goals
Goals:
  - [Goal 1]
  - [Goal 2]
Non-goals:
  - [What is explicitly out of scope]

### Assumptions
  - [Assumption 1] — validated / unvalidated
  - [Assumption 2]

### Architecture Overview
[High-level component diagram or C4 Level 2 Container diagram]

### Key Design Decisions
  Decision 1: [What was decided] — [Why] — [Alternatives considered]
  Decision 2: ...

### Data Model
[Entity definitions, ownership, relationships]

### Integration Points
  [System A] → [This system]: [protocol, contract, error handling]
  [This system] → [System B]: [protocol, contract, error handling]

### Security Considerations
  - Authentication: [How]
  - Authorisation: [RBAC model]
  - Data classification: [PII? Confidential?]
  - Threat model summary

### Observability
  - Key metrics to track
  - Alert thresholds
  - Runbook location

### Risks
| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| [Risk 1] | M | H | [Mitigation] |

### Open Questions
  - [Question] — Owner: [Name] — Due: [Date]

Collaborating with Development Teams

Architecture fails when it is done to teams rather than with them. Effective architects embed, listen, and earn trust through technical credibility.

Anti-Patterns in Architect-Team Relationships

What Good Collaboration Looks Like

Common Pitfalls in Enterprise Architecture

Enterprise architecture has a well-documented set of recurring failure modes. Recognising them early is the single most valuable skill for a new practitioner.

Resume-Driven Development (RDD)

🚨 Resume-Driven Development
Selecting technologies because they look impressive on a CV, not because they are the right fit. Common symptoms: introducing Kubernetes for a team of 3, adopting Kafka for 500 events/day, rebuilding an existing system in a trendy framework "because it's what everyone is using now".
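
A quick back-of-envelope conversion often exposes the mismatch between a tool and the workload. The helper below turns a daily event volume into an average per-second rate; it assumes an even spread and ignores peaks, which a real assessment would need to consider.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def events_per_second(events_per_day):
    """Average event rate, assuming an even spread across the day."""
    return events_per_day / SECONDS_PER_DAY


rate = events_per_second(500)
print(f"{rate:.4f} events/second")  # roughly one event every three minutes
```

At that rate, almost any queue or even a database table would cope, so the case for a streaming platform has to rest on something other than volume.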

How to Detect It

How to Counter It

Other Common Pitfalls

⚠ Over-Engineering (YAGNI)
Building for requirements that don't exist yet. "We might need to support multiple databases" is a common justification for unnecessary abstraction layers. Design for what you know; leave extensibility points only where you have evidence of future need.
⚠ Premature Microservices
Splitting a system into microservices before domain boundaries are well-understood creates a distributed monolith — all the complexity of distribution with none of the benefits. Start with a modular monolith. Split only when a specific scaling or team-autonomy problem is well-defined.
🚨 Big Design Up Front (BDUF)
Attempting to fully specify every architectural decision before any code is written. Architecture must evolve. Spend 80% of design effort on the 20% of decisions that are genuinely hard to change (data model, integration topology, security model). Leave the rest to emerge.
🚨 Analysis Paralysis
Perpetual evaluation without a decision. If a PoC has been running for 3 months without a conclusion, the process is broken. Set decision deadlines. Embrace reversibility — many decisions are cheaper to change than to delay indefinitely.
⚠ Vendor Lock-in (Unmanaged)
Not all lock-in is bad — managed lock-in with awareness of exit cost is a valid trade-off. Unmanaged lock-in is where you discover the dependency only when you want to leave. Always document: what is the exit plan if this vendor is acquired, fails, or raises prices 10x?
💡 Governance Theatre
Review processes that exist to create the appearance of governance without providing real value. A 90-minute ARB meeting where everyone nods along and nothing is challenged is governance theatre. Good governance asks hard questions and sometimes says no.
💡 Documentation Debt
Deferring documentation indefinitely under delivery pressure. Undocumented systems are a bus-factor risk and a productivity drain. Every sprint should include documentation tasks. Architecture that can only be understood by its original author is an operational liability.
⚠ Security as an Afterthought
Adding security controls after a system is built is often cited as costing 10–100x more than designing them in. A security review at the end of a project almost always results in superficial compliance rather than genuine security. Architects must include security in initial design reviews.

Defining Coding Standards

Coding standards are the written agreements that define how your organisation writes, structures, and reviews code. Without them, every team reinvents conventions independently, making cross-team collaboration and maintenance costly.

What Coding Standards Cover

| Area | Examples |
|------|----------|
| Naming conventions | camelCase vs snake_case, file naming, class vs interface prefixes |
| Project structure | Directory layout, module boundaries, entry point conventions |
| Code style | Indentation, line length, brace style, import ordering |
| Language features | Banned features (e.g. eval), preferred patterns (e.g. async/await over callbacks) |
| Error handling | Do not swallow exceptions, structured logging, error boundary patterns |
| Testing requirements | Minimum coverage %, naming conventions for tests, mandatory test types per PR |
| Security rules | No secrets in code, parameterised queries only, OWASP Top 10 awareness |
| Documentation | When to add comments, public API documentation requirements |
| Git conventions | Branch naming, commit message format (Conventional Commits), PR size limits |
| Dependency management | Approved dependency sources, vulnerability scanning, major version pinning rules |
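
As an example of what a "parameterised queries only" rule looks like in practice, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table and column names are hypothetical.

```python
import sqlite3


def find_customer(conn, email):
    """Look up a customer using a bound parameter, never string concatenation."""
    cursor = conn.execute(
        "SELECT id, email FROM customers WHERE email = ?",
        (email,),
    )
    return cursor.fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO customers (email) VALUES (?)", ("ana@example.com",))

found = find_customer(conn, "ana@example.com")
# A classic injection payload is treated as a literal value and matches nothing.
injected = find_customer(conn, "' OR '1'='1")
```

A standard is easier to follow when it ships with a short compliant example like this next to the rule itself.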

How to Define Coding Standards — Process

Standards Enforcement & Tooling

Standards that rely entirely on human diligence in code review will erode under delivery pressure. Automate enforcement wherever possible.

Enforcement Toolchain

| Tool Type | Examples | Enforces |
|-----------|----------|----------|
| Code Formatter | Prettier, Black (Python), gofmt, dotnet-format | Style consistency; no debate in PR reviews |
| Linter | ESLint, Pylint, Checkstyle, SonarLint | Code quality rules, anti-patterns, complexity |
| Static Analysis (SAST) | SonarCloud, Semgrep, CodeQL | Security vulnerabilities, code smells, coverage |
| Dependency Scanner | Dependabot, Snyk, OWASP Dependency-Check | Known CVEs in third-party libraries |
| Secret Scanner | Gitleaks, TruffleHog, GitHub Secret Scanning | Credentials and secrets committed to repos |
| Container Scanner | Trivy, Grype, Snyk Container | Vulnerable base images, OS packages |
| Pre-commit Hooks | Husky, pre-commit framework | Local fast-fail before code reaches CI |
| Branch Protection | GitHub / GitLab branch rules | PR required, CI must pass, review required |
| Code Coverage Gate | Codecov, SonarQube coverage threshold | Minimum test coverage per PR |

# Example: Pre-commit configuration (.pre-commit-config.yaml)
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-merge-conflict
      - id: detect-private-key          # Block committed secrets

  - repo: https://github.com/psf/black   # Python formatter
    rev: 24.3.0
    hooks:
      - id: black

  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.18.2
    hooks:
      - id: gitleaks                     # Secret scanning
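
Custom checks can sit alongside off-the-shelf hooks. As one example, a commit-message check for the Conventional Commits format could run as a local hook or CI step. The regular expression below is a simplified approximation of the specification, and the list of allowed types is an assumption a team would adjust.

```python
import re

ALLOWED_TYPES = ("feat", "fix", "docs", "refactor", "test", "chore")

# Simplified pattern: type, optional (scope), optional "!", then ": description".
PATTERN = re.compile(
    r"^(?P<type>" + "|".join(ALLOWED_TYPES) + r")"
    r"(\((?P<scope>[a-z0-9-]+)\))?"
    r"!?: \S.*$"
)


def is_conventional(message):
    """Check only the first line (the subject) of the commit message."""
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    return PATTERN.match(subject) is not None
```

For instance, `is_conventional("feat(cart): add voucher support")` passes, while `is_conventional("updated stuff")` does not.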

Living Standards

Standards are not written once and forgotten. They must evolve with the technology landscape, team growth, and lessons learned. A "living standard" is one that has clear ownership, a review cadence, and a contribution process.

Principles for Living Standards

Documentation Types

Different audiences need different documentation. The mistake most teams make is writing everything in one format (usually developer-focused) and assuming it serves everyone.

| Documentation Type | Audience | Format | Update Frequency |
|--------------------|----------|--------|------------------|
| Architecture Decision Records (ADRs) | Engineers, Architects | Structured markdown, version-controlled | Per decision (immutable once accepted) |
| System Context / C4 Diagrams | All stakeholders | Diagrams-as-code (Structurizr, Mermaid, PlantUML) | On significant change |
| Runbooks | SRE, Operations | Step-by-step wiki pages | After every incident or procedure change |
| API Documentation | Developers (consumers) | OpenAPI / AsyncAPI spec, generated portal | On API change (auto-generated from code) |
| Solution Design Docs | Engineers, PM, Architect | Structured document (Confluence, Notion) | Before build; updated during significant changes |
| Technology Radar | All engineers | HTML/PDF, visual quadrant | Quarterly |
| Coding Standards | Engineers | Git-hosted markdown | Annually + as needed |
| Data Dictionary | Engineers, Analysts, Business | Structured catalogue (e.g., DataHub, Collibra) | On data model change |
| Roadmaps | Business, Leadership | Visual timeline, slide deck | Quarterly |

Managing Documentation

Documentation that isn't maintained is worse than no documentation — outdated docs cause costly mistakes. Managing documentation requires process, tooling, and cultural buy-in.

Key Principles

Documentation Ownership Matrix

| Document | Owner | Reviewer | Access |
|----------|-------|----------|--------|
| Architecture Decision Records | Solution Architect | Domain Architect | All engineers |
| System diagrams (C4) | Solution Architect | Tech Lead | All engineers + business |
| API Specs (OpenAPI) | Service team | API Gateway team | All engineers; published externally where applicable |
| Runbooks | SRE / Service team | Platform Engineer | Operations + On-call engineers |
| Coding Standards | Principal Engineer | ARB | All engineers |
| Technology Radar | Enterprise Architect | ARB | All engineers |

Documenting Architecture & Flows

Architecture diagrams communicate structure and behaviour to different audiences. The key principle is the right diagram for the right audience, not one diagram that tries to show everything.

Diagram Types

| Diagram Type | Shows | Audience | Tool |
|--------------|-------|----------|------|
| Context Diagram (C4 L1) | System and its external users/systems | Business, all stakeholders | Structurizr, Mermaid |
| Container Diagram (C4 L2) | Major deployable units within the system | Architects, senior engineers | Structurizr, draw.io |
| Component Diagram (C4 L3) | Internal components of a container | Development team | Structurizr, PlantUML |
| Sequence Diagram | Time-ordered message flow between actors/services | Engineers, QA, architects | Mermaid, PlantUML, Lucidchart |
| Entity-Relationship (ERD) | Data entities and their relationships | Engineers, data architects, DBAs | dbdiagram.io, draw.io |
| Data Flow Diagram (DFD) | How data moves through a system | Architects, security, compliance | draw.io, Lucidchart |
| State Diagram | States of an entity and transition triggers | Engineers, BA, QA | Mermaid, PlantUML |
| Deployment Diagram | Infrastructure topology, cloud resources | Engineers, DevOps, architects | draw.io, Terraform CDK, diagrams.net |

💡
Diagrams-as-code: Prefer tools like Mermaid or Structurizr where diagrams are written in text and version-controlled alongside the code. Avoid binary diagram files (Visio, OmniGraffle) that cannot be diffed, reviewed, or easily updated. Mermaid is rendered natively by GitHub, GitLab, and Notion, and can be rendered in Confluence through a marketplace app.

Architecture Decision Records (ADRs)

An ADR is a short document that captures a significant architectural decision — what was decided, why, what alternatives were considered, and what the consequences are. ADRs are one of the highest-value practices in enterprise architecture.

ADR Template (MADR Format)

# ADR-0042: Use Apache Kafka for Domain Event Streaming

Date: 2026-03-15
Status: Accepted
Deciders: [Names]
Tags: integration, messaging, event-driven

## Context and Problem Statement
We need to propagate domain events from the Order service to downstream
consumers (Inventory, Fulfilment, Analytics) with guaranteed ordering
within a partition and replay capability for new consumers.

## Decision Drivers
* Must support consumer replay (new consumers catching up on history)
* Ordering guarantees within an entity's event stream required
* Expected throughput: 50,000 events/day (growing to 500k in 12 months)
* Team has prior Kafka experience; internal managed Kafka cluster available

## Considered Options
1. Apache Kafka (managed internal cluster)
2. Azure Service Bus (cloud-native)
3. RabbitMQ (existing infrastructure)
4. PostgreSQL LISTEN/NOTIFY (simple, no new infrastructure)

## Decision Outcome
Chosen: Apache Kafka

Positive consequences:
* Retained log with consumer-managed offsets enables replay for new consumers
* Partition-ordered delivery meets our ordering requirement
* Team expertise reduces ramp-up time

Negative consequences / risks:
* Operational complexity of Kafka cluster management
* Schema evolution requires Avro/Confluent Schema Registry

## Pros and Cons of Alternatives
Azure Service Bus: No replay; message TTL only. Ruled out.
RabbitMQ: No partition-ordered replay. Ruled out.
PG LISTEN/NOTIFY: Would not scale to 500k events/day. Ruled out.

ADR Best Practices

C4 Model

The C4 model (by Simon Brown) provides four levels of architecture diagram with increasing detail. Each level answers a different question and serves a different audience.

| Level | Name | Question Answered | Audience |
|-------|------|-------------------|----------|
| L1 | System Context | What is the system and who uses it? | Everyone (non-technical) |
| L2 | Container | What are the major deployable units and how do they communicate? | Architects, senior engineers |
| L3 | Component | What are the internal components of a given container? | Development team |
| L4 | Code | How is a component implemented? | Developers (auto-generated from IDE) |

C4 L1 — System Context (Mermaid)

```mermaid
C4Context
  title System Context — Order Management Platform

  Person(customer, "Customer", "Places and tracks orders via web/mobile")
  Person(agent, "Support Agent", "Handles escalations and refunds")

  System(oms, "Order Management System", "Core order lifecycle — cart, checkout, fulfilment tracking")

  System_Ext(payment, "Payment Gateway", "Stripe / PayPal — payment processing")
  System_Ext(wms, "Warehouse Management System", "3PL partner — picking, packing, dispatch")
  System_Ext(notif, "Notification Service", "Email / SMS via SendGrid / Twilio")

  Rel(customer, oms, "Places orders", "HTTPS")
  Rel(agent, oms, "Manages escalations", "HTTPS")
  Rel(oms, payment, "Authorise & capture payment", "HTTPS/REST")
  Rel(oms, wms, "Dispatch fulfilment instruction", "Event/Kafka")
  Rel(oms, notif, "Trigger notifications", "Event/Kafka")
```

C4 L2 — Container Diagram (Mermaid)

```mermaid
C4Container
  title Container Diagram — Order Management System

  Person(customer, "Customer", "Web / Mobile")

  Container_Boundary(oms, "Order Management System") {
    Container(spa, "Web SPA", "React", "Customer-facing storefront")
    Container(bff, "BFF API", "Node.js / Express", "Backend for frontend — aggregates data for web client")
    Container(order_svc, "Order Service", ".NET 8", "Order lifecycle domain logic")
    Container(inventory_svc, "Inventory Service", "Python / FastAPI", "Stock levels and reservations")
    ContainerDb(order_db, "Order DB", "PostgreSQL", "Order, line-item, and payment state")
    ContainerDb(inventory_db, "Inventory DB", "PostgreSQL", "Stock levels per SKU and warehouse")
    Container(event_bus, "Event Bus", "Apache Kafka", "Domain event streaming")
  }

  System_Ext(payment, "Payment Gateway", "Stripe")
  System_Ext(wms, "WMS", "3PL Warehouse")

  Rel(customer, spa, "Uses", "HTTPS")
  Rel(spa, bff, "API calls", "JSON/REST")
  Rel(bff, order_svc, "Delegates", "gRPC")
  Rel(bff, inventory_svc, "Queries stock", "REST")
  Rel(order_svc, order_db, "Reads/writes", "SQL")
  Rel(inventory_svc, inventory_db, "Reads/writes", "SQL")
  Rel(order_svc, event_bus, "Publishes domain events", "Kafka")
  Rel(inventory_svc, event_bus, "Subscribes to order events", "Kafka")
  Rel(order_svc, payment, "Authorise payment", "HTTPS")
  Rel(event_bus, wms, "Dispatch fulfilment", "Kafka")
```

Flow & Sequence Diagrams

Flow and sequence diagrams document behaviour over time. They are indispensable for communicating integration contracts, validating error-handling logic, and onboarding engineers.

Sequence Diagram — Order Checkout Flow

```mermaid
sequenceDiagram
  autonumber
  actor C as Customer
  participant BFF as BFF API
  participant OS as Order Service
  participant IS as Inventory Service
  participant PG as Payment Gateway
  participant KB as Kafka Bus
  participant NS as Notification Service

  C->>BFF: POST /checkout { cart, payment_token }
  BFF->>IS: Reserve stock { items }
  IS-->>BFF: 200 OK { reservation_id }

  BFF->>OS: Create order { cart, reservation_id, payment_token }
  OS->>PG: Authorise payment { amount, token }
  PG-->>OS: 200 OK { auth_code }

  OS->>OS: Persist order (PENDING)
  OS->>KB: Publish OrderCreated event
  OS-->>BFF: 201 Created { order_id }
  BFF-->>C: 201 Created { order_id, estimated_delivery }

  note over KB: Downstream consumers process asynchronously
  KB--)IS: OrderCreated → confirm stock deduction
  KB--)NS: OrderCreated → send confirmation email
```
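
The checkout sequence above can also be read as straightforward orchestration code. The sketch below mirrors the happy path only, using in-memory stand-ins rather than real HTTP, gRPC, or Kafka clients. Every name in it (`CheckoutTrace`, `reserve_stock`, `authorise_payment`, `create_order`, `checkout`) is hypothetical, and amounts are assumed to be integer minor units.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class CheckoutTrace:
    """Collects the steps taken so the flow can be compared with the diagram."""
    steps: list = field(default_factory=list)
    events: list = field(default_factory=list)


def reserve_stock(trace: CheckoutTrace, items: list) -> str:
    # Stand-in for the Inventory Service call.
    trace.steps.append("inventory.reserve")
    return f"res-{uuid.uuid4().hex[:8]}"


def authorise_payment(trace: CheckoutTrace, amount_minor: int, token: str) -> str:
    # Stand-in for the Payment Gateway call.
    trace.steps.append("payment.authorise")
    return f"auth-{uuid.uuid4().hex[:8]}"


def create_order(trace: CheckoutTrace, items: list, amount_minor: int,
                 reservation_id: str, payment_token: str) -> dict:
    # Order Service: authorise, persist as PENDING, then publish the event.
    auth_code = authorise_payment(trace, amount_minor, payment_token)
    order = {
        "order_id": f"ord-{uuid.uuid4().hex[:8]}",
        "status": "PENDING",
        "items": items,
        "reservation_id": reservation_id,
        "auth_code": auth_code,
    }
    trace.steps.append("order.persist")
    trace.events.append({"type": "OrderCreated", "order_id": order["order_id"]})
    trace.steps.append("events.publish")
    return order


def checkout(trace: CheckoutTrace, items: list, amount_minor: int,
             payment_token: str) -> dict:
    # BFF: reserve stock first, then delegate order creation.
    reservation_id = reserve_stock(trace, items)
    return create_order(trace, items, amount_minor, reservation_id, payment_token)


trace = CheckoutTrace()
order = checkout(trace, ["sku-1", "sku-2"], 5990, "tok_test")
print(trace.steps)
```

Recording the step order makes it easy to check in a unit test that the implementation still matches the documented sequence.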

State Diagram — Order Lifecycle

```mermaid
stateDiagram-v2
  [*] --> PENDING : Order created
  PENDING --> PAYMENT_CAPTURED : Payment authorised and captured
  PENDING --> CANCELLED : Payment declined / timeout
  PAYMENT_CAPTURED --> IN_FULFILMENT : Dispatched to WMS
  IN_FULFILMENT --> SHIPPED : Carrier collected
  SHIPPED --> DELIVERED : Delivery confirmed
  SHIPPED --> RETURN_REQUESTED : Customer requests return
  RETURN_REQUESTED --> REFUNDED : Return received + inspected
  DELIVERED --> RETURN_REQUESTED : Within return window
  CANCELLED --> [*]
  DELIVERED --> [*]
  REFUNDED --> [*]
```
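
A state diagram like this maps directly onto a transition table that code can enforce. The following is a minimal sketch, assuming the states and edges are exactly those drawn above. `OrderState`, `ALLOWED`, `InvalidTransition`, and `transition` are illustrative names, not part of any existing service.

```python
from enum import Enum


class OrderState(Enum):
    PENDING = "PENDING"
    PAYMENT_CAPTURED = "PAYMENT_CAPTURED"
    IN_FULFILMENT = "IN_FULFILMENT"
    SHIPPED = "SHIPPED"
    DELIVERED = "DELIVERED"
    RETURN_REQUESTED = "RETURN_REQUESTED"
    REFUNDED = "REFUNDED"
    CANCELLED = "CANCELLED"


# One entry per state; each set lists the states reachable in a single step.
ALLOWED = {
    OrderState.PENDING: {OrderState.PAYMENT_CAPTURED, OrderState.CANCELLED},
    OrderState.PAYMENT_CAPTURED: {OrderState.IN_FULFILMENT},
    OrderState.IN_FULFILMENT: {OrderState.SHIPPED},
    OrderState.SHIPPED: {OrderState.DELIVERED, OrderState.RETURN_REQUESTED},
    OrderState.DELIVERED: {OrderState.RETURN_REQUESTED},
    OrderState.RETURN_REQUESTED: {OrderState.REFUNDED},
    OrderState.REFUNDED: set(),    # terminal
    OrderState.CANCELLED: set(),   # terminal
}


class InvalidTransition(ValueError):
    """Raised when a requested state change is not an edge in the diagram."""


def transition(current: OrderState, target: OrderState) -> OrderState:
    if target not in ALLOWED[current]:
        raise InvalidTransition(f"{current.value} -> {target.value} is not allowed")
    return target


state = OrderState.PENDING
for step in (OrderState.PAYMENT_CAPTURED, OrderState.IN_FULFILMENT, OrderState.SHIPPED):
    state = transition(state, step)
print(state.value)
```

Keeping the table next to the diagram source means a reviewer can compare the two line by line when either changes.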
💡
Diagrams-as-code tooling: Use Mermaid for quick diagrams embedded in markdown (rendered natively by GitHub and GitLab; Confluence needs a Mermaid app). Use Structurizr (structurizr.com) for C4 models with live workspace management. Use PlantUML for detailed sequence and component diagrams requiring fine-grained control.

Enterprise Architecture Examples

The following examples illustrate how EA principles apply to common real-world scenarios.

Example 1 — E-Commerce Platform Architecture

Context

A mid-size retailer with 2M customers, moving from a monolith to a service-oriented architecture. 50 engineers across 8 product teams.

| Domain | Approach | Key Decisions |
|---|---|---|
| Application | Modular monolith → bounded service split | Start monolith; extract services when team owns a clear domain and the monolith boundary is proven |
| Data | PostgreSQL per service; Kafka CDC to analytics lake | No shared DB. Each service owns its data. Analytics via Kafka CDC → Snowflake |
| Integration | REST APIs internally; Kafka for domain events | Synchronous for user-facing reads; async events for state propagation |
| Security | OAuth2 + OIDC via Keycloak; zero-trust internal network | mTLS between services; RBAC in API gateway |
| Platform | Kubernetes on AWS EKS; Terraform IaC | Golden path templates for service scaffolding; observability via OpenTelemetry → Grafana stack |
| Governance | ARB bi-weekly; ADRs in each service repo | Technology radar updated quarterly; new databases require ARB approval |

Example 2 — Financial Services (Regulated)

Context

A regulated payments company operating under PCI-DSS and FCA oversight. 200 engineers. On-premises data centre + hybrid cloud.

| Domain | Approach | Key Decisions |
|---|---|---|
| Security | Zero Trust, FIDO2/MFA enforced, HSM for key management | No lateral movement; network microsegmentation; all secrets in Vault |
| Data | Strict data classification; PCI scope boundary drawn around card data systems | Card data only in PCI-scoped systems; tokenisation for all external exposure |
| Compliance | Architecture review mandatory for PCI-scoped changes | Evidence artefacts (ADRs, threat models, scan results) linked from each change ticket |
| Platform | Hybrid: on-prem for card data; Azure for non-PCI workloads | Private ExpressRoute; no card data ever crosses to public cloud |
| Integration | Internal gRPC; external REST + mutual TLS | API Gateway at boundary with certificate pinning and DLP inspection |

Example 3 — SaaS Startup Scaling

Context

A B2B SaaS company with 15 engineers growing rapidly. Currently a Rails monolith with a single PostgreSQL database. Starting to feel pain at 500k MAU.

💡
Recommended path: Do NOT rewrite as microservices. Introduce module boundaries inside the monolith first (domain-based namespaces). Extract the first bounded service only when a specific team (with clear ownership) and a specific scaling bottleneck (with data) justify it. One service extracted well is worth more than ten extracted hastily.

| Pain Point | EA Response | Outcome |
|---|---|---|
| Database slowdown on reports | Add read replica; move analytics queries to replica | 3-day effort; solved for 18 months of growth |
| Background job queue coupling | Extract async job processor as separate process (Sidekiq → separate container) | Isolated failure domain, independent scaling |
| Real-time notifications blocking web requests | Introduce WebSocket service + Redis pub/sub | Decoupled; notification latency <200ms |
| Third-party integrations causing timeouts | Wrap in async queue with idempotent retry | Resilient; timeouts no longer affect user experience |
