Enterprise Architecture — Beginner's Guide

A practical, opinionated handbook for engineers stepping into enterprise architecture. Covers domains, technology selection, governance, common pitfalls, standards, documentation, and real-world examples.

Enterprise Architecture · Solution Architecture · Governance · Standards · Documentation · April 2026

ℹ

Who is this for? Software engineers, tech leads, and new solution architects who are transitioning into enterprise architecture roles — or who want to collaborate more effectively with enterprise architects. No prior EA background required.

What is Enterprise Architecture?

Enterprise Architecture (EA) is the discipline of aligning an organisation's technology landscape with its business strategy. It is not just about software — it encompasses processes, people, data, infrastructure, and governance. EA provides a blueprint for how the enterprise operates today and how it should evolve.

💡

Simple definition: EA answers the question — "Given what the business needs to achieve, how should our technology, data, people, and processes be organised and connected to support it — now and in the future?"

Common EA Frameworks

Framework	Origin	Key Strength	When to Use
TOGAF	The Open Group	Comprehensive ADM lifecycle, widely recognised	Large regulated enterprises needing formal structure
Zachman	John Zachman	Classification matrix — who, what, when, where, why, how	Taxonomy and classification of artefacts
SAFe (Agile)	Scaled Agile	EA embedded into agile at scale	Agile enterprises, PI planning
Gartner EA	Gartner	Business-outcome-driven, pragmatic	Consultancy-led transformation
Informal / Lightweight	Org-specific	Speed, pragmatism	Startups scaling to enterprise, product companies

⚠

Framework ≠ Output. Adopting TOGAF or Zachman does not make you an enterprise architect. Frameworks are process guides, not substitutes for deep technical judgment and business understanding.

EA vs Solution Architecture

Dimension	Enterprise Architect	Solution Architect
Scope	Entire organisation or domain landscape	Single project or product
Time horizon	3–5+ years (strategic)	6–18 months (tactical)
Primary stakeholder	CTO, CIO, business leadership	Product owner, dev team, project manager
Output	Technology roadmaps, principles, standards	Solution design, component diagrams, ADRs
Level of detail	High-level patterns and constraints	Detailed enough to build from

Business Architecture

Business architecture maps the capabilities and processes of the organisation to technology. It ensures that every technical decision is grounded in a real business need.

Core Concepts

Business Capability: What the organisation does (e.g., "Order Management", "Customer Onboarding"). Capabilities are stable; processes and systems change.
Value Stream: End-to-end flow of activities that deliver value to a customer — from request to fulfilment.
Business Process: A defined sequence of tasks within a capability. Documented in BPMN or simple flowcharts.
Operating Model: How the organisation operates — centralised vs federated, shared services, outsourced functions.
Capability Map: A visual inventory of all capabilities grouped by domain (e.g., Finance, HR, Operations).

Business Capability Map — Example

// Level 1 Capability Map (simplified e-commerce platform)

Customer Domain
├── Customer Acquisition          (Marketing, SEO, Paid Ads)
├── Customer Management           (Profile, Preferences, CRM)
└── Customer Support              (Helpdesk, Returns, Escalation)

Order Domain
├── Product Catalogue             (Listing, Search, Inventory)
├── Order Processing              (Cart, Checkout, Payment)
└── Order Fulfilment              (Warehouse, Shipping, Tracking)

Finance Domain
├── Revenue Management            (Invoicing, Reconciliation)
├── Payments & Settlements        (PSP Integration, Refunds)
└── Financial Reporting           (P&L, Dashboards, Audit)

💡

EA golden rule: Map technology investments to capabilities, not projects. Projects come and go; capabilities evolve over decades. Mapping to capabilities prevents siloed systems and redundant builds.
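The golden rule can be made concrete with a small inventory check. The sketch below is illustrative only: the system names and both helper functions are hypothetical, and the capability names are borrowed from the example map. It groups systems by the capability they support and flags capabilities served by more than one system, which is often a first hint of redundant build.

```python
from collections import defaultdict

# Hypothetical inventory: (system name, capability it supports).
# Capability names come from the example map; system names are invented.
INVENTORY = [
    ("crm-core", "Customer Management"),
    ("legacy-crm", "Customer Management"),
    ("checkout-svc", "Order Processing"),
    ("ledger", "Revenue Management"),
]


def systems_by_capability(inventory):
    """Group system names under the capability they support."""
    grouped = defaultdict(list)
    for system, capability in inventory:
        grouped[capability].append(system)
    return dict(grouped)


def redundant_capabilities(inventory):
    """Return capabilities served by more than one system."""
    return {
        capability: sorted(systems)
        for capability, systems in systems_by_capability(inventory).items()
        if len(systems) > 1
    }


if __name__ == "__main__":
    print(redundant_capabilities(INVENTORY))
```

Two systems under one capability is not automatically wrong (a migration may be in flight), so treat the result as a prompt for a conversation rather than a verdict.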

Application Architecture

Application architecture defines the system portfolio — what software systems exist, how they are structured internally, and how they relate to each other.

Common Patterns

Pattern	Description	Best For
Modular Monolith	Single deployable unit with internal module boundaries	Small-to-medium teams, early-stage products
Microservices	Independent, bounded-context services	Large orgs with independent teams and domains
Event-Driven	Services communicate via events/messages	Decoupled workflows, async processing, audit trails
Serverless / FaaS	Functions triggered by events, no server management	Sporadic workloads, automation, glue code
CQRS + Event Sourcing	Separate read/write models; state rebuilt from events	Complex domains, audit, temporal queries
BFF (Backend for Frontend)	Dedicated API layer per frontend client type	Mobile + web with different data needs

Application Portfolio Principles

Rationalise before you build. Ask: does a system already exist that can be extended? Can a SaaS product cover this?
Define bounded contexts. Each system should have a clear, non-overlapping ownership boundary (Domain-Driven Design).
Avoid shared databases. Services sharing a DB are distributed monoliths — the worst of both worlds.
Track technical debt. Use a tech debt register; classify each item as intentional, unintentional, or bit rot.
Decommission proactively. Every system costs money to operate. Dead systems still generate incidents.

Data Architecture

Data architecture defines how data is created, stored, transformed, moved, and consumed across the enterprise. Poor data architecture is the single most common root cause of system complexity.

Key Concepts

Concept	Description
Master Data	The authoritative, shared definition of key entities (Customer, Product, Account). One system of record per entity.
Data Lineage	Traceability of data from origin to consumption. Critical for compliance and debugging.
Data Mesh	Domain-oriented data ownership; teams own and publish data products rather than relying on a central team.
Data Lake	Central store for raw data of any structure (structured, semi-structured, unstructured). Enables analytics and ML without pre-modelling.
Data Warehouse	Structured, curated store for reporting and BI. Schema-on-write, high query performance.
Lakehouse	Combines data lake storage with warehouse query semantics (e.g., Delta Lake, Apache Iceberg).
Data Governance	Policies, ownership, quality rules, classification, and lifecycle management of data.

⚠

Master data anti-pattern: Multiple systems each maintaining their own Customer record without synchronisation. This results in divergent customer IDs, duplicated records, and impossible-to-reconcile reporting. Establish a Master Data Management (MDM) strategy early.

Data Architecture Checklist

1. Define entities and ownership: Which system is the system of record for each key entity?
2. Data classification: PII, confidential, internal, public. Every data element must be classified.
3. Define retention policies: How long is each category of data retained? Who approves exceptions?
4. Event vs state: Should systems share state (DB replication) or events (CDC, Kafka)? Events are almost always better for decoupling.
5. Encryption at rest and in transit: Default to encrypted. Document any exceptions and the accepted risk.
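The first two checklist items lend themselves to an automated check. The following is a minimal sketch, assuming a hypothetical data catalogue held as a list of dictionaries; the entity, system, and field names are invented, and only the four classification labels come from the checklist.

```python
ALLOWED_CLASSIFICATIONS = {"PII", "confidential", "internal", "public"}

# Hypothetical catalogue entries; all names are invented for illustration.
CATALOGUE = [
    {
        "entity": "Customer",
        "system_of_record": "crm-core",
        "fields": {"email": "PII", "segment": "internal"},
    },
    {
        "entity": "Product",
        "system_of_record": None,  # nobody has claimed ownership yet
        "fields": {"sku": "public", "cost_price": None},
    },
]


def checklist_violations(catalogue):
    """Return readable violations of checklist items 1 and 2."""
    problems = []
    for entry in catalogue:
        if not entry.get("system_of_record"):
            problems.append(f"{entry['entity']}: no system of record")
        for field, label in entry["fields"].items():
            if label not in ALLOWED_CLASSIFICATIONS:
                problems.append(f"{entry['entity']}.{field}: unclassified")
    return problems


if __name__ == "__main__":
    for problem in checklist_violations(CATALOGUE):
        print(problem)
```

A check like this could run in CI against whatever catalogue format the organisation already maintains; the dictionary shape here is only a stand-in.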

Technology Architecture

Technology architecture covers the infrastructure and platform layer — cloud, on-premises, networking, compute, storage, and the runtimes that applications run on.

Cloud Strategy Decisions

Strategy	Description	Trade-off
Cloud-native	Fully managed services, PaaS/FaaS first	Speed + agility vs potential lock-in
Cloud-agnostic	Containers + Kubernetes, portable workloads	Portability vs complexity and overhead
Hybrid cloud	On-prem + cloud, connected via VPN/ExpressRoute	Compliance & latency control vs operational complexity
Multi-cloud	AWS + Azure + GCP, best-of-breed per service	Resilience vs significantly higher operational burden
On-premises	Own data centres	Full control vs CapEx, slow to provision

Platform Standards Checklist

Define approved runtimes — e.g., Java 21 LTS, Python 3.12, Node 22 LTS. Discourage old/unsupported versions.
Containerise everything — Docker images as the standard deployable unit; Kubernetes or managed container platforms for orchestration.
Infrastructure as Code — Terraform, Bicep, or Pulumi. All infrastructure changes via code, reviewed, and version-controlled.
Golden paths — Provide opinionated, pre-approved templates for common workload types (API service, worker, static site). Reduce decisions, increase consistency.
Observability by default — Logs, metrics, and traces must be emitted by all services. No exception for new workloads.

Security Architecture

Security architecture must be built in, not bolted on. Every design decision has a security implication. Enterprise architects and solution architects share responsibility for security outcomes.

Principles

Zero Trust

Never trust, always verify. No implicit trust based on network location. Every request is authenticated and authorised.

Least Privilege

Every identity (human or machine) has only the permissions it needs — nothing more. Regular access reviews are mandatory.

Defence in Depth

Multiple overlapping security controls. Assume any single layer can be breached; layers combine to reduce blast radius.

Shift Left Security

Security checks embedded in CI/CD — SAST, DAST, dependency scanning, container image scanning — not post-deployment audits.

Threat Modelling (STRIDE)

Threat	Description	Mitigation Example
Spoofing	Impersonating another identity	Strong authentication (MFA, mTLS)
Tampering	Modifying data in transit or at rest	HTTPS, data integrity checks, signed tokens
Repudiation	Denying an action was taken	Audit logs, non-repudiation via signed events
Information Disclosure	Exposing sensitive data	Encryption, role-based field masking, DLP
Denial of Service	Making a system unavailable	Rate limiting, WAF, auto-scaling, circuit breakers
Elevation of Privilege	Gaining unauthorised elevated access	RBAC, just-in-time access, privilege boundaries

Integration Architecture

Integration architecture defines how systems communicate, share data, and collaborate. Poor integration design is the leading cause of brittle, hard-to-change systems.

Integration Patterns

Pattern	When to Use	Examples
REST API	Synchronous request-response, standard CRUD	JSON over HTTPS, OpenAPI spec
GraphQL	Flexible querying for multiple clients with different data needs	Apollo, Hasura
gRPC	High-throughput internal service-to-service, typed contracts	Protobuf, internal microservices
Message Queue	Async, decoupled, at-least-once delivery	RabbitMQ, Azure Service Bus, SQS
Event Streaming	High-volume, ordered, replayable event log	Apache Kafka, Azure Event Hubs, Kinesis
Webhooks	Push notifications for state changes to external parties	Stripe, GitHub, Salesforce callbacks
CDC (Change Data Capture)	Replicating data changes from a database to downstream systems	Debezium, AWS DMS
API Gateway	Single entry point; routing, auth, rate limiting, observability	Azure APIM, AWS API Gateway, Kong

💡

Coupling is the enemy. Prefer async, event-driven communication between domains. Only use synchronous calls when a real-time response is a genuine business requirement — not out of convenience.
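To illustrate the decoupling that events provide, here is a deliberately small in-process sketch. The `EventBus` class, the `OrderPlaced` event name, and the two consumers are all hypothetical. It runs synchronously in one process, so it shows only that the publisher needs no knowledge of its consumers; a real broker (a message queue or event stream) would add asynchrony, persistence, and retries.

```python
from collections import defaultdict
from typing import Callable


class EventBus:
    """Minimal in-process stand-in for a message broker (illustration only)."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        # The publisher does not know who consumes the event.
        for handler in self._handlers[event_type]:
            handler(payload)


bus = EventBus()
shipped: list[str] = []
invoiced: list[str] = []

# Two hypothetical domains react to the same event independently.
bus.subscribe("OrderPlaced", lambda event: shipped.append(event["order_id"]))
bus.subscribe("OrderPlaced", lambda event: invoiced.append(event["order_id"]))

bus.publish("OrderPlaced", {"order_id": "A-1"})
```

Adding a third consumer requires no change to the code that publishes `OrderPlaced`, which is the property that keeps domains independently changeable.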

Technology Selection Framework

Selecting technology is one of the most consequential decisions in enterprise architecture. The wrong choice creates years of pain; a rushed choice creates the same. Use a structured evaluation framework every time.

Selection Criteria

Criterion	Weight	Questions to Ask
Fit for purpose	High	Does it actually solve the problem? Is the use-case in its sweet spot?
Operational maturity	High	Is it production-proven at scale? What are the known failure modes?
Team capability	High	Does the team have the skills? What is the learning curve and time-to-competency?
Vendor / OSS health	High	Is the vendor stable? Is the project actively maintained? What is the bus factor?
Total Cost of Ownership	High	Licensing + ops + training + migration. Not just purchase price.
Integration complexity	Medium	How does it connect to existing systems? What standards does it support?
Security & compliance	High	Does it meet our compliance posture (SOC2, GDPR, ISO 27001)?
Scalability	Medium	Will it handle 10x growth without re-architecting?
Lock-in risk	Medium	What does exit look like? Are there open standards? Migration cost?
Community & ecosystem	Medium	Forums, documentation quality, third-party integrations, hiring market.

Evaluation Process

1. Define the problem clearly: Write a one-page problem statement before looking at any tools.
2. Identify shortlist: 2–4 options maximum. More than 4 is analysis paralysis.
3. Score against criteria: Use a weighted scoring matrix. Make weights explicit and agreed upfront.
4. Run a PoC: Build the hardest integration scenario, not the happy path. Time-box to 2–4 weeks.
5. Reference check: Talk to 2–3 organisations already running it in production at similar scale.
6. Document and socialise the decision: Write an ADR. Get ARB sign-off if the decision affects multiple teams.
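The weighted scoring matrix from the "Score against criteria" step can be sketched in a few lines. Everything numeric here is an assumption for illustration: the mapping of High to 3 and Medium to 2, the 1-to-5 scoring scale, the subset of criteria, and the scores for the two placeholder options.

```python
# Assumed numeric mapping: High = 3, Medium = 2.
WEIGHTS = {
    "fit_for_purpose": 3,
    "operational_maturity": 3,
    "team_capability": 3,
    "integration_complexity": 2,
    "lock_in_risk": 2,
}

# Scores from 1 (poor) to 5 (excellent); all values are invented.
SCORES = {
    "Option A": {
        "fit_for_purpose": 5,
        "operational_maturity": 3,
        "team_capability": 2,
        "integration_complexity": 4,
        "lock_in_risk": 3,
    },
    "Option B": {
        "fit_for_purpose": 4,
        "operational_maturity": 4,
        "team_capability": 4,
        "integration_complexity": 3,
        "lock_in_risk": 4,
    },
}


def weighted_score(scores, weights):
    """Weighted average, on the same 1-5 scale as the raw scores."""
    total_weight = sum(weights.values())
    return sum(scores[name] * weight for name, weight in weights.items()) / total_weight


def rank(options, weights):
    """Return (option, score) pairs, best first."""
    scored = [(name, round(weighted_score(s, weights), 2)) for name, s in options.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print(rank(SCORES, WEIGHTS))  # [('Option B', 3.85), ('Option A', 3.38)]
```

Option A wins on fit but loses overall because of team capability, which is exactly the kind of trade-off the matrix is meant to surface. The number supports the discussion; it does not replace the written justification for each score.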

Build vs Buy vs Integrate

This is one of the most repeated questions in enterprise architecture — and the answer is never obvious. Each option has different trade-offs across cost, control, speed, and risk.

Build

Custom software developed in-house or contracted.

Full control over features and data
Differentiating IP stays in-house
Highest cost and longest time-to-value
You own the operational burden forever

✓ Use when: core competitive advantage or no existing solution

Buy (SaaS/COTS)

Commercial off-the-shelf or SaaS product.

Fastest time-to-value
Vendor manages ops, updates, compliance
Risk of lock-in and feature gaps
Often requires process change to fit the tool

✓ Use when: commodity capability, low differentiation

Integrate (OSS / Platform)

Open-source or platform component, self-hosted or managed.

No licensing cost; OSS community support
You own hosting, patching, upgrades
Higher flexibility than SaaS
Requires internal expertise

✓ Use when: strong OSS ecosystem, standard tooling

⚠

Default to buy for commodity capabilities. HR, payroll, email, CRM, and ERP are not sources of competitive advantage. Building your own is a distraction that costs 10x more than the SaaS alternative. Reserve "build" for the 20% that truly differentiates your business.

Technology Radar

A Technology Radar is a living document that categorises technologies by their maturity and adoption stance within your organisation. It gives teams a clear signal on what to adopt, trial, assess, or hold.

The Four Rings

Ring	Meaning	Action
● Adopt	Proven, recommended for most use-cases	Default choice. No escalation required.
● Trial	Worth pursuing; requires evidence collection	Use on non-critical workloads with architect oversight
● Assess	Interesting; needs investigation before commitment	PoC only. Do not use in production yet.
● Hold	Not recommended for new work; legacy only	No new projects. Migrate off existing usage over time.

Example Radar Entries

// Example Technology Radar — Platform Domain (2026)

ADOPT
  Kubernetes (container orchestration)
  Terraform (infrastructure as code)
  OpenTelemetry (observability instrumentation)
  PostgreSQL (relational data)
  React / Angular 18+ (frontend)

TRIAL
  Temporal (workflow orchestration)
  Apache Iceberg (open table format)
  HTMX (lightweight frontend interactivity)

ASSESS
  Wasm (WebAssembly for compute-intensive tasks)
  Deno (JavaScript runtime)

HOLD
  AngularJS (EOL)
  REST XML / SOAP (legacy integrations only)
  On-premises Oracle DB (for new projects)

💡

Publish your radar openly within the organisation. Treat it as a communication tool. Update it quarterly via the ARB. Thoughtworks publishes a well-known public radar at radar.thoughtworks.com, which is a useful reference.
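If the radar is kept as data rather than only as a diagram, teams and tooling can query it. The sketch below assumes a hypothetical dictionary holding a subset of the example entries; the two helper functions are invented and encode one reading of the Action column (Adopt and Trial may reach production, Assess is PoC-only, Hold is legacy-only).

```python
from enum import Enum
from typing import Optional


class Ring(Enum):
    ADOPT = "adopt"
    TRIAL = "trial"
    ASSESS = "assess"
    HOLD = "hold"


# Subset of the example radar entries.
RADAR = {
    "Kubernetes": Ring.ADOPT,
    "Terraform": Ring.ADOPT,
    "Temporal": Ring.TRIAL,
    "Deno": Ring.ASSESS,
    "AngularJS": Ring.HOLD,
}


def ring_of(technology: str) -> Optional[Ring]:
    """Return the ring for a technology, or None if it is not on the radar."""
    return RADAR.get(technology)


def allowed_for_new_production_work(technology: str) -> bool:
    """Adopt and Trial may reach production (Trial with architect oversight)."""
    return ring_of(technology) in (Ring.ADOPT, Ring.TRIAL)


def needs_escalation(technology: str) -> bool:
    """Anything outside Adopt, including unlisted technologies, needs a conversation."""
    return ring_of(technology) is not Ring.ADOPT
```

For example, `needs_escalation("Temporal")` is true even though Temporal is allowed in production, because Trial still requires oversight.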

Proof of Concept (PoC)

A PoC is a time-boxed, low-fidelity experiment to reduce uncertainty before committing to a technology or approach. It is not a prototype, not an MVP, and not the start of production code.

PoC Rules

Time-box strictly: 1–4 weeks. If you cannot answer the key question in that time, the question is too broad.
Define the success criteria before starting: "We will adopt X if the PoC demonstrates Y and Z." Written down. Agreed by stakeholders.
Test the hardest scenario: The PoC must validate the most risky integration, the highest throughput requirement, or the most complex use-case — not the trivial happy path.
PoC code is throwaway: It must never become production code. Label it explicitly. Enforce this.
Document findings: Write a short findings report: what was tested, what worked, what failed, what is still unknown. This feeds the ADR.

🚫

PoC-to-production drift: The most dangerous path in software is PoC code that slips into production "temporarily". It invariably stays permanently. Enforce a hard rule: if code was written as a PoC, it is rewritten before going live.

Review Processes

Governance without review processes is decoration. Effective review processes catch architectural drift, ensure standards compliance, and build shared understanding — without becoming bureaucratic blockers.

Review Types and When to Trigger

Review Type	Trigger	Participants	Output
Architecture Review	New system, major feature, or cross-team integration	EA, Solution Architects, Tech Leads	Approved / conditional / rejected design
Design Review	Start of any non-trivial feature (more than two sprints of work)	Solution Architect, Senior Engineers	Reviewed design doc, identified risks
Code Review	Every pull request	Peers + at least one senior engineer	Approved PR or change requests
Security Review	Before any public-facing feature or data-touching change	Security Architect or designated reviewer	Threat model sign-off, pen test scheduling
Post-Incident Review	After every P1/P2 incident	Incident owner, SRE, Architecture	Action items, RCA document
Quarterly Architecture Review	Calendar-driven, every quarter	ARB + Domain Architects	Updated roadmap, radar changes, debt register

Architecture Review Board (ARB)

The ARB is the governance body responsible for cross-cutting architectural decisions. It reviews proposed changes that affect multiple systems, introduce new technology, or deviate from established standards.

ARB Charter — Key Elements

Membership: Enterprise Architect (chair), Domain/Solution Architects, Principal Engineers, Security Architect, occasionally a business stakeholder.
Cadence: Standing meeting every 2 weeks. Emergency sessions for critical escalations.
Decision authority: The ARB approves, conditionally approves (with action items), or rejects proposals. Decisions are binding.
Escalation path: Teams that disagree with an ARB decision escalate to CTO. The ARB is not a committee for endless debate.
Transparency: All ARB decisions are published internally — the decision log is visible to all engineers.

What Must Go to the ARB

✅ Must Submit

Introducing a new technology not on the Adopt list
Cross-domain integrations between bounded contexts
New data stores or databases
Changes to authentication or authorisation models
Architectural patterns deviating from standards
Third-party vendor onboarding (data access)

🚫 Does NOT Need ARB

Feature development within an existing service
Library/dependency upgrades within the same major version
Internal refactoring with no external interface changes
Bug fixes and operational improvements
Infrastructure scaling (within approved patterns)
UI/UX changes
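The "must submit" list above can be expressed as a simple triage function so that teams get a consistent answer before booking ARB time. This is a sketch under stated assumptions: the `ChangeRequest` type and its field names are hypothetical, and each flag corresponds to one item in the list.

```python
from dataclasses import dataclass


@dataclass
class ChangeRequest:
    """Hypothetical change description; field names are illustrative."""

    new_technology_outside_adopt: bool = False
    crosses_bounded_contexts: bool = False
    adds_data_store: bool = False
    changes_authn_or_authz: bool = False
    deviates_from_standard_patterns: bool = False
    onboards_vendor_with_data_access: bool = False


def arb_reasons(change: ChangeRequest) -> list[str]:
    """Return the 'must submit' triggers that apply; empty means no ARB needed."""
    checks = [
        (change.new_technology_outside_adopt, "new technology not on the Adopt list"),
        (change.crosses_bounded_contexts, "cross-domain integration between bounded contexts"),
        (change.adds_data_store, "new data store or database"),
        (change.changes_authn_or_authz, "change to authentication or authorisation model"),
        (change.deviates_from_standard_patterns, "architectural pattern deviating from standards"),
        (change.onboards_vendor_with_data_access, "third-party vendor onboarding with data access"),
    ]
    return [reason for triggered, reason in checks if triggered]
```

A bug fix would be `ChangeRequest()` with no flags set and returns an empty list; a feature that adds a new database returns one reason, which can go straight into the ARB submission.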

Design & Code Reviews

Architecture Design Review — What to Look For

Does the design solve the actual business problem, or is it solving an interesting engineering problem?
Are the integration points clearly defined — contracts, error handling, back-pressure, retry behaviour?
Is there a data model? Has the entity ownership question been answered?
What are the failure modes? What happens when dependency X is unavailable?
Has the security model been considered? Who can read/write this data? Is it classified?
Is the design consistent with existing patterns, or does it introduce a new pattern that needs ARB sign-off?
Is there an operational runbook concept? How will this be monitored, alerted on, and recovered?

Code Review Standards (Architecture Perspective)

Code reviews enforce team standards — not personal preference. Disagreements on style must be resolved in the standards document, not in PR comments.
Architectural concerns (boundary violations, coupling, wrong abstraction level) are blocking. Style nits are non-blocking.
Large PRs (>500 lines) indicate a design problem, not a review problem. Push back on the PR size, not the reviewer.
Every PR should include test coverage changes. A feature PR with no test changes is a red flag.

Governance Gates

Governance gates are checkpoints in the delivery lifecycle where architecture and quality standards are validated before a team proceeds. They stop teams from accumulating technical debt by moving too fast.

Idea / Discovery

→

Architecture Review

→

Design Review

→

Development

→

Security Review

→

Staging / UAT

→

Production

💡

Gates should be fast. A governance gate that takes more than 5 business days to clear will be bypassed by teams under pressure. Invest in making reviews efficient — templates, checklists, and pre-read documents rather than live discovery discussions.
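The five-business-day guideline is easy to monitor. The following sketch counts weekdays between submission and today and flags a gate as overdue; the function names are hypothetical, and the count ignores public holidays, which a real implementation would need to handle.

```python
from datetime import date, timedelta

MAX_GATE_BUSINESS_DAYS = 5  # threshold taken from the guidance above


def business_days_between(start: date, end: date) -> int:
    """Count weekdays after `start` up to and including `end` (no holiday calendar)."""
    days = 0
    current = start
    while current < end:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday is 0, Friday is 4
            days += 1
    return days


def gate_is_overdue(submitted: date, today: date) -> bool:
    """True once a review has been waiting longer than the threshold."""
    return business_days_between(submitted, today) > MAX_GATE_BUSINESS_DAYS


if __name__ == "__main__":
    submitted = date(2026, 4, 6)  # a Monday
    print(gate_is_overdue(submitted, date(2026, 4, 13)))  # False: exactly 5 business days
    print(gate_is_overdue(submitted, date(2026, 4, 14)))  # True: 6 business days
```

Publishing the number of overdue gates alongside the ARB decision log gives the governance process the same visibility it asks of delivery teams.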

Roles Overview

Enterprise architecture involves a hierarchy of architectural roles. Each has a distinct scope, accountability, and set of outputs.

Role	Scope	Primary Outputs	Reports To
Chief Architect / EA Lead	Entire organisation	Technology strategy, EA principles, ARB governance	CTO / CIO
Enterprise Architect	Multi-domain	Technology radar, standards, capability maps, roadmaps	Chief Architect / CTO
Domain Architect	Single business domain	Domain-level architecture, integration patterns, standards	Enterprise Architect
Solution Architect	Single project or product	Solution design, component diagrams, ADRs, technical risk log	Domain Architect / Product
Principal Engineer	Technical domain within a team	Code standards, technical spikes, architectural input on PRs	Engineering Manager
Tech Lead	Single team	Sprint-level technical decisions, code review, mentoring	Engineering Manager

Guide for Solution Architects

The solution architect is the most hands-on architectural role. You bridge strategy and execution — translating EA principles into buildable solutions, and pushing back when delivery pressures threaten architectural integrity.

Core Responsibilities

1. Understand the business problem first. Before drawing a diagram, be able to explain the problem in business terms without using any technical jargon. If you cannot, you are not ready to design a solution.
2. Design to the team's capability level. The best solution is one the team can build, operate, and evolve. A correct-but-unmaintainable design is a bad design.
3. Validate with data, not opinions. Performance estimates, scalability projections, and cost models should be backed by numbers. Run load tests. Check cloud pricing calculators.
4. Document assumptions explicitly. Every solution design rests on assumptions. Write them down. Validate them. Revisit them when they change.
5. Own the technical risk register. Identify, score (likelihood × impact), and mitigate risks proactively. Do not wait for a retrospective.
6. Be present in the team. Attend standups, participate in PR reviews, pair with developers. Architecture that is delivered only via diagrams fails.
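The likelihood × impact scoring mentioned for the technical risk register can be sketched as follows. The three-point L/M/H scale and its numeric values are assumptions, as are the `Risk` type and the sample entries.

```python
from dataclasses import dataclass

LEVELS = {"L": 1, "M": 2, "H": 3}  # assumed three-point scale


@dataclass
class Risk:
    description: str
    likelihood: str  # "L", "M" or "H"
    impact: str  # "L", "M" or "H"
    mitigation: str = ""

    @property
    def score(self) -> int:
        """Likelihood multiplied by impact, from 1 (L x L) to 9 (H x H)."""
        return LEVELS[self.likelihood] * LEVELS[self.impact]


def prioritise(risks):
    """Highest score first; ties keep their original order."""
    return sorted(risks, key=lambda risk: risk.score, reverse=True)


# Invented sample entries.
REGISTER = [
    Risk("Vendor API rate limits", "M", "M", "Cache responses"),
    Risk("Single database instance", "L", "H", "Add a replica"),
    Risk("Team new to chosen queue", "H", "H", "Pair with platform team"),
]

if __name__ == "__main__":
    for risk in prioritise(REGISTER):
        print(risk.score, risk.description)
```

The multiplication is a convention rather than a law; some teams prefer a lookup matrix so that a low-likelihood, high-impact risk is never ranked below a medium/medium one.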

Solution Design Document Template

## Solution Design: [Feature / System Name]
Version: 1.0  |  Author: [Name]  |  Date: [Date]
Status: Draft / Under Review / Approved

### Problem Statement
What business problem does this solve? (2–3 sentences, no jargon)

### Goals & Non-Goals
Goals:
  - [Goal 1]
  - [Goal 2]
Non-goals:
  - [What is explicitly out of scope]

### Assumptions
  - [Assumption 1] — validated / unvalidated
  - [Assumption 2]

### Architecture Overview
[High-level component diagram or C4 Level 2 Container diagram]

### Key Design Decisions
  Decision 1: [What was decided] — [Why] — [Alternatives considered]
  Decision 2: ...

### Data Model
[Entity definitions, ownership, relationships]

### Integration Points
  [System A] → [This system]: [protocol, contract, error handling]
  [This system] → [System B]: [protocol, contract, error handling]

### Security Considerations
  - Authentication: [How]
  - Authorisation: [RBAC model]
  - Data classification: [PII? Confidential?]
  - Threat model summary

### Observability
  - Key metrics to track
  - Alert thresholds
  - Runbook location

### Risks
| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| [Risk 1] | M | H | [Mitigation] |

### Open Questions
  - [Question] — Owner: [Name] — Due: [Date]

Collaborating with Development Teams

Architecture fails when it is done to teams rather than with them. Effective architects embed, listen, and earn trust through technical credibility.

Anti-Patterns in Architect-Team Relationships

Ivory Tower Architecture: Designing in isolation, handing down diagrams, then disappearing. Teams who feel architecture is imposed on them will route around it.
Governance-only presence: Being visible only in review meetings, not in day-to-day delivery. You lose context; teams lose trust.
Rejecting without alternatives: "We can't do X" is only useful if followed by "because Y, and here's what we can do instead: Z".

What Good Collaboration Looks Like

Architects participate in sprint planning and refinement sessions — at least fortnightly.
Architecture input is available in the team's working tools (Confluence, Notion, GitHub Discussions) — not locked in separate EA tooling.
Architects pair with developers on complex integrations and review spikes.
Feedback flows both ways: developers identify problems that improve architectural decisions.
Architects celebrate team engineering wins publicly — they are not a police force.

Common Pitfalls in Enterprise Architecture

Enterprise architecture has a well-documented set of recurring failure modes. Recognising them early is the single most valuable skill for a new practitioner.

Resume-Driven Development (RDD)

🚨 Resume-Driven Development

Selecting technologies because they look impressive on a CV, not because they are the right fit. Common symptoms: introducing Kubernetes for a team of 3, adopting Kafka for 500 events/day, rebuilding an existing system in a trendy framework "because it's what everyone is using now".

How to Detect It

Technology selection conversations focus on features and hype rather than the specific problem at hand.
The team pushing hardest for a technology has the most to gain professionally from learning it.
TCO analysis is skipped or minimised. "We'll figure out the ops later."
Alternatives are dismissed with vague arguments: "It doesn't scale" — without defining what scale means in context.

How to Counter It

Require a written problem statement before any technology discussion.
Use a weighted scoring matrix. Scores must be justified, not asserted.
Ask "what problem does this solve that our current stack cannot?" The answer must be specific.
Separate the technology evaluation from the person advocating for it. Can someone else run the PoC?

Other Common Pitfalls

⚠ Over-Engineering (YAGNI)

Building for requirements that don't exist yet. "We might need to support multiple databases" is a common justification for unnecessary abstraction layers. Design for what you know; leave extensibility points only where you have evidence of future need.

⚠ Premature Microservices

Splitting a system into microservices before domain boundaries are well-understood creates a distributed monolith — all the complexity of distribution with none of the benefits. Start with a modular monolith. Split only when a specific scaling or team-autonomy problem is well-defined.

🚨 Big Design Up Front (BDUF)

Attempting to fully specify every architectural decision before any code is written. Architecture must evolve. Spend 80% of design effort on the 20% of decisions that are genuinely hard to change (data model, integration topology, security model). Leave the rest to emerge.

🚨 Analysis Paralysis

Perpetual evaluation without a decision. If a PoC has been running for 3 months without a conclusion, the process is broken. Set decision deadlines. Embrace reversibility — many decisions are cheaper to change than to delay indefinitely.

⚠ Vendor Lock-in (Unmanaged)

Not all lock-in is bad — managed lock-in with awareness of exit cost is a valid trade-off. Unmanaged lock-in is where you discover the dependency only when you want to leave. Always document: what is the exit plan if this vendor is acquired, fails, or raises prices 10x?

💡 Governance Theatre

Review processes that exist to create the appearance of governance without providing real value. A 90-minute ARB meeting where everyone nods along and nothing is challenged is governance theatre. Good governance asks hard questions and sometimes says no.

💡 Documentation Debt

Deferring documentation indefinitely under delivery pressure. Undocumented systems are a bus-factor risk and a productivity drain. Every sprint should include documentation tasks. Architecture that can only be understood by its original author is an operational liability.

⚠ Security as an Afterthought

Adding security controls after a system is built costs 10–100x more than designing them in. A security review at the end of a project almost always results in superficial compliance rather than genuine security. Architects must include security in initial design reviews.

Defining Coding Standards

Coding standards are the written agreements that define how your organisation writes, structures, and reviews code. Without them, every team reinvents conventions independently, making cross-team collaboration and maintenance costly.

What Coding Standards Cover

Area	Examples
Naming conventions	camelCase vs snake_case, file naming, class vs interface prefixes
Project structure	Directory layout, module boundaries, entry point conventions
Code style	Indentation, line length, brace style, import ordering
Language features	Banned features (e.g. `eval`), preferred patterns (e.g. async/await over callbacks)
Error handling	Do not swallow exceptions, structured logging, error boundary patterns
Testing requirements	Minimum coverage %, naming conventions for tests, mandatory test types per PR
Security rules	No secrets in code, parameterised queries only, OWASP Top 10 awareness
Documentation	When to add comments, public API documentation requirements
Git conventions	Branch naming, commit message format (Conventional Commits), PR size limits
Dependency management	Approved dependency sources, vulnerability scanning, major version pinning rules
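As an example of turning one of these areas into an automated rule, the sketch below checks a commit message header against the Conventional Commits format named under Git conventions. The list of allowed types is an assumption: it follows the commonly used Angular-style set, whereas the specification itself gives special meaning only to `feat` and `fix`. The function name is hypothetical.

```python
import re

# Assumed type list (Angular-style); adjust to your own standard.
TYPES = (
    "build", "chore", "ci", "docs", "feat", "fix",
    "perf", "refactor", "revert", "style", "test",
)

HEADER = re.compile(
    r"^(?P<type>" + "|".join(TYPES) + r")"
    r"(?:\((?P<scope>[^()\s]+)\))?"  # optional scope, e.g. (cart)
    r"(?P<breaking>!)?"              # optional breaking-change marker
    r": (?P<subject>\S.*)$"          # colon, one space, non-empty subject
)


def is_conventional(message: str) -> bool:
    """Check only the first line of the commit message."""
    lines = message.splitlines()
    return bool(lines) and HEADER.match(lines[0]) is not None


if __name__ == "__main__":
    print(is_conventional("feat(cart): add coupon support"))  # True
    print(is_conventional("Updated stuff"))                   # False
```

A check like this is typically wired into a commit-msg hook or a CI step so that the rule is enforced rather than debated in review.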

How to Define Coding Standards — Process

1. Audit the current state: What are teams already doing? Codify existing good practices rather than inventing new ones. Standards imposed without buy-in will be ignored.
2. Involve engineers in authoring: Standards written by architects in isolation are resented. Run workshops. Use surveys. Crowdsource contentious decisions.
3. Prioritise automation over documentation: Every standard that can be enforced by a linter, formatter, or CI check should be. Reserve documentation for what a human must consciously decide.
4. Explain the "why": Every rule should have a rationale. "We use snake_case for Python because PEP 8 is the community standard and our linting enforces it." Rules without rationale get challenged constantly.
5. Version and review annually: Standards that never change become cargo-cult religion. Review them annually with the community. Retire rules that no longer serve their purpose.
6. Distinguish enforced from advisory: Be explicit. Is this rule enforced by CI (non-negotiable) or a recommended practice (advisory)?

Standards Enforcement & Tooling

Standards that rely entirely on human diligence in code review will erode under delivery pressure. Automate enforcement wherever possible.

Enforcement Toolchain

Tool Type	Examples	Enforces
Code Formatter	Prettier, Black (Python), gofmt, dotnet-format	Style consistency — no debate in PR reviews
Linter	ESLint, Pylint, Checkstyle, SonarLint	Code quality rules, anti-patterns, complexity
Static Analysis (SAST)	SonarCloud, Semgrep, CodeQL	Security vulnerabilities, code smells, coverage
Dependency Scanner	Dependabot, Snyk, OWASP Dependency-Check	Known CVEs in third-party libraries
Secret Scanner	Gitleaks, TruffleHog, GitHub Secret Scanning	Credentials and secrets committed to repos
Container Scanner	Trivy, Grype, Snyk Container	Vulnerable base images, OS packages
Pre-commit Hooks	Husky, pre-commit framework	Local fast-fail before code reaches CI
Branch Protection	GitHub / GitLab branch rules	PR required, CI must pass, review required
Code Coverage Gate	Codecov, SonarQube coverage threshold	Minimum test coverage per PR

# Example: Pre-commit configuration (.pre-commit-config.yaml)
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-merge-conflict
      - id: detect-private-key          # Block committed private keys

  - repo: https://github.com/psf/black   # Python formatter
    rev: 24.3.0
    hooks:
      - id: black

  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.18.2
    hooks:
      - id: gitleaks                     # Secret scanning
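
The coverage gate row in the toolchain table comes down to a simple comparison, sketched below. Tools such as Codecov or SonarQube provide this out of the box; the function name `coverage_gate` and its parameters are hypothetical and only show the logic a gate applies.

```python
def coverage_gate(covered_lines: int, total_lines: int,
                  minimum_percent: float) -> tuple[bool, float]:
    """Return (passed, percent) for a change set.

    A change with no measurable lines passes by definition, so that
    documentation-only PRs are not blocked.
    """
    if total_lines <= 0:
        return True, 100.0
    percent = 100.0 * covered_lines / total_lines
    return percent >= minimum_percent, round(percent, 1)
```

In CI, a failed gate would translate into a non-zero exit code so that branch protection blocks the merge.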

Living Standards

Standards are not written once and forgotten. They must evolve with the technology landscape, team growth, and lessons learned. A "living standard" is one that has clear ownership, a review cadence, and a contribution process.

Principles for Living Standards

Store standards as code: Keep them in a Git repository (e.g., a dedicated standards repo). If they must live in a Confluence space, restrict edits and require review there too. All changes go through a PR/review process. No one person can unilaterally change a standard.
Changelog required: Every change to a standard must include a dated changelog entry explaining what changed and why.
Deprecation notices: Rules being removed should be deprecated first — flagged as "will be removed in 90 days" — giving teams time to adapt.
Exception process: There must be a documented process for requesting an exception to a standard. This is healthier than teams silently ignoring rules.
Annual review: Schedule a formal annual standards review. Use it to cull rules that are no longer relevant and add rules for patterns that have emerged.
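
The deprecation principle is easy to automate once deprecations are recorded as data. The sketch below assumes the 90-day window mentioned above; the `Deprecation` record, the rule IDs, and `rules_due_for_removal` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date, timedelta

DEPRECATION_WINDOW = timedelta(days=90)  # notice period from the principle above


@dataclass(frozen=True)
class Deprecation:
    rule_id: str
    deprecated_on: date
    reason: str  # doubles as the changelog explanation

    @property
    def removal_date(self) -> date:
        return self.deprecated_on + DEPRECATION_WINDOW


def rules_due_for_removal(deprecations: list[Deprecation], today: date) -> list[str]:
    """Return IDs of rules whose notice period has elapsed."""
    return [d.rule_id for d in deprecations if d.removal_date <= today]
```

A scheduled job could run this against the standards repo and open a PR that removes each rule returned, keeping the changelog and the rule set in step.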

Documentation Types

Different audiences need different documentation. The mistake most teams make is writing everything in one format (usually developer-focused) and assuming it serves everyone.

Documentation Type	Audience	Format	Update Frequency
Architecture Decision Records (ADRs)	Engineers, Architects	Structured markdown, version-controlled	Per decision (immutable once accepted)
System Context / C4 Diagrams	All stakeholders	Diagrams-as-code (Structurizr, Mermaid, PlantUML)	On significant change
Runbooks	SRE, Operations	Step-by-step wiki pages	After every incident or procedure change
API Documentation	Developers (consumers)	OpenAPI / AsyncAPI spec, generated portal	On API change (auto-generated from code)
Solution Design Docs	Engineers, PM, Architect	Structured document (Confluence, Notion)	Before build; updated during significant changes
Technology Radar	All engineers	HTML/PDF, visual quadrant	Quarterly
Coding Standards	Engineers	Git-hosted markdown	Annually + as needed
Data Dictionary	Engineers, Analysts, Business	Structured catalogue (e.g., DataHub, Collibra)	On data model change
Roadmaps	Business, Leadership	Visual timeline, slide deck	Quarterly

Managing Documentation

Documentation that isn't maintained is worse than no documentation — outdated docs cause costly mistakes. Managing documentation requires process, tooling, and cultural buy-in.

Key Principles

Documentation is part of the Definition of Done. A story is not complete if the documentation it requires has not been updated. This is non-negotiable.
Docs-as-Code: Where possible, store architecture docs alongside code. ADRs in /docs/decisions/, runbooks in /docs/runbooks/. Changes are reviewed in PRs.
Single source of truth: If the same information exists in three places, it will be inconsistent within a month. Pick one canonical source per document type and redirect everything else to it.
Audit quarterly: Review all documentation for staleness each quarter. Archive or delete documents that are no longer accurate. A labelled archive is better than stale live content.
Search must work: Engineers must be able to find documentation. Poor information architecture in Confluence or Notion is the most common reason documentation is not used.
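
The quarterly audit can start from a simple staleness report. The sketch below assumes you can obtain a last-reviewed date per document (for example from Git history or page metadata). The function name, the paths, and the 90-day default are illustrative assumptions.

```python
from datetime import date


def stale_documents(last_reviewed: dict[str, date], today: date,
                    max_age_days: int = 90) -> list[str]:
    """Return paths of documents not reviewed within max_age_days, sorted."""
    return sorted(
        path
        for path, reviewed in last_reviewed.items()
        if (today - reviewed).days > max_age_days
    )


# Hypothetical inventory for illustration.
inventory = {
    "docs/runbooks/restart-order-service.md": date(2025, 11, 1),
    "docs/decisions/0042-use-kafka.md": date(2026, 3, 15),
}
```

Each flagged document then gets a human decision: update it, archive it with a label, or delete it.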

Documentation Ownership Matrix

Document	Owner	Reviewer	Access
Architecture Decision Records	Solution Architect	Domain Architect	All engineers
System diagrams (C4)	Solution Architect	Tech Lead	All engineers + business
API Specs (OpenAPI)	Service team	API Gateway team	All engineers; published externally where applicable
Runbooks	SRE / Service team	Platform Engineer	Operations + On-call engineers
Coding Standards	Principal Engineer	ARB	All engineers
Technology Radar	Enterprise Architect	ARB	All engineers

Documenting Architecture & Flows

Architecture diagrams communicate structure and behaviour to different audiences. The key principle is the right diagram for the right audience, not one diagram that tries to show everything.

Diagram Types

Diagram Type	Shows	Audience	Tool
Context Diagram (C4 L1)	System and its external users/systems	Business, all stakeholders	Structurizr, Mermaid
Container Diagram (C4 L2)	Major deployable units within the system	Architects, senior engineers	Structurizr, draw.io
Component Diagram (C4 L3)	Internal components of a container	Development team	Structurizr, PlantUML
Sequence Diagram	Time-ordered message flow between actors/services	Engineers, QA, architects	Mermaid, PlantUML, Lucidchart
Entity-Relationship (ERD)	Data entities and their relationships	Engineers, data architects, DBAs	dbdiagram.io, draw.io
Data Flow Diagram (DFD)	How data moves through a system	Architects, security, compliance	draw.io, Lucidchart
State Diagram	States of an entity and transition triggers	Engineers, BA, QA	Mermaid, PlantUML
Deployment Diagram	Infrastructure topology, cloud resources	Engineers, DevOps, architects	draw.io (diagrams.net), Terraform CDK

💡

Diagrams-as-code: Prefer tools like Mermaid or Structurizr where diagrams are written in text and version-controlled alongside the code. Avoid binary diagram files (Visio, OmniGraffle) that cannot be diffed, reviewed, or easily updated. Mermaid is rendered natively by GitHub and GitLab; Notion supports it in code blocks, and Confluence needs a Mermaid app or macro.

Architecture Decision Records (ADRs)

An ADR is a short document that captures a significant architectural decision — what was decided, why, what alternatives were considered, and what the consequences are. ADRs are one of the highest-value practices in enterprise architecture.

ADR Template (MADR Format)

# ADR-0042: Use Apache Kafka for Domain Event Streaming

Date: 2026-03-15
Status: Accepted
Deciders: [Names]
Tags: integration, messaging, event-driven

## Context and Problem Statement
We need to propagate domain events from the Order service to downstream
consumers (Inventory, Fulfilment, Analytics) with guaranteed ordering
within a partition and replay capability for new consumers.

## Decision Drivers
* Must support consumer replay (new consumers catching up on history)
* Ordering guarantees within an entity's event stream required
* Expected throughput: 50,000 events/day (growing to 500k in 12 months)
* Team has prior Kafka experience; internal managed Kafka cluster available

## Considered Options
1. Apache Kafka (managed internal cluster)
2. Azure Service Bus (cloud-native)
3. RabbitMQ (existing infrastructure)
4. PostgreSQL LISTEN/NOTIFY (simple, no new infrastructure)

## Decision Outcome
Chosen: Apache Kafka

Positive consequences:
* Retained, replayable log lets new consumers read history from any offset
* Partition-ordered delivery meets our ordering requirement
* Team expertise reduces ramp-up time

Negative consequences / risks:
* Operational complexity of Kafka cluster management
* Schema evolution needs governance (e.g., Avro with a schema registry)

## Pros and Cons of Alternatives
Azure Service Bus: No replay; message TTL only. Ruled out.
RabbitMQ: Classic queues drop messages once acknowledged, so no replay. Ruled out.
PG LISTEN/NOTIFY: Notifications are not persisted, so no replay and no delivery to disconnected listeners. Ruled out.

ADR Best Practices

One decision per ADR. Do not bundle multiple decisions into one document.
ADRs are immutable once accepted. If the decision changes, write a new ADR that supersedes the old one. The old ADR remains as historical context.
Number sequentially. ADR-0001, ADR-0002. This makes referencing trivial and prevents conflicts.
Link from the codebase. Add ADR references in code comments or README files in the directories affected by the decision.
Store in the repository: /docs/decisions/ alongside the code it governs.
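
Sequential numbering is easy to get wrong by hand. The helper below is a minimal sketch that derives the next ADR file name from the files already in `/docs/decisions/`. The `NNNN-slug.md` naming scheme and the function name `next_adr_filename` are assumptions for illustration.

```python
import re

# Assumed naming scheme: four-digit number, dash, lowercase slug, ".md"
ADR_FILE_RE = re.compile(r"^(\d{4})-[a-z0-9-]+\.md$")


def next_adr_filename(existing: list[str], title: str) -> str:
    """Build the next sequential ADR file name from existing names and a title."""
    numbers = [
        int(match.group(1))
        for name in existing
        if (match := ADR_FILE_RE.match(name))
    ]
    next_number = max(numbers, default=0) + 1
    # Collapse every run of non-alphanumeric characters into a single dash.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{next_number:04d}-{slug}.md"
```

Files that do not match the scheme, such as a README, are ignored rather than treated as errors. Two branches can still claim the same number at once, so a CI check for duplicate numbers remains worthwhile.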

C4 Model

The C4 model (by Simon Brown) provides four levels of architecture diagram with increasing detail. Each level answers a different question and serves a different audience.

Level	Name	Question Answered	Audience
L1	System Context	What is the system and who uses it?	Everyone (non-technical)
L2	Container	What are the major deployable units and how do they communicate?	Architects, senior engineers
L3	Component	What are the internal components of a given container?	Development team
L4	Code	How is a component implemented?	Developers (auto-generated from IDE)

C4 L1 — System Context (Mermaid)

```mermaid
C4Context
  title System Context — Order Management Platform

  Person(customer, "Customer", "Places and tracks orders via web/mobile")
  Person(agent, "Support Agent", "Handles escalations and refunds")

  System(oms, "Order Management System", "Core order lifecycle — cart, checkout, fulfilment tracking")

  System_Ext(payment, "Payment Gateway", "Stripe / PayPal — payment processing")
  System_Ext(wms, "Warehouse Management System", "3PL partner — picking, packing, dispatch")
  System_Ext(notif, "Notification Service", "Email / SMS via SendGrid / Twilio")

  Rel(customer, oms, "Places orders", "HTTPS")
  Rel(agent, oms, "Manages escalations", "HTTPS")
  Rel(oms, payment, "Authorise & capture payment", "HTTPS/REST")
  Rel(oms, wms, "Dispatch fulfilment instruction", "Event/Kafka")
  Rel(oms, notif, "Trigger notifications", "Event/Kafka")
```

C4 L2 — Container Diagram (Mermaid)

```mermaid
C4Container
  title Container Diagram — Order Management System

  Person(customer, "Customer", "Web / Mobile")

  Container_Boundary(oms, "Order Management System") {
    Container(spa, "Web SPA", "React", "Customer-facing storefront")
    Container(bff, "BFF API", "Node.js / Express", "Backend for frontend — aggregates data for web client")
    Container(order_svc, "Order Service", ".NET 8", "Order lifecycle domain logic")
    Container(inventory_svc, "Inventory Service", "Python / FastAPI", "Stock levels and reservations")
    ContainerDb(order_db, "Order DB", "PostgreSQL", "Order, line-item, and payment state")
    ContainerDb(inventory_db, "Inventory DB", "PostgreSQL", "Stock levels per SKU and warehouse")
    Container(event_bus, "Event Bus", "Apache Kafka", "Domain event streaming")
  }

  System_Ext(payment, "Payment Gateway", "Stripe")
  System_Ext(wms, "WMS", "3PL Warehouse")

  Rel(customer, spa, "Uses", "HTTPS")
  Rel(spa, bff, "API calls", "JSON/REST")
  Rel(bff, order_svc, "Delegates", "gRPC")
  Rel(bff, inventory_svc, "Queries stock", "REST")
  Rel(order_svc, order_db, "Reads/writes", "SQL")
  Rel(inventory_svc, inventory_db, "Reads/writes", "SQL")
  Rel(order_svc, event_bus, "Publishes domain events", "Kafka")
  Rel(inventory_svc, event_bus, "Subscribes to order events", "Kafka")
  Rel(order_svc, payment, "Authorise payment", "HTTPS")
  Rel(event_bus, wms, "Dispatch fulfilment", "Kafka")
```

Flow & Sequence Diagrams

Flow and sequence diagrams document behaviour over time. They are indispensable for communicating integration contracts, validating error-handling logic, and onboarding engineers.

Sequence Diagram — Order Checkout Flow

```mermaid
sequenceDiagram
  autonumber
  actor C as Customer
  participant BFF as BFF API
  participant OS as Order Service
  participant IS as Inventory Service
  participant PG as Payment Gateway
  participant KB as Kafka Bus
  participant NS as Notification Service

  C->>BFF: POST /checkout { cart, payment_token }
  BFF->>IS: Reserve stock { items }
  IS-->>BFF: 200 OK { reservation_id }

  BFF->>OS: Create order { cart, reservation_id, payment_token }
  OS->>PG: Authorise payment { amount, token }
  PG-->>OS: 200 OK { auth_code }

  OS->>OS: Persist order (PENDING)
  OS->>KB: Publish OrderCreated event
  OS-->>BFF: 201 Created { order_id }
  BFF-->>C: 201 Created { order_id, estimated_delivery }

  note over KB: Downstream consumers process asynchronously
  KB--)IS: OrderCreated → confirm stock deduction
  KB--)NS: OrderCreated → send confirmation email
```

State Diagram — Order Lifecycle

```mermaid
stateDiagram-v2
  [*] --> PENDING : Order created
  PENDING --> PAYMENT_CAPTURED : Payment authorised and captured
  PENDING --> CANCELLED : Payment declined / timeout
  PAYMENT_CAPTURED --> IN_FULFILMENT : Dispatched to WMS
  IN_FULFILMENT --> SHIPPED : Carrier collected
  SHIPPED --> DELIVERED : Delivery confirmed
  SHIPPED --> RETURN_REQUESTED : Customer requests return
  RETURN_REQUESTED --> REFUNDED : Return received + inspected
  DELIVERED --> RETURN_REQUESTED : Within return window
  CANCELLED --> [*]
  DELIVERED --> [*]
  REFUNDED --> [*]
```
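
A state diagram earns its keep when the code enforces the same transitions. The sketch below encodes the diagram above as a lookup table. The state names come from the diagram, while `TRANSITIONS`, `InvalidTransition`, and `transition` are hypothetical names for illustration.

```python
# Allowed transitions, copied from the order lifecycle state diagram.
TRANSITIONS: dict[str, set[str]] = {
    "PENDING": {"PAYMENT_CAPTURED", "CANCELLED"},
    "PAYMENT_CAPTURED": {"IN_FULFILMENT"},
    "IN_FULFILMENT": {"SHIPPED"},
    "SHIPPED": {"DELIVERED", "RETURN_REQUESTED"},
    "DELIVERED": {"RETURN_REQUESTED"},
    "RETURN_REQUESTED": {"REFUNDED"},
    "CANCELLED": set(),  # terminal
    "REFUNDED": set(),   # terminal
}


class InvalidTransition(Exception):
    """Raised when an order is asked to move to a state the diagram does not allow."""


def transition(current: str, target: str) -> str:
    """Return the new state, or raise if the diagram has no such edge."""
    allowed = TRANSITIONS.get(current)
    if allowed is None:
        raise InvalidTransition(f"unknown state {current!r}")
    if target not in allowed:
        raise InvalidTransition(f"{current} -> {target} is not allowed")
    return target
```

Keeping the table next to the diagram in the same repository makes drift between the two visible in code review.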

💡

Diagrams-as-code tooling: Use Mermaid for quick diagrams embedded in markdown (rendered natively by GitHub and GitLab; Confluence via an app). Use Structurizr (structurizr.com) for C4 models with live workspace management. Use PlantUML for detailed sequence and component diagrams requiring fine-grained control.

Enterprise Architecture Examples

The following examples illustrate how EA principles apply to common real-world scenarios.

Example 1 — E-Commerce Platform Architecture

Context

A mid-size retailer with 2M customers scaling from a monolith to a service-oriented architecture. 50 engineers across 8 product teams.

Domain	Approach	Key Decisions
Application	Modular monolith → bounded service split	Start monolith; extract services when team owns a clear domain and the monolith boundary is proven
Data	PostgreSQL per service; Kafka CDC to analytics lake	No shared DB. Each service owns its data. Analytics via Kafka CDC → Snowflake
Integration	REST APIs internally; Kafka for domain events	Synchronous for user-facing reads; async events for state propagation
Security	OAuth2 + OIDC via Keycloak; zero-trust internal network	mTLS between services; RBAC in API gateway
Platform	Kubernetes on AWS EKS; Terraform IaC	Golden path templates for service scaffolding; observability via OpenTelemetry → Grafana stack
Governance	ARB bi-weekly; ADRs in each service repo	Technology radar updated quarterly; new databases require ARB approval

Example 2 — Financial Services (Regulated)

Context

A regulated payments company operating under PCI-DSS and FCA oversight. 200 engineers. On-premises data centre + hybrid cloud.

Domain	Approach	Key Decisions
Security	Zero Trust, FIDO2/MFA enforced, HSM for key management	No lateral movement; network microsegmentation; all secrets in Vault
Data	Strict data classification; PCI scope boundary drawn around card data systems	Card data only in PCI-scoped systems; tokenisation for all external exposure
Compliance	Architecture review mandatory for PCI-scoped changes	Evidence artefacts (ADRs, threat models, scan results) linked from each change ticket
Platform	Hybrid: on-prem for card data; Azure for non-PCI workloads	Private ExpressRoute; no card data ever crosses to public cloud
Integration	Internal gRPC; external REST + mutual TLS	API Gateway at boundary with certificate pinning and DLP inspection

Example 3 — SaaS Startup Scaling

Context

A B2B SaaS company with 15 engineers growing rapidly. Currently a Rails monolith with a single PostgreSQL database. Starting to feel pain at 500k MAU.

💡

Recommended path: Do NOT rewrite as microservices. Introduce module boundaries inside the monolith first (domain-based namespaces). Extract the first bounded service only when a specific team (with clear ownership) and a specific scaling bottleneck (with data) justify it. One service extracted well is worth more than ten extracted hastily.
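
Module boundaries inside a monolith only hold if something checks them. The sketch below is language-neutral in spirit (the scenario is a Rails monolith, but the logic is the same): given which modules each module imports and which dependencies are allowed, it lists the violations. The module names, the `ALLOWED` map, and `boundary_violations` are hypothetical.

```python
# Hypothetical dependency rules: module -> modules it may depend on.
ALLOWED: dict[str, set[str]] = {
    "accounts": set(),
    "billing": {"accounts"},
    "reporting": {"accounts", "billing"},
}


def boundary_violations(imports: dict[str, set[str]],
                        allowed: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return (module, dependency) pairs that break the allowed-dependency rules."""
    violations = []
    for module, dependencies in imports.items():
        for dependency in dependencies:
            # A module may always reference itself.
            if dependency != module and dependency not in allowed.get(module, set()):
                violations.append((module, dependency))
    return sorted(violations)
```

Run as a CI check, a rule set like this acts as an architecture fitness function: the first extraction candidate is usually the module that already has no violations.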

Pain Point	EA Response	Outcome
Database slowdown on reports	Add read replica; move analytics queries to replica	3-day effort; solved for 18 months of growth
Background job queue coupling	Extract async job processor as separate process (Sidekiq → separate container)	Isolated failure domain, independent scaling
Real-time notifications blocking web requests	Introduce WebSocket service + Redis pub/sub	Decoupled; notification latency <200ms
Third-party integrations causing timeouts	Wrap in async queue with idempotent retry	Resilient; timeouts no longer affect user experience
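
The last row relies on retries being safe to repeat. The sketch below shows the general shape under stated assumptions: the caller passes an idempotency key that the third party uses to deduplicate, and only errors marked as transient are retried. `TransientError` and `call_with_retry` are hypothetical names, not a library API.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, HTTP 503, ...)."""


def call_with_retry(operation: Callable[[str], T], idempotency_key: str,
                    max_attempts: int = 3, base_delay: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call operation(idempotency_key), retrying transient failures with backoff.

    The same key is sent on every attempt so the remote side can recognise
    a repeat and avoid applying the effect twice.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation(idempotency_key)
        except TransientError:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...
    raise ValueError("max_attempts must be at least 1")
```

In the queue-based design from the table, this would run inside the background worker rather than the web request, so a slow third party never blocks the user.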

Reference Links

Frameworks & Standards

Books

Fundamentals of Software Architecture, Mark Richards & Neal Ford (O'Reilly, 2020). Best starting point for architects.
Software Architecture: The Hard Parts, Richards, Ford, Sadalage, Dehghani (O'Reilly, 2021). Trade-off analysis in distributed systems.
Building Evolutionary Architectures, Ford, Parsons, Kua (O'Reilly, 2017). Architecture fitness functions and evolution.
Clean Architecture, Robert C. Martin (Prentice Hall, 2017). Dependency rules and architectural boundaries.
Domain-Driven Design, Eric Evans (Addison-Wesley, 2003). Foundational for bounded contexts and ubiquitous language.
Designing Data-Intensive Applications, Martin Kleppmann (O'Reilly, 2017). Essential for data architecture trade-offs.
The Software Architect Elevator, Gregor Hohpe (O'Reilly, 2020). The EA's role bridging business and engineering.

Patterns & Design

Patterns: microservices.io, Chris Richardson's pattern catalogue
Patterns: Enterprise Integration Patterns, Hohpe & Woolf
DDD: Martin Fowler, DDD articles
CQRS: Martin Fowler, CQRS
Event Sourcing: Martin Fowler, Event Sourcing

Security

OWASP: OWASP Top 10 (owasp.org)
Threat Model: OWASP Threat Modelling Process
Zero Trust: NIST SP 800-207, Zero Trust Architecture
