High-Level Design Handbook

📖

What is High-Level Design?

Definition, purpose, and where it fits in the SDLC

High-Level Design (HLD) is a macro-level architectural blueprint of a software system. It defines what major components exist, how they interact, and why those design choices were made — without diving into implementation specifics like class hierarchies or database schemas.

An HLD answers the question: "If this system were a city, what are the districts, highways, and utilities — and how do they connect?" It is the first design artifact produced after requirements are gathered, and it is the primary communication tool between architects, engineers, product managers, and stakeholders.

Scope

System-wide view. Focuses on major components, services, databases, and external integrations — not individual classes or functions.

Audience

Technical leads, architects, senior engineers, product managers, and DevOps. Must be readable by both technical and semi-technical stakeholders.

Timing

Created after requirements are defined and before Low-Level Design (LLD) or implementation begins. Feeds into sprint planning and effort estimation.

Core goals of an HLD

Alignment

Creates shared understanding of system structure across all teams. Reduces ambiguity and prevents "we assumed you'd handle that" conversations.

Decision Record

Documents architectural decisions and their rationale (the "why"). Crucial for onboarding future engineers and for audit/compliance purposes.

Risk Identification

Forces early thinking about scalability, single points of failure, security boundaries, and integration complexity — when changes are cheap.

Estimation Foundation

Provides the structure for breaking work into epics and stories. Teams cannot accurately estimate without understanding the system shape.

💡

HLD is a living document. It is not a one-time artifact. As the system evolves, the HLD must be updated to reflect architectural changes. Treat it like code — version-controlled, reviewed, and kept current.

⚖️

HLD vs Low-Level Design

Understanding the boundary between the two levels

Dimension	High-Level Design (HLD)	Low-Level Design (LLD)
Granularity	Services, modules, databases, external APIs	Classes, methods, algorithms, DB schemas
Who reads it	Architects, TLs, PMs, DevOps, all engineers	Implementing engineers on the specific module
Key questions	"What are the building blocks?" "How do they talk?"	"How is this class structured?" "What is the algorithm?"
Diagrams	Architecture diagrams, data-flow diagrams, deployment diagrams	Class diagrams, sequence diagrams, ER diagrams
Technology mentions	Technology choices with rationale (e.g., Kafka, PostgreSQL)	Specific library versions, exact table/column names
Produced by	System architect / tech lead	Senior / lead engineers per module
Created when	Before LLD; right after requirements finalised	After HLD is approved; before implementation
Typical length	5–20 pages with diagrams	Varies; module-specific, often 10–50 pages

✅

A useful rule of thumb: If you're writing something that a non-implementing engineer would need to read to understand the system topology — it belongs in the HLD. If it only matters when you sit down to write the code — it belongs in the LLD.

🗓️

When to Write an HLD

Triggers, scope thresholds, and SDLC positioning

Not every feature needs an HLD, but skipping one when it is warranted is a common cause of expensive mid-project rework. Use this threshold guide:

✅ Write an HLD when…

Building a new service or application from scratch
Introducing a new major subsystem (e.g., a notification pipeline, payment gateway)
Adding a new external integration or third-party dependency
Changing a core architectural layer (e.g., migrating to event-driven)
The change involves 3+ engineers or 2+ teams
The effort is estimated at more than 2 weeks
Regulatory / compliance review is required

⚡ An HLD may be overkill for…

A bug fix or small feature within a single service
A UI-only change with no backend implications
Minor configuration changes (feature flags, threshold adjustments)
A single-engineer spike or proof of concept
Work within a well-understood, well-documented bounded context
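
The thresholds above can be turned into a quick triage check. The sketch below is one possible encoding, not a prescribed tool: the `Change` fields and the `hld_triggers` helper are hypothetical names introduced for illustration, and the numeric cut-offs simply mirror the list above.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """Hypothetical description of a planned piece of work."""
    engineers: int = 1
    teams: int = 1
    estimated_weeks: float = 0.0
    new_service: bool = False
    new_subsystem: bool = False
    new_external_integration: bool = False
    changes_core_layer: bool = False
    needs_compliance_review: bool = False


def hld_triggers(change: Change) -> list[str]:
    """Return the reasons (if any) why this change warrants an HLD."""
    checks = [
        (change.new_service, "new service or application"),
        (change.new_subsystem, "new major subsystem"),
        (change.new_external_integration, "new external integration"),
        (change.changes_core_layer, "core architectural layer changes"),
        (change.engineers >= 3 or change.teams >= 2, "3+ engineers or 2+ teams"),
        (change.estimated_weeks > 2, "effort above 2 weeks"),
        (change.needs_compliance_review, "compliance review required"),
    ]
    return [reason for hit, reason in checks if hit]


# A small bug fix triggers nothing; a cross-team integration triggers several reasons.
print(hld_triggers(Change()))
print(hld_triggers(Change(teams=2, estimated_weeks=6, new_external_integration=True)))
```

An empty list means the work probably falls in the "may be overkill" column; any non-empty list is a prompt to write the HLD, with the reasons doubling as the opening lines of its executive summary.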

HLD in the SDLC

[Figure: Software Development Lifecycle, showing where HLD fits]

📦

What to Include in an HLD

The mandatory and recommended sections of every HLD

1. Executive Summary

1–2 paragraphs. Why does this system exist? What problem does it solve? Who are the key stakeholders? What is out of scope?

2. Requirements Summary

Functional requirements (what the system does) and non-functional requirements (NFRs): performance, availability, security, and scalability targets. These drive every architectural decision.

3. System Architecture Diagram

The core visual. Shows all major components, services, databases, queues, and their connections. Must include external systems and clients. Label every arrow with the protocol.

4. Component Descriptions

One brief section per major component. What is its single responsibility? What does it own? What does it not own? What are its interfaces?

5. Data Flow Diagrams

How data moves through the system for the 2–3 most important use cases. Makes implicit sequencing explicit and reveals integration points early.

6. Data Storage Strategy

Which databases, caches, or object stores are used and why. Data models at the entity level (not schema level). Ownership — which service owns which data.

7. API & Integration Design

Key APIs between services: operation names and payloads at a high level, not full method signatures. External integrations and third-party dependencies. Communication patterns: REST, gRPC, event-driven, etc.

8. Non-Functional Considerations

Scalability strategy (horizontal vs vertical), caching layers, CDN usage, rate limiting, failover, circuit breakers, observability (metrics, logging, tracing).
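
Of the mechanisms listed above, rate limiting is easy to name and easy to leave vague. A token bucket is one common way to implement it (this handbook does not prescribe an algorithm). The sketch below is a minimal single-process version; `TokenBucket`, `rate_per_s`, and `capacity` are illustrative names, and a real deployment would usually keep the counters in a shared store.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts up to `capacity` and a sustained rate of `rate_per_s`."""

    def __init__(self, rate_per_s: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate_per_s = rate_per_s
        self.capacity = capacity
        self._clock = clock          # injectable so the limiter can be tested
        self._tokens = capacity      # start with a full bucket
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and consume tokens if the request may proceed."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill in proportion to elapsed time, never beyond capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate_per_s)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


# Demo with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
bucket = TokenBucket(rate_per_s=1.0, capacity=2.0, clock=lambda: fake_now[0])
print([bucket.allow() for _ in range(3)])  # [True, True, False]
fake_now[0] += 1.0                         # one second later, one token is back
print(bucket.allow())                      # True
```

In the HLD itself, the two numbers that matter are the sustained rate and the burst capacity per client or per route; the algorithm can be deferred to the LLD.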

9. Security Architecture

Authentication/authorization strategy (OAuth2, JWT, RBAC), network boundaries, secrets management, data encryption at rest and in transit, and known threat vectors.

10. Deployment Architecture

How the system is deployed — cloud provider, regions, containerisation, orchestration. CI/CD approach at a high level. IaC strategy.

11. Architectural Decision Records (ADRs)

A table or list of key decisions made, alternatives considered, and the rationale for the chosen approach. This is the most valuable long-term artifact.

12. Open Issues & Risks

Unresolved questions, known risks, assumptions made, and dependencies on external teams or decisions. Keeps stakeholders honest about unknowns.

⚠️

Avoid scope creep into LLD. The moment you start writing class names, method signatures, SQL table definitions, or specific library versions — stop. Those belong in the LLD. HLD must stay at the component/service level.

📋

Document Template

A ready-to-use HLD document structure

📄 Title & Metadata REQUIRED

Document title: [System/Feature Name] — High-Level Design
Authors, reviewers, approvers
Version, status (Draft / In Review / Approved), date
Related documents (PRD, RFC, JIRA epic link)

1. Executive Summary REQUIRED

What problem are we solving? Why now? Who are the users? What is explicitly out of scope?

2. Requirements REQUIRED

Functional: List the top 5–10 capabilities the system must have.
Non-Functional: Availability (e.g., 99.9% SLA), latency budgets (p99 < 200ms), throughput (10k RPS peak), storage volumes, data retention, compliance requirements.
Constraints: Existing tech stack, budget limits, timeline, team size.

3. System Architecture REQUIRED

Architecture diagram + brief prose explaining the design pattern chosen (monolith, microservices, event-driven, CQRS, etc.) and why.

4. Component Descriptions REQUIRED

For each major component: responsibility, inputs/outputs, tech stack choice, scaling characteristics.

5. Data Flow Diagrams REQUIRED

Cover the 2–3 most critical flows (e.g., user login, order placement, data ingestion). Show actors, components crossed, and data transformations.

6. Data Storage & Ownership REQUIRED

Database choices with rationale, data ownership by service, caching strategy, replication/backup approach.

7. API & Integration Design REQUIRED

Key service-to-service APIs, external dependencies, event contracts, and SLAs for third-party integrations.

8. Security Architecture REQUIRED

Auth/authz model, network segmentation, PII handling, secrets management, encryption strategy.

9. Non-Functional Architecture REQUIRED

Scalability approach, fault tolerance, observability (metrics/logs/traces), disaster recovery, and capacity planning.

10. Deployment Architecture RECOMMENDED

Infrastructure diagram, environments (dev/staging/prod), CI/CD pipeline overview, infrastructure-as-code approach.

11. Architectural Decision Records REQUIRED

Table: Decision | Alternatives Considered | Rationale | Date | Owner

12. Open Issues, Risks & Assumptions REQUIRED

Track unresolved questions, external dependencies, and risks with owners and resolution dates.

🗂️

Diagram Types in HLD

Which diagrams to include and when to use each

Diagram Type	Purpose	When to Include	Tools
Architecture Diagram `C4 Context / Container`	Shows major components, services, and their connections. The single most important diagram.	Always; every HLD must have one	draw.io, Lucidchart, Excalidraw, Structurizr (C4 model)
Data Flow Diagram (DFD)	Shows how data moves through the system — inputs, processes, storage, outputs	Any system with complex data transformations or pipelines	draw.io, Miro, Lucidchart
Sequence Diagram	Shows interactions over time between components for a specific use case	Complex multi-service flows such as auth, checkout, notifications	PlantUML, Mermaid, SequenceDiagram.org
Deployment Diagram	Shows physical/cloud infrastructure — VPCs, regions, pods, load balancers	Cloud or distributed deployments	draw.io, AWS Architecture Diagrams
Entity-Relationship (high level)	Shows major data entities and their relationships — not table schemas	Data-heavy systems, multi-service data ownership questions	draw.io, dbdiagram.io (conceptual level)
State Diagram	Shows lifecycle states of a key domain object (e.g., an Order)	When a domain object has complex state transitions	PlantUML, draw.io, Mermaid

🎯

Diagramming principle: Each diagram should have a single purpose and be readable in under 2 minutes. If a single diagram needs a legend with more than 8 items, split it into two diagrams. Always label arrows with the protocol (HTTP, gRPC, AMQP, etc.).

🔄

Step-by-Step: How to Create an HLD

A repeatable process from requirements to approved document

1

Understand and document requirements

Before drawing anything, extract both functional and non-functional requirements. Interview stakeholders, review the PRD, and identify the three most critical user journeys. Write down explicit capacity targets: peak RPS, storage volumes, latency SLAs, and availability targets (e.g., 99.9% allows roughly 8.76 hours of downtime per year). NFRs drive architecture more than functional requirements do.
2

Identify and bound major components

List every major capability the system needs (auth, product catalog, orders, payments, notifications, etc.). Group related capabilities into bounded contexts or services. Apply the Single Responsibility Principle: each component should have one clear job. Name things by what they do, not how they do it.
3

Choose the right architectural pattern

Match the pattern to the scale and team. A startup with 5 engineers rarely needs microservices. Options include: Monolith (fastest to build), Modular Monolith (structured, but single deploy), Microservices (independent scaling, team autonomy), Event-Driven (async decoupling), CQRS (read/write separation). Document the pattern chosen and the alternatives rejected.
4

Draw the architecture diagram first

Start with a whiteboard or Excalidraw. Put clients at the top, databases at the bottom, services in the middle. Draw arrows representing data flow and label each with the protocol. Use the C4 model: start at Context level (system + external users), then drill into Container level (services, databases, queues). Get this diagram reviewed before writing any prose.
5

Document data flows for critical paths

Pick the 2–3 most important use cases (usually: user registration, the primary write path, and the primary read path). Trace data from entry point through every component to final storage/response. Use sequence diagrams or numbered flow diagrams. These reveal hidden complexity and missing components early.
6

Define data ownership and storage choices

For each service, define what data it owns and what database type is appropriate. SQL for relational/transactional data, NoSQL for document/flexible-schema data, cache for ephemeral/hot data, object storage for blobs. Never share a database between services — use APIs or events instead. Document replication, backup, and recovery strategy.
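
The "never share a database" rule is easy to check mechanically once ownership is written down. The sketch below assumes a simple mapping from service name to the datastores it reads or writes; `shared_datastores` and the service and datastore names are hypothetical, chosen only for illustration (Python 3.9+).

```python
from collections import defaultdict


def shared_datastores(usage: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each datastore touched by more than one service to those services."""
    users: dict[str, list[str]] = defaultdict(list)
    for service, stores in usage.items():
        for store in stores:
            users[store].append(service)
    # Keep only the stores that violate single ownership.
    return {store: sorted(svcs) for store, svcs in users.items() if len(svcs) > 1}


usage = {
    "orders": {"orders_db"},
    "payments": {"payments_db", "orders_db"},  # reaches into another service's data
    "catalog": {"catalog_db"},
}
print(shared_datastores(usage))  # {'orders_db': ['orders', 'payments']}
```

Each entry in the result is a place where an API call or an event should replace direct database access, or where the service boundary itself needs rethinking.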
7

Address non-functional requirements explicitly

For each NFR, describe the mechanism: How will you achieve 99.9% uptime? (Multi-region, health checks, auto-restart.) How will you handle a 10x traffic spike? (Horizontal scaling, queue-based load leveling.) How will you detect failures? (Distributed tracing, alerting on SLOs.) NFRs without mechanisms are just wishful thinking.
8

Document architectural decisions (ADRs)

For every major decision (database choice, communication protocol, caching strategy, auth mechanism), create an ADR entry: what was decided, what alternatives were considered, and why this choice was made. ADRs are the most valuable long-term output of the HLD process because they prevent the same debates recurring six months later.
9

Conduct a structured review

Share the draft with at least: one other senior engineer, a security reviewer, a representative from operations/DevOps, and a product manager. Run a dedicated 60-minute review session. Specifically stress-test failure scenarios: "What happens if Service X is down?" "What happens if the database has a 5-second latency spike?" Capture all feedback and resolve it before marking the document Approved.
10

Version-control and maintain the document

Store the HLD in a version-controlled wiki (Confluence, Notion, GitHub). Add a clear version history table. Update the document when significant architectural changes are made. Consider the HLD "stale" if it hasn't been reviewed in 6 months and the system is actively evolving.

🖼️

Diagram Examples

Architecture, data flow, and sequence diagram patterns

Pattern 1 — 3-Tier Web Architecture

[Figure: Classic 3-Tier: Client → API Layer → Data Layer]

Pattern 2 — Microservices with Event Bus

[Figure: Event-Driven Microservices Architecture]

Pattern 3 — Data Flow Diagram (User Login)

[Figure: Sequence: User Authentication Flow]

🛒

Full Example: E-Commerce Platform HLD

Worked example walking through each HLD section

1. Executive Summary

Build a scalable e-commerce platform supporting product browsing, cart management, checkout, and order tracking. Target scale: 500k registered users, 50k daily active users. The system must support Black Friday traffic peaks of 5,000 RPS. Out of scope: warehouse management, physical shipping logistics, returns processing.

2. Non-Functional Requirements

Requirement	Target	Notes
Availability	99.9% (max ~8.76h downtime/year)	Excludes planned maintenance windows
Read latency (catalog)	p99 < 100ms	Served via CDN + cache
Write latency (checkout)	p99 < 500ms	End-to-end including payment auth
Peak throughput	5,000 RPS	Checkout: 500 RPS, Browse: 4,500 RPS
Data durability	99.999999%	Order/payment data — multi-AZ DB
RPO / RTO	RPO: 5min / RTO: 30min	For primary DB failure
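
The availability target in the table converts to a downtime budget by simple arithmetic: the unavailable fraction multiplied by the hours in a year. The snippet below is a small sketch of that calculation; `downtime_budget_hours` is an illustrative helper name, and the year is taken as 365 days.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def downtime_budget_hours(availability_pct: float) -> float:
    """Return the yearly downtime allowed by an availability target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


for target in (99.0, 99.9, 99.99):
    print(f"{target}% -> {downtime_budget_hours(target):.2f} h/year")

# The peak-throughput row should also add up: checkout plus browse traffic.
checkout_rps, browse_rps = 500, 4_500
print(checkout_rps + browse_rps)  # 5000
```

Each additional nine cuts the budget by a factor of ten, which is why the jump from 99.9% to 99.99% usually changes the architecture (for example, redundancy across zones or regions) rather than just the operations effort.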

3. Architecture Diagram

[Figure: E-Commerce Platform high-level architecture]

11. Architectural Decision Records

Decision	Alternatives Considered	Rationale
Microservices architecture	Monolith, Modular Monolith	Team independence, independent scaling of hot services (Product, Order). Accepted: higher operational complexity.
PostgreSQL for transactional data	MySQL, DynamoDB, MongoDB	ACID guarantees essential for orders/payments. Team has strong Postgres expertise. RDS Multi-AZ for HA.
Kafka as event bus	RabbitMQ, AWS SQS/SNS, Redis Streams	Replay capability, per-partition ordering guarantees, and high throughput required. Needed for audit trail of order events.
Redis for caching	Memcached, DynamoDB DAX	Rich data structures, TTL support, cluster mode for horizontal scaling. Session storage + product cache.
Elasticsearch for product search	PostgreSQL full-text, Algolia, Typesense	Fuzzy search, faceted filtering, and relevance scoring at scale. Algolia ruled out due to cost at our volume.

⚠️

Common Pitfalls

The mistakes that make HLDs useless — and how to avoid them

⚠ No non-functional requirements

The HLD describes what the system does but not the numbers it must hit — no latency targets, no scale figures, no availability SLAs. Without NFRs, there's no basis for any architectural decision. You can't know whether to add a cache, a queue, or a read replica if you don't know the load.

Write NFRs first, before touching the architecture. Every section of the HLD should trace back to an NFR.

⚠ Solutioning before problem definition

The HLD opens with "We will use Kubernetes, Kafka, and a GraphQL API" before establishing the requirements. Technology choices appear before any reasoning. This signals that the architecture was chosen from familiarity rather than fitness for the problem.

State requirements and constraints first. Introduce technology choices in the ADR section with explicit rationale and alternatives considered.

⚠ The "happy path only" HLD

The document describes how the system works when everything succeeds, but never addresses failure modes: What happens if the payment service is down? What if the message queue backs up? What if the primary database fails? Systems fail — the HLD must show how the architecture handles it.

Add a dedicated "Failure Modes & Resilience" section. For each external dependency, describe the failover strategy, circuit breaker behavior, and user-facing impact.
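
"Circuit breaker behavior" is easier to review when the states are spelled out. The following is a minimal, single-threaded sketch of the pattern, assuming a consecutive-failure threshold and a fixed reset timeout; `CircuitBreaker`, `CircuitOpenError`, and the parameter names are hypothetical, and production libraries add concurrency control, metrics, and richer half-open policies.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout_s:
            return "half-open"  # timeout elapsed: allow one probe call
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            # Fail fast instead of waiting on a dependency known to be down.
            raise CircuitOpenError("dependency unavailable, failing fast")
        try:
            result = fn()
        except Exception:
            self._failures += 1
            # Trip the breaker, or re-open it after a failed half-open probe.
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

For the HLD, the useful outputs of this exercise are the three numbers (threshold, timeout, probe policy) and the user-facing behaviour while the circuit is open, such as a degraded page or a queued retry.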

⚠ Too much detail / LLD creep

The HLD includes database table schemas, class diagrams, specific API endpoint definitions, and library version numbers. This makes the document enormous, hard to review, outdated within weeks, and overwhelming for non-implementing stakeholders.

Stop at the component/service boundary. If you're describing methods, tables, or library configs — move it to the LLD or a separate ADR.

⚠ No security architecture

Security is not mentioned or is vaguely deferred: "Authentication will be handled." Where does token validation happen? What is the AuthN/AuthZ model? Are there network boundaries? How is PII encrypted? Security omitted from HLD means it gets bolted on during implementation — the most expensive time to add it.

Dedicate a section to: auth model, network segmentation, data classification, encryption at rest/transit, and secrets management. Get a security review before HLD approval.

⚠ Diagram without labels on arrows

The architecture diagram shows boxes connected by unlabeled arrows. It's impossible to tell whether services communicate synchronously or asynchronously, what protocol is used (HTTP? gRPC? event?), or what data flows in each direction.

Every arrow must have: (1) a protocol label, (2) a direction, (3) optionally the payload type. Use different line styles for sync vs async communications.
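
If the diagram is kept as data (or generated from it), the labeling rule can be checked automatically. The sketch below assumes a hypothetical `Edge` record per arrow; the field names and the solid/dashed convention for sync versus async are illustrative choices, not a standard (Python 3.9+).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Edge:
    """One arrow in the architecture diagram, directed from source to target."""
    source: str
    target: str
    protocol: Optional[str] = None  # e.g. "HTTP", "gRPC", "AMQP"
    is_async: bool = False
    payload: Optional[str] = None   # optional payload type


def unlabeled_edges(edges: list[Edge]) -> list[Edge]:
    """Return every arrow that is missing a protocol label."""
    return [e for e in edges if not (e.protocol and e.protocol.strip())]


def line_style(edge: Edge) -> str:
    """Assumed convention: dashed lines for async, solid for sync."""
    return "dashed" if edge.is_async else "solid"


edges = [
    Edge("API Gateway", "Auth Service", protocol="HTTP"),
    Edge("Order Service", "Event Bus", is_async=True),  # protocol missing
]
for edge in unlabeled_edges(edges):
    print(f"missing protocol: {edge.source} -> {edge.target} ({line_style(edge)})")
```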

⚠ Never updated after initial approval

The HLD is approved, then the system is built — but when decisions change during implementation (a different database, a queue added, a service split), the HLD is never updated. New engineers read the HLD and get a misleading picture of the system. Drift between documentation and reality is a major onboarding and maintenance hazard.

Treat the HLD like code. Changes to architecture require an HLD update PR/review. Add the HLD update as a Definition of Done for architectural changes.

⚠ "Architecture by committee" — no clear ownership

The HLD is written by a committee and no single person is accountable for it. Sections contradict one another. There is no clear decision-maker, so controversial choices (monolith vs microservices, SQL vs NoSQL) are left unresolved with "further discussion needed."

Assign a single DRI (Directly Responsible Individual) — usually the tech lead or architect. Others review and advise; the DRI decides and owns the document.

✅

HLD Review Checklist

Use before submitting for review and before approving

Content completeness

● Functional requirements are stated clearly
● Non-functional requirements include specific numbers (latency, throughput, availability)
● At least one architecture diagram with labeled arrows and protocols
● Every major component has a description of its responsibility
● Data flows documented for the 2–3 most critical use cases
● Data ownership is explicit — each service owns its own data
● Security architecture addresses auth, encryption, and network boundaries
● Architectural Decision Records include alternatives and rationale
● Open issues and assumptions are listed with owners
● Failure modes documented — what happens when each dependency fails
● Deployment architecture described — cloud provider, regions, environments
● Observability approach specified — metrics, logs, traces, alerting
● Cost estimation or cost considerations noted

Diagram quality

● All arrows are labeled with protocol (HTTP, gRPC, AMQP, etc.)
● External systems and third-party integrations are shown
● Data stores are included and connected to the correct services
● Sync and async communications use distinct visual styles
● A legend is provided if the diagram uses more than 4 distinct visual types
● Diagram is readable without zooming on a standard screen

Review process

● Reviewed by at least one other senior engineer outside the immediate team
● Security review completed or explicitly scheduled
● DevOps / platform team has reviewed deployment and operational concerns
● Product manager has confirmed the scope aligns with product requirements
● DBA or data engineer has reviewed data model and storage choices

🔵

Legend: ● Required — must be completed before approval. ● Recommended — strongly advised for production systems. ● Nice to have — valuable when time permits.

🔧

Tools & Resources

Diagramming tools, documentation platforms, and reference frameworks

Diagramming Tools

Tool	Best For	Notes
draw.io / diagrams.net	Architecture, deployment, flow diagrams	Free, works offline, integrates with Confluence/GitHub. Recommended default.
Excalidraw	Whiteboard-style quick diagrams, collaborative sketching	Hand-drawn aesthetic, excellent for early-stage design sessions. Free, open source.
Lucidchart	Polished enterprise diagrams, cross-team collaboration	Paid. Best import/export options, strong Jira/Confluence integration.
Mermaid	Sequence diagrams, flowcharts, state diagrams as code	Text-based, so version controllable. Natively supported in GitHub Markdown, Notion, GitLab.
PlantUML	Sequence, component, and deployment diagrams as code	More powerful than Mermaid; steeper syntax. Works well in CI pipelines.
C4 Model + Structurizr	Layered architectural views (Context → Container → Component)	The C4 model is a widely used, structured framework for HLD diagrams. Structurizr, built by the model's creator Simon Brown, is its reference tooling.

Documentation Platforms

Platform	Use
Confluence	Enterprise wiki, deep Jira integration, template library
Notion	Flexible docs, good for smaller teams, supports diagrams via embeds
GitHub / GitLab Wikis	Version-controlled, markdown-native, Mermaid built in
Backstage	Developer portal with TechDocs for versioned architecture docs

Reference Frameworks

C4 Model

Simon Brown's 4-level framework: Context (system in the world), Container (deployable applications and data stores), Component (inside a container), Code (class-level). HLD maps to Context + Container levels.

c4model.com

Architecture Decision Records

Michael Nygard's lightweight format for documenting decisions. Each ADR has: Title, Status, Context, Decision, Consequences. Store in /docs/adr/ in the repo.

adr.github.io
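
As a rough illustration of the five-field format, the sketch below renders one record to text. The `Adr` class, the Markdown layout, and the zero-padded file naming are assumptions made for this example; the sample content is taken from the microservices decision in the e-commerce example.

```python
from dataclasses import dataclass


@dataclass
class Adr:
    """One decision record with the five fields of Nygard's format."""
    number: int
    title: str
    status: str  # e.g. "Proposed", "Accepted", "Superseded"
    context: str
    decision: str
    consequences: str

    def filename(self) -> str:
        # Assumed naming scheme: zero-padded number plus a slug of the title.
        slug = "-".join(self.title.lower().split())
        return f"docs/adr/{self.number:04d}-{slug}.md"

    def render(self) -> str:
        # Assumed layout: one Markdown heading per field.
        body = [
            ("Status", self.status),
            ("Context", self.context),
            ("Decision", self.decision),
            ("Consequences", self.consequences),
        ]
        parts = [f"# {self.number}. {self.title}"]
        parts += [f"## {name}\n\n{text}" for name, text in body]
        return "\n\n".join(parts) + "\n"


adr = Adr(
    number=1,
    title="Adopt a microservices architecture",
    status="Accepted",
    context="Teams need independence; hot services (Product, Order) must scale independently.",
    decision="Microservices, chosen over Monolith and Modular Monolith.",
    consequences="Higher operational complexity, accepted as a trade-off.",
)
print(adr.filename())
print(adr.render())
```

Keeping one file per decision means a superseded ADR stays in history with its status changed, rather than being silently edited away.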

AWS Well-Architected Framework

Six pillars: Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, Sustainability. Excellent NFR checklist for cloud-hosted systems.

aws.amazon.com/architecture/well-architected

12-Factor App Methodology

Principles for building maintainable, scalable services: codebase, dependencies, config, backing services, build/release/run, processes, port binding, concurrency, disposability, dev/prod parity, logs, admin processes.

12factor.net

📐

Mermaid quick example — paste this into any Mermaid-compatible editor for an instant sequence diagram:

mermaid

sequenceDiagram
    actor User
    participant GW as API Gateway
    participant Auth as Auth Service
    participant DB as User DB
    participant Cache as Redis

    User->>GW: POST /login {email, password}
    GW->>Auth: Forward credentials
    Auth->>Cache: GET session:email
    Cache-->>Auth: MISS
    Auth->>DB: SELECT user WHERE email=?
    DB-->>Auth: {userId, passwordHash}
    Auth->>Auth: bcrypt.verify(password, hash)
    Auth->>Cache: SET session:email TTL=3600
    Auth-->>GW: {accessToken, refreshToken}
    GW-->>User: 200 OK {tokens}
