High-Level Design Handbook
V1
Architecture System Design Distributed Systems
Software Architecture

High-Level Design
Handbook

A comprehensive reference for engineers and architects — covering what HLD is, what it must contain, how to create one, common pitfalls, and real worked examples with diagrams.

System Architecture Component Design Data Flow Scalability API Design Non-Functional Reqs
📖

What is High-Level Design?

Definition, purpose, and where it fits in the SDLC

High-Level Design (HLD) is a macro-level architectural blueprint of a software system. It defines what major components exist, how they interact, and why those design choices were made — without diving into implementation specifics like class hierarchies or database schemas.

An HLD answers the question: "If this system were a city, what are the districts, highways, and utilities — and how do they connect?" It is the first design artifact produced after requirements are gathered, and it is the primary communication tool between architects, engineers, product managers, and stakeholders.

Scope

System-wide view. Focuses on major components, services, databases, and external integrations — not individual classes or functions.

Audience

Technical leads, architects, senior engineers, product managers, and DevOps. Must be readable by both technical and semi-technical stakeholders.

Timing

Created after requirements are defined and before Low-Level Design (LLD) or implementation begins. Feeds into sprint planning and effort estimation.

Core goals of an HLD

Alignment

Creates shared understanding of system structure across all teams. Reduces ambiguity, prevents "we assumed you'd handle that" conversations.

Decision Record

Documents architectural decisions and their rationale (the "why"). Crucial for onboarding future engineers and for audit/compliance purposes.

Risk Identification

Forces early thinking about scalability, single points of failure, security boundaries, and integration complexity — when changes are cheap.

Estimation Foundation

Provides the structure for breaking work into epics and stories. Teams cannot accurately estimate without understanding the system shape.

💡
HLD is a living document. It is not a one-time artifact. As the system evolves, the HLD must be updated to reflect architectural changes. Treat it like code — version-controlled, reviewed, and kept current.
⚖️

HLD vs Low-Level Design

Understanding the boundary between the two levels
Dimension | High-Level Design (HLD) | Low-Level Design (LLD)
Granularity | Services, modules, databases, external APIs | Classes, methods, algorithms, DB schemas
Who reads it | Architects, TLs, PMs, DevOps, all engineers | Implementing engineers on the specific module
Key questions | "What are the building blocks?" "How do they talk?" | "How is this class structured?" "What is the algorithm?"
Diagrams | Architecture diagrams, data-flow diagrams, deployment diagrams | Class diagrams, sequence diagrams, ER diagrams
Technology mentions | Technology choices with rationale (e.g., Kafka, PostgreSQL) | Specific library versions, exact table/column names
Produced by | System architect / tech lead | Senior / lead engineers per module
Created when | Before LLD; right after requirements finalised | After HLD is approved; before implementation
Typical length | 5–20 pages with diagrams | Varies; module-specific, often 10–50 pages
A useful rule of thumb: If you're writing something that a non-implementing engineer would need to read to understand the system topology — it belongs in the HLD. If it only matters when you sit down to write the code — it belongs in the LLD.
🗓️

When to Write an HLD

Triggers, scope thresholds, and SDLC positioning

Not every feature needs an HLD — but skipping one when you should write it is a common cause of expensive mid-project rework. Use this threshold guide:

✅ Write an HLD when…
  • Building a new service or application from scratch
  • Introducing a new major subsystem (e.g., a notification pipeline, payment gateway)
  • Adding a new external integration or third-party dependency
  • Changing a core architectural layer (e.g., migrating to event-driven)
  • The change involves 3+ engineers or 2+ teams
  • The effort is estimated at more than 2 weeks
  • Regulatory / compliance review is required
⚡ An HLD may be overkill for…
  • A bug fix or small feature within a single service
  • A UI-only change with no backend implications
  • Minor configuration changes (feature flags, threshold adjustments)
  • A single-engineer spike or proof of concept
  • Work within a well-understood, well-documented bounded context
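The threshold guide above can be sketched as a small check. This is a minimal illustration, not a policy engine: the `ChangeProposal` type and its field names are hypothetical, and the numeric thresholds (3+ engineers, 2+ teams, more than 2 weeks) are taken directly from the list.

```python
from dataclasses import dataclass


@dataclass
class ChangeProposal:
    """Hypothetical description of a proposed change; field names are illustrative."""
    new_service: bool = False
    new_major_subsystem: bool = False
    new_external_integration: bool = False
    changes_core_layer: bool = False
    compliance_review_required: bool = False
    engineers: int = 1
    teams: int = 1
    estimated_weeks: float = 0.0


def hld_triggers(change: ChangeProposal) -> list[str]:
    """Return the reasons from the threshold guide that call for an HLD."""
    checks = [
        (change.new_service, "new service or application"),
        (change.new_major_subsystem, "new major subsystem"),
        (change.new_external_integration, "new external integration"),
        (change.changes_core_layer, "core architectural layer change"),
        (change.engineers >= 3 or change.teams >= 2, "3+ engineers or 2+ teams"),
        (change.estimated_weeks > 2, "effort estimated at more than 2 weeks"),
        (change.compliance_review_required, "regulatory / compliance review"),
    ]
    return [reason for triggered, reason in checks if triggered]


small_fix = ChangeProposal(engineers=1, estimated_weeks=0.5)
pipeline = ChangeProposal(new_major_subsystem=True, engineers=4, teams=2, estimated_weeks=6)

print(hld_triggers(small_fix))  # no triggers: an HLD is probably overkill
print(hld_triggers(pipeline))   # several triggers: write the HLD
```

An empty result does not prove an HLD is unnecessary; it only means none of the listed triggers fired, so judgement still applies.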

HLD in the SDLC

Software Development Lifecycle — Where HLD Fits
Requirements Gathering → HLD: High-Level Design (you are here) → LLD: Low-Level Design → Implementation & Testing → Deployment & Monitoring → Maintain & Iterate
📦

What to Include in an HLD

The mandatory and recommended sections of every HLD
1. Executive Summary

1–2 paragraphs. Why does this system exist? What problem does it solve? Who are the key stakeholders? What is out of scope?

2. Requirements Summary

Functional requirements (what the system does) and non-functional requirements — NFRs — (performance, availability, security, scalability targets). These drive every architectural decision.

3. System Architecture Diagram

The core visual. Shows all major components, services, databases, queues, and their connections. Must include external systems and clients. Label every arrow with the protocol.

4. Component Descriptions

One brief section per major component. What is its single responsibility? What does it own? What does it not own? What are its interfaces?

5. Data Flow Diagrams

How data moves through the system for the 2–3 most important use cases. Makes implicit sequencing explicit and reveals integration points early.

6. Data Storage Strategy

Which databases, caches, or object stores are used and why. Data models at the entity level (not schema level). Ownership — which service owns which data.

7. API & Integration Design

Key APIs between services — method names, payloads at a high level. External integrations and third-party dependencies. Communication patterns: REST, gRPC, event-driven, etc.

8. Non-Functional Considerations

Scalability strategy (horizontal vs vertical), caching layers, CDN usage, rate limiting, failover, circuit breakers, observability (metrics, logging, tracing).

9. Security Architecture

Authentication/authorization strategy (OAuth2, JWT, RBAC), network boundaries, secrets management, data encryption at rest and in transit, and known threat vectors.

10. Deployment Architecture

How the system is deployed — cloud provider, regions, containerisation, orchestration. CI/CD approach at a high level. IaC strategy.

11. Architectural Decision Records (ADRs)

A table or list of key decisions made, alternatives considered, and the rationale for the chosen approach. This is the most valuable long-term artefact.

12. Open Issues & Risks

Unresolved questions, known risks, assumptions made, and dependencies on external teams or decisions. Keeps stakeholders honest about unknowns.

⚠️
Avoid scope creep into LLD. The moment you start writing class names, method signatures, SQL table definitions, or specific library versions — stop. Those belong in the LLD. HLD must stay at the component/service level.
📋

Document Template

A ready-to-use HLD document structure
📄 Title & Metadata REQUIRED
  • Document title: [System/Feature Name] — High-Level Design
  • Authors, reviewers, approvers
  • Version, status (Draft / In Review / Approved), date
  • Related documents (PRD, RFC, JIRA epic link)
1. Executive Summary REQUIRED
What problem are we solving? Why now? Who are the users? What is explicitly out of scope?
2. Requirements REQUIRED
  • Functional: List the top 5–10 capabilities the system must have.
  • Non-Functional: Availability (e.g., 99.9% SLA), latency budgets (p99 < 200ms), throughput (10k RPS peak), storage volumes, data retention, compliance requirements.
  • Constraints: Existing tech stack, budget limits, timeline, team size.
3. System Architecture REQUIRED
Architecture diagram + brief prose explaining the design pattern chosen (monolith, microservices, event-driven, CQRS, etc.) and why.
4. Component Descriptions REQUIRED
For each major component: responsibility, inputs/outputs, tech stack choice, scaling characteristics.
5. Data Flow Diagrams REQUIRED
Cover the 2–3 most critical flows (e.g., user login, order placement, data ingestion). Show actors, components crossed, and data transformations.
6. Data Storage & Ownership REQUIRED
Database choices with rationale, data ownership by service, caching strategy, replication/backup approach.
7. API & Integration Design REQUIRED
Key service-to-service APIs, external dependencies, event contracts, and SLAs for third-party integrations.
8. Security Architecture REQUIRED
Auth/authz model, network segmentation, PII handling, secrets management, encryption strategy.
9. Non-Functional Architecture REQUIRED
Scalability approach, fault tolerance, observability (metrics/logs/traces), disaster recovery, and capacity planning.
10. Deployment Architecture RECOMMENDED
Infrastructure diagram, environments (dev/staging/prod), CI/CD pipeline overview, infrastructure-as-code approach.
11. Architectural Decision Records REQUIRED
Table: Decision | Alternatives Considered | Rationale | Date | Owner
12. Open Issues, Risks & Assumptions REQUIRED
Track unresolved questions, external dependencies, and risks with owners and resolution dates.
🗂️

Diagram Types in HLD

Which diagrams to include and when to use each
Diagram Type | Purpose | When to Include | Tools
Architecture Diagram (C4 Context / Container) | Shows major components, services, and their connections. The single most important diagram. | Always; every HLD must have one | draw.io, Lucidchart, Excalidraw, C4 model
Data Flow Diagram (DFD) | Shows how data moves through the system: inputs, processes, storage, outputs | Any system with complex data transformations or pipelines | draw.io, Miro, Lucidchart
Sequence Diagram | Shows interactions over time between components for a specific use case | Complex multi-service flows: auth, checkout, notifications | PlantUML, Mermaid, SequenceDiagram.org
Deployment Diagram | Shows physical/cloud infrastructure: VPCs, regions, pods, load balancers | Cloud or distributed deployments | draw.io, AWS Architecture Diagrams
Entity-Relationship (high level) | Shows major data entities and their relationships, not table schemas | Data-heavy systems, multi-service data ownership questions | draw.io, dbdiagram.io (conceptual level)
State Diagram | Shows lifecycle states of a key domain object (e.g., an Order) | When a domain object has complex state transitions | PlantUML, draw.io, Mermaid
🎯
Diagramming principle: Each diagram should have a single purpose and be readable in under 2 minutes. If a single diagram needs a legend with more than 8 items, split it into two diagrams. Always label arrows with the protocol (HTTP, gRPC, AMQP, etc.).
🔄

Step-by-Step: How to Create an HLD

A repeatable process from requirements to approved document
  1. Understand and document requirements
    Before drawing anything, extract both functional and non-functional requirements. Interview stakeholders, review the PRD, and identify the three most critical user journeys. Write down explicit capacity targets: peak RPS, storage volumes, latency SLAs, availability targets (e.g., 99.9% ≈ 8.76 hours of downtime per year). NFRs drive architecture more than functional requirements do.
  2. Identify and bound major components
    List every major capability the system needs (auth, product catalog, orders, payments, notifications, etc.). Group related capabilities into bounded contexts or services. Apply the Single Responsibility Principle: each component should have one clear job. Name things by what they do, not how they do it.
  3. Choose the right architectural pattern
    Match the pattern to the scale and team. A startup with 5 engineers doesn't need microservices. Options include: Monolith (fastest to build), Modular Monolith (structured, but single deploy), Microservices (independent scaling, team autonomy), Event-Driven (async decoupling), CQRS (read/write separation). Document the pattern chosen and the alternatives rejected.
  4. Draw the architecture diagram first
    Start with a whiteboard or Excalidraw. Put clients at the top, databases at the bottom, services in the middle. Draw arrows representing data flow and label each with the protocol. Use the C4 model: start at Context level (system + external users), then drill into Container level (services, databases, queues). Get this diagram reviewed before writing any prose.
  5. Document data flows for critical paths
    Pick the 2–3 most important use cases (usually: user registration, the primary write path, and the primary read path). Trace data from entry point through every component to final storage/response. Use sequence diagrams or numbered flow diagrams. These reveal hidden complexity and missing components early.
  6. Define data ownership and storage choices
    For each service, define what data it owns and what database type is appropriate: SQL for relational/transactional data, NoSQL for document/flexible-schema data, cache for ephemeral/hot data, object storage for blobs. Never share a database between services; use APIs or events instead. Document replication, backup, and recovery strategy.
  7. Address non-functional requirements explicitly
    For each NFR, describe the mechanism: How will you achieve 99.9% uptime? (Multi-region, health checks, auto-restart.) How will you handle a 10x traffic spike? (Horizontal scaling, queue-based load leveling.) How will you detect failures? (Distributed tracing, alerting on SLOs.) NFRs without mechanisms are just wishful thinking.
  8. Document architectural decisions (ADRs)
    For every major decision (database choice, communication protocol, caching strategy, auth mechanism), create an ADR entry: what was decided, what alternatives were considered, and why this choice was made. ADRs are the most valuable long-term output of the HLD process because they prevent the same debates recurring six months later.
  9. Conduct a structured review
    Share the draft with at least: one other senior engineer, a security reviewer, a representative from operations/DevOps, and a product manager. Run a dedicated 60-minute review session. Specifically stress-test failure scenarios: "What happens if Service X is down?" "What happens if the database has a 5-second latency spike?" Capture all feedback and resolve it before marking the document Approved.
  10. Version-control and maintain the document
    Store the HLD in a version-controlled wiki (Confluence, Notion, GitHub). Add a clear version history table. Update the document when significant architectural changes are made. Consider the HLD "stale" if it hasn't been reviewed in 6 months and the system is actively evolving.
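Step 7 asks for a mechanism behind each NFR. For the traffic-spike question, a rough capacity estimate is often the first piece of evidence. The sketch below is a back-of-the-envelope calculation only: the function name, the per-instance throughput (400 RPS), and the 60% target utilization are assumptions for illustration, and real figures should come from load tests.

```python
import math


def instances_needed(peak_rps: float, per_instance_rps: float,
                     target_utilization: float = 0.6) -> int:
    """Estimate how many identical instances are needed to serve peak_rps.

    per_instance_rps is the measured throughput of one instance at saturation;
    target_utilization leaves headroom so instances do not run at 100%.
    """
    if per_instance_rps <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("per_instance_rps must be > 0 and utilization in (0, 1]")
    usable_rps = per_instance_rps * target_utilization
    return math.ceil(peak_rps / usable_rps)


# Hypothetical numbers: 5,000 RPS peak, then a 10x spike.
baseline = instances_needed(peak_rps=5_000, per_instance_rps=400)
spike = instances_needed(peak_rps=50_000, per_instance_rps=400)
print(f"baseline: {baseline} instances, 10x spike: {spike} instances")
```

If the spike figure is far beyond what autoscaling can add in time, that is a signal to consider queue-based load leveling rather than scaling alone.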
🖼️

Diagram Examples

Architecture, data flow, and sequence diagram patterns

Pattern 1 — 3-Tier Web Architecture

Classic 3-Tier: Client → API Layer → Data Layer
CLIENT: Web App (React / Vue), Mobile App (iOS / Android), 3rd Party (Webhooks / API)
CDN: CloudFront / Fastly
API LAYER: API Gateway (Auth · Rate Limit · Route), App Server (Business Logic), Message Queue (Kafka / RabbitMQ)
DATA LAYER: Primary DB (PostgreSQL / MySQL), Cache (Redis / Memcached), Object Storage (S3 / GCS / Blob)
Arrow labels: HTTPS, REST, AMQP

Pattern 2 — Microservices with Event Bus

Event-Driven Microservices Architecture
Client (Web / Mobile) → HTTPS → API Gateway (Auth / Route)
Services: User Service (Auth · Profile), Order Service (Cart · Checkout), Product Service (Catalog · Inventory)
Event Bus: Kafka / EventBridge
Consumers: Notification (Email / SMS / Push), Analytics (ClickStream · BI), Payment (Billing · Invoicing)
Arrow labels: publish (services to event bus), subscribe (event bus to consumers)

Pattern 3 — Data Flow Diagram (User Login)

Sequence: User Authentication Flow
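Participants: Client, API Gateway, Auth Service, User DB, Cache
  1. Client → API Gateway: POST /login {email, password}
  2. API Gateway → Auth Service: Forward credentials
  3. Auth Service → Cache: GET session:{email}
  4. Cache → Auth Service: MISS
  5. Auth Service → User DB: SELECT user WHERE email=?
  6. User DB → Auth Service: user record + hash
  7. Auth Service → Cache: SET session:{email} TTL=3600
  8. Auth Service → Client (via API Gateway): 200 OK { access_token, refresh_token }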
🛒

Full Example: E-Commerce Platform HLD

Worked example walking through each HLD section

1. Executive Summary

Build a scalable e-commerce platform supporting product browsing, cart management, checkout, and order tracking. Target audience: 500k registered users, 50k daily active users. The system must support Black Friday traffic peaks of 5,000 RPS. Out of scope: warehouse management, physical shipping logistics, returns processing.

2. Non-Functional Requirements

Requirement | Target | Notes
Availability | 99.9% (max 8.76h downtime/year) | Exclude planned maintenance windows
Read latency (catalog) | p99 < 100ms | Served via CDN + cache
Write latency (checkout) | p99 < 500ms | End-to-end including payment auth
Peak throughput | 5,000 RPS | Checkout: 500 RPS, Browse: 4,500 RPS
Data durability | 99.999999% | Order/payment data; multi-AZ DB
RPO / RTO | RPO: 5min / RTO: 30min | For primary DB failure
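An availability percentage is easier to reason about once it is converted into a downtime budget. The helper below is a minimal sketch of that arithmetic, assuming a 365-day year; the function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def downtime_budget_hours(availability_pct: float,
                          period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the maximum downtime (in hours) allowed by an availability target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    return (100 - availability_pct) / 100 * period_hours


for target in (99.0, 99.9, 99.99):
    print(f"{target}% -> {downtime_budget_hours(target):.2f} h/year")
# The 99.9% row prints 8.76 h/year.
```

Each extra nine cuts the budget by a factor of ten, which is usually what drives the jump from single-AZ to multi-AZ or multi-region designs.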

3. Architecture Diagram

E-Commerce Platform — High-Level Architecture
Clients: Web Browser (React SPA), Mobile App (React Native)
CDN / WAF: CloudFront + Shield
API Gateway: Auth · Rate Limit · Route
Services: User Service (Auth · Profile), Product Service (Catalog · Search), Order Service (Cart · Checkout), Payment Service (Stripe / Braintree), Notification Svc (Email · SMS · Push)
Event Bus (Kafka): order.placed · payment.completed · user.registered
Data stores: Users DB (PostgreSQL), Products DB (PostgreSQL), Orders DB (PostgreSQL), Cache (Redis Cluster), Search Index (Elasticsearch)
Legend: dashed line (---) = write-through

11. Architectural Decision Records

Decision | Alternatives Considered | Rationale
Microservices architecture | Monolith, Modular Monolith | Team independence, independent scaling of hot services (Product, Order). Accepted: higher operational complexity.
PostgreSQL for transactional data | MySQL, DynamoDB, MongoDB | ACID guarantees essential for orders/payments. Team has strong Postgres expertise. RDS Multi-AZ for HA.
Kafka as event bus | RabbitMQ, AWS SQS/SNS, Redis Streams | Replay capability, per-partition ordering guarantees, and high throughput required. Needed for audit trail of order events.
Redis for caching | Memcached, DynamoDB DAX | Rich data structures, TTL support, cluster mode for horizontal scaling. Session storage + product cache.
Elasticsearch for product search | PostgreSQL full-text, Algolia, Typesense | Fuzzy search, faceted filtering, and relevance scoring at scale. Algolia ruled out due to cost at our volume.
⚠️

Common Pitfalls

The mistakes that make HLDs useless — and how to avoid them
⚠ No non-functional requirements
The HLD describes what the system does but not the numbers it must hit — no latency targets, no scale figures, no availability SLAs. Without NFRs, there's no basis for any architectural decision. You can't know whether to add a cache, a queue, or a read replica if you don't know the load.
Write NFRs first, before touching the architecture. Every section of the HLD should trace back to an NFR.
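A numeric NFR is only useful if it can be checked. As a minimal sketch, the snippet below computes a percentile with the nearest-rank method and compares it to a latency target. The function names and the synthetic sample data are assumptions for illustration; production systems would normally read percentiles from their metrics backend.

```python
def percentile_nearest_rank(samples: list[float], pct: int) -> float:
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(samples)
    # Integer form of ceil(pct / 100 * n), which avoids floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_latency_target(samples_ms: list[float], target_ms: float, pct: int = 99) -> bool:
    """True if the pct-th percentile latency is strictly below target_ms."""
    return percentile_nearest_rank(samples_ms, pct) < target_ms


# Synthetic latencies of 1..100 ms, purely for illustration.
latencies_ms = [float(ms) for ms in range(1, 101)]
print(percentile_nearest_rank(latencies_ms, 99))         # 99.0
print(meets_latency_target(latencies_ms, target_ms=200)) # True
print(meets_latency_target(latencies_ms, target_ms=50))  # False
```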
⚠ Solutioning before problem definition
The HLD opens with "We will use Kubernetes, Kafka, and a GraphQL API" before establishing the requirements. Technology choices appear before any reasoning. This signals that the architecture was chosen from familiarity rather than fitness for the problem.
State requirements and constraints first. Introduce technology choices in the ADR section with explicit rationale and alternatives considered.
⚠ The "happy path only" HLD
The document describes how the system works when everything succeeds, but never addresses failure modes: What happens if the payment service is down? What if the message queue backs up? What if the primary database fails? Systems fail — the HLD must show how the architecture handles it.
Add a dedicated "Failure Modes & Resilience" section. For each external dependency, describe the failover strategy, circuit breaker behavior, and user-facing impact.
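To make "circuit breaker behavior" concrete, here is a minimal, single-threaded sketch of the pattern. It is an illustration under stated assumptions, not a production implementation: the class and parameter names are hypothetical, there is no locking, and the clock is injectable only so the example runs deterministically.

```python
import time
from typing import Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout_s:
            return "half-open"  # timeout elapsed: allow one trial call
        return "open"

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            raise CircuitOpenError("dependency unavailable; failing fast")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result


# Usage with a fake clock and a hypothetical failing payment call.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_timeout_s=10.0, clock=lambda: now[0])


def charge_card():
    raise ConnectionError("payment provider timeout")


for _ in range(2):
    try:
        breaker.call(charge_card)
    except ConnectionError:
        pass

print(breaker.state)                    # open
now[0] = 11.0                           # reset timeout has elapsed
print(breaker.state)                    # half-open
print(breaker.call(lambda: "charged"))  # charged
print(breaker.state)                    # closed
```

At HLD level the code itself does not belong in the document; what matters is recording the threshold, the reset timeout, and what the user sees while the circuit is open.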
⚠ Too much detail / LLD creep
The HLD includes database table schemas, class diagrams, specific API endpoint definitions, and library version numbers. This makes the document enormous, hard to review, outdated within weeks, and overwhelming for non-implementing stakeholders.
Stop at the component/service boundary. If you're describing methods, tables, or library configs — move it to the LLD or a separate ADR.
⚠ No security architecture
Security is not mentioned or is vaguely deferred: "Authentication will be handled." Where does token validation happen? What is the AuthN/AuthZ model? Are there network boundaries? How is PII encrypted? Security omitted from HLD means it gets bolted on during implementation — the most expensive time to add it.
Dedicate a section to: auth model, network segmentation, data classification, encryption at rest/transit, and secrets management. Get a security review before HLD approval.
⚠ Diagram without labels on arrows
The architecture diagram shows boxes connected by unlabeled arrows. It's impossible to tell whether services communicate synchronously or asynchronously, what protocol is used (HTTP? gRPC? event?), or what data flows in each direction.
Every arrow must have: (1) a protocol label, (2) a direction, (3) optionally the payload type. Use different line styles for sync vs async communications.
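One way to enforce labeled arrows is to generate the diagram from data and refuse edges without a protocol. The sketch below emits Mermaid flowchart text, using a solid arrow for synchronous calls and a dotted arrow for asynchronous ones. The `Edge` type, the node names, and the choice of Mermaid are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str          # node id without spaces, e.g. "Gateway"
    target: str
    protocol: str        # label shown on the arrow, e.g. "HTTPS"
    is_async: bool = False


def render_flowchart(edges: list[Edge]) -> str:
    """Render edges as Mermaid flowchart text; reject unlabeled arrows."""
    lines = ["flowchart LR"]
    for edge in edges:
        if not edge.protocol.strip():
            raise ValueError(f"unlabeled arrow: {edge.source} -> {edge.target}")
        arrow = "-.->" if edge.is_async else "-->"  # dotted = async, solid = sync
        lines.append(f"    {edge.source} {arrow}|{edge.protocol}| {edge.target}")
    return "\n".join(lines)


diagram = render_flowchart([
    Edge("Client", "Gateway", "HTTPS"),
    Edge("Gateway", "OrderService", "gRPC"),
    Edge("OrderService", "EventBus", "AMQP", is_async=True),
])
print(diagram)
```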
⚠ Never updated after initial approval
The HLD is approved, then the system is built — but when decisions change during implementation (a different database, a queue added, a service split), the HLD is never updated. New engineers read the HLD and get a misleading picture of the system. Drift between documentation and reality is a major onboarding and maintenance hazard.
Treat the HLD like code. Changes to architecture require an HLD update PR/review. Add the HLD update as a Definition of Done for architectural changes.
⚠ "Architecture by committee" — no clear ownership
The HLD is written by a committee and no single person is accountable for it. Sections contradict one another. There is no clear decision-maker, so controversial choices (monolith vs microservices, SQL vs NoSQL) are left unresolved with "further discussion needed."
Assign a single DRI (Directly Responsible Individual) — usually the tech lead or architect. Others review and advise; the DRI decides and owns the document.

HLD Review Checklist

Use before submitting for review and before approving

Content completeness

  • Functional requirements are stated clearly
  • Non-functional requirements include specific numbers (latency, throughput, availability)
  • At least one architecture diagram with labeled arrows and protocols
  • Every major component has a description of its responsibility
  • Data flows documented for the 2–3 most critical use cases
  • Data ownership is explicit — each service owns its own data
  • Security architecture addresses auth, encryption, and network boundaries
  • Architectural Decision Records include alternatives and rationale
  • Open issues and assumptions are listed with owners
  • Failure modes documented — what happens when each dependency fails
  • Deployment architecture described — cloud provider, regions, environments
  • Observability approach specified — metrics, logs, traces, alerting
  • Cost estimation or cost considerations noted

Diagram quality

  • All arrows are labeled with protocol (HTTP, gRPC, AMQP, etc.)
  • External systems and third-party integrations are shown
  • Data stores are included and connected to the correct services
  • Sync and async communications use distinct visual styles
  • A legend is provided if the diagram uses more than 4 distinct visual types
  • Diagram is readable without zooming on a standard screen

Review process

  • Reviewed by at least one other senior engineer outside the immediate team
  • Security review completed or explicitly scheduled
  • DevOps / platform team has reviewed deployment and operational concerns
  • Product manager has confirmed the scope aligns with product requirements
  • DBA or data engineer has reviewed data model and storage choices
🔵
Legend: ● Required — must be completed before approval. ● Recommended — strongly advised for production systems. ● Nice to have — valuable when time permits.
🔧

Tools & Resources

Diagramming tools, documentation platforms, and reference frameworks

Diagramming Tools

Tool | Best For | Notes
draw.io / diagrams.net | Architecture, deployment, flow diagrams | Free, works offline, integrates with Confluence/GitHub. Recommended default.
Excalidraw | Whiteboard-style quick diagrams, collaborative sketching | Hand-drawn aesthetic, excellent for early-stage design sessions. Free, open source.
Lucidchart | Polished enterprise diagrams, cross-team collaboration | Paid. Best import/export options, strong Jira/Confluence integration.
Mermaid | Sequence diagrams, flowcharts, state diagrams as code | Text-based, so version controllable. Natively supported in GitHub Markdown, Notion, GitLab.
PlantUML | Sequence, component, and deployment diagrams as code | More powerful than Mermaid; steeper syntax. Works well in CI pipelines.
C4 Model + Structurizr | Layered architectural views (Context → Container → Component) | The C4 model is a rigorous, widely used framework for HLD diagrams. Structurizr is tooling built by the C4 model's author.

Documentation Platforms

Platform | Use
Confluence | Enterprise wiki, deep Jira integration, template library
Notion | Flexible docs, good for smaller teams, supports diagrams via embeds
GitHub / GitLab Wikis | Version-controlled, markdown-native, Mermaid built in
Backstage | Developer portal with TechDocs for versioned architecture docs

Reference Frameworks

C4 Model

Simon Brown's 4-level framework: Context (system in the world), Container (deployable applications and data stores), Component (inside a container), Code (class-level). HLD maps to Context + Container levels.

c4model.com

Architecture Decision Records

Michael Nygard's lightweight format for documenting decisions. Each ADR has: Title, Status, Context, Decision, Consequences. Store in /docs/adr/ in the repo.

adr.github.io
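As a minimal sketch of the five-part format, the snippet below renders an ADR as text and derives a file path under docs/adr/. The zero-padded numbering, the Markdown headings, and the `.md` extension are assumptions, not part of the format itself. The sample content paraphrases the Kafka decision from the e-commerce example, and the ADR number and "Accepted" status are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class ADR:
    number: int
    title: str
    status: str
    context: str
    decision: str
    consequences: str

    def filename(self) -> str:
        """Build a path such as docs/adr/0003-some-title.md (naming is an assumption)."""
        slug = "-".join(self.title.lower().split())
        return f"docs/adr/{self.number:04d}-{slug}.md"

    def render(self) -> str:
        """Render the ADR with one heading per section."""
        sections = [
            ("Status", self.status),
            ("Context", self.context),
            ("Decision", self.decision),
            ("Consequences", self.consequences),
        ]
        body = "\n\n".join(f"## {heading}\n\n{text}" for heading, text in sections)
        return f"# {self.number}. {self.title}\n\n{body}\n"


adr = ADR(
    number=3,  # hypothetical sequence number
    title="Use Kafka as event bus",
    status="Accepted",
    context="Order events need replay, ordering, and high throughput for the audit trail.",
    decision="Use Kafka. Alternatives considered: RabbitMQ, AWS SQS/SNS, Redis Streams.",
    consequences="The team takes on the operational cost of running a Kafka cluster.",
)
print(adr.filename())
print(adr.render())
```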

AWS Well-Architected Framework

Six pillars: Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, Sustainability. Excellent NFR checklist for cloud-hosted systems.

aws.amazon.com/architecture/well-architected

12-Factor App Methodology

Principles for building maintainable, scalable services: codebase, dependencies, config, backing services, build/release/run, processes, port binding, concurrency, disposability, dev/prod parity, logs, admin processes.

12factor.net

📐
Mermaid quick example — paste this into any Mermaid-compatible editor for an instant sequence diagram:
mermaid
sequenceDiagram
    actor User
    participant GW as API Gateway
    participant Auth as Auth Service
    participant DB as User DB
    participant Cache as Redis
    User->>GW: POST /login {email, password}
    GW->>Auth: Forward credentials
    Auth->>Cache: GET session:email
    Cache-->>Auth: MISS
    Auth->>DB: SELECT user WHERE email=?
    DB-->>Auth: {userId, passwordHash}
    Auth->>Auth: bcrypt.verify(password, hash)
    Auth->>Cache: SET session:email TTL=3600
    Auth-->>GW: {accessToken, refreshToken}
    GW-->>User: 200 OK {tokens}