High-Level Design Handbook
A comprehensive reference for engineers and architects — covering what HLD is, what it must contain, how to create one, common pitfalls, and real worked examples with diagrams.
What is High-Level Design?
High-Level Design (HLD) is a macro-level architectural blueprint of a software system. It defines what major components exist, how they interact, and why those design choices were made — without diving into implementation specifics like class hierarchies or database schemas.
An HLD answers the question: "If this system were a city, what are the districts, highways, and utilities — and how do they connect?" It is the first design artifact produced after requirements are gathered, and it is the primary communication tool between architects, engineers, product managers, and stakeholders.
- Scope: System-wide view. Focuses on major components, services, databases, and external integrations, not individual classes or functions.
- Audience: Technical leads, architects, senior engineers, product managers, and DevOps engineers. Must be readable by both technical and semi-technical stakeholders.
- Timing: Created after requirements are defined and before Low-Level Design (LLD) or implementation begins. Feeds into sprint planning and effort estimation.
Core goals of an HLD
- Alignment: Creates a shared understanding of the system's structure across all teams. Reduces ambiguity and prevents "we assumed you'd handle that" conversations.
- Decision record: Documents architectural decisions and their rationale (the "why"). Crucial for onboarding future engineers and for audit/compliance purposes.
- Early risk detection: Forces early thinking about scalability, single points of failure, security boundaries, and integration complexity while changes are still cheap.
- Planning: Provides the structure for breaking work into epics and stories. Teams cannot estimate accurately without understanding the shape of the system.
HLD vs Low-Level Design
| Dimension | High-Level Design (HLD) | Low-Level Design (LLD) |
|---|---|---|
| Granularity | Services, modules, databases, external APIs | Classes, methods, algorithms, DB schemas |
| Who reads it | Architects, TLs, PMs, DevOps, all engineers | Implementing engineers on the specific module |
| Key questions | "What are the building blocks?" "How do they talk?" | "How is this class structured?" "What is the algorithm?" |
| Diagrams | Architecture, data-flow, deployment, and high-level sequence diagrams | Class diagrams, detailed sequence diagrams, ER diagrams |
| Technology mentions | Technology choices with rationale (e.g., Kafka, PostgreSQL) | Specific library versions, exact table/column names |
| Produced by | System architect / tech lead | Senior / lead engineers per module |
| Created when | Before LLD; right after requirements finalised | After HLD is approved; before implementation |
| Typical length | 5–20 pages with diagrams | Varies; module-specific, often 10–50 pages |
When to Write an HLD
Not every feature needs an HLD, but skipping one when it is warranted is a common cause of expensive mid-project rework. Use this threshold guide:
Write an HLD when any of these apply:
- Building a new service or application from scratch
- Introducing a new major subsystem (e.g., a notification pipeline, payment gateway)
- Adding a new external integration or third-party dependency
- Changing a core architectural layer (e.g., migrating to event-driven)
- The change involves 3+ engineers or 2+ teams
- The effort is estimated at more than 2 weeks
- Regulatory / compliance review is required
An HLD is usually unnecessary for:
- A bug fix or small feature within a single service
- A UI-only change with no backend implications
- Minor configuration changes (feature flags, threshold adjustments)
- A single-engineer spike or proof of concept
- Work within a well-understood, well-documented bounded context
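The threshold guide can also serve as a quick self-check when a team is unsure. Below is a minimal Python sketch. It assumes a hypothetical `ChangeProposal` record whose fields mirror the criteria above. The numeric thresholds (3+ engineers, 2+ teams, more than 2 weeks) come from the guide. Treating any single criterion as sufficient is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChangeProposal:
    """Hypothetical summary of a proposed change; fields mirror the threshold guide."""
    new_service: bool = False
    new_major_subsystem: bool = False
    new_external_integration: bool = False
    changes_core_architecture: bool = False
    engineers: int = 1
    teams: int = 1
    estimated_weeks: float = 0.0
    needs_compliance_review: bool = False


def hld_triggers(change: ChangeProposal) -> List[str]:
    """Return the threshold-guide criteria this change meets, in guide order."""
    checks = {
        "new service or application": change.new_service,
        "new major subsystem": change.new_major_subsystem,
        "new external integration": change.new_external_integration,
        "core architectural change": change.changes_core_architecture,
        "3+ engineers or 2+ teams": change.engineers >= 3 or change.teams >= 2,
        "effort over 2 weeks": change.estimated_weeks > 2,
        "compliance review required": change.needs_compliance_review,
    }
    return [reason for reason, hit in checks.items() if hit]


def needs_hld(change: ChangeProposal) -> bool:
    # Assumption: meeting any single criterion is enough to warrant an HLD.
    return bool(hld_triggers(change))


bug_fix = ChangeProposal(engineers=1, estimated_weeks=0.5)
payments = ChangeProposal(new_external_integration=True, engineers=4, teams=2, estimated_weeks=6)
print(needs_hld(bug_fix))  # False
print(hld_triggers(payments))
```

Returning the list of reasons, rather than just a boolean, gives reviewers something concrete to cite when they ask for (or waive) an HLD.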
HLD in the SDLC
What to Include in an HLD
- Overview: 1–2 paragraphs. Why does this system exist? What problem does it solve? Who are the key stakeholders? What is out of scope?
- Requirements: Functional requirements (what the system does) and non-functional requirements, or NFRs (performance, availability, security, and scalability targets). These drive every architectural decision.
- Architecture diagram: The core visual. Shows all major components, services, databases, queues, and their connections, including external systems and clients. Label every arrow with its protocol.
- Component descriptions: One brief section per major component. What is its single responsibility? What does it own? What does it not own? What are its interfaces?
- Data flows: How data moves through the system for the 2–3 most important use cases. Makes implicit sequencing explicit and reveals integration points early.
- Data storage: Which databases, caches, or object stores are used, and why. Data models at the entity level (not the schema level). Ownership: which service owns which data.
- APIs and integrations: Key APIs between services (operation names and payloads at a high level), external integrations and third-party dependencies, and communication patterns (REST, gRPC, event-driven, etc.).
- Scalability and reliability: Scaling strategy (horizontal vs. vertical), caching layers, CDN usage, rate limiting, failover, circuit breakers, and observability (metrics, logging, tracing).
- Security: Authentication/authorization strategy (OAuth2, JWT, RBAC), network boundaries, secrets management, encryption of data at rest and in transit, and known threat vectors.
- Deployment: How the system is deployed (cloud provider, regions, containerisation, orchestration), the CI/CD approach at a high level, and the IaC strategy.
- Architectural decisions: A table or list of key decisions made, alternatives considered, and the rationale for each chosen approach. This is the most valuable long-term artefact.
- Open issues and risks: Unresolved questions, known risks, assumptions made, and dependencies on external teams or decisions. Keeps stakeholders honest about unknowns.
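At HLD level, an API entry only needs operation names and the rough shape of each payload. The Python sketch below shows one way to write such a contract down. It assumes a hypothetical Order Service with a synchronous `create_order` call and an asynchronous `OrderPlaced` event. All names and fields are illustrative, not a prescribed schema. The toy in-memory implementation exists only to show that the contract is concrete enough to code against.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, Protocol, Tuple


@dataclass(frozen=True)
class OrderLine:
    sku: str
    quantity: int


@dataclass(frozen=True)
class CreateOrderRequest:
    """Payload of the synchronous create call (e.g., over REST or gRPC)."""
    customer_id: str
    lines: Tuple[OrderLine, ...]


@dataclass(frozen=True)
class OrderPlaced:
    """Event published asynchronously after an order is created."""
    order_id: str
    customer_id: str
    total: Decimal


class OrderService(Protocol):
    """Contract other services rely on; implementation details stay out of the HLD."""
    def create_order(self, request: CreateOrderRequest) -> OrderPlaced: ...
    def get_order_status(self, order_id: str) -> str: ...


class InMemoryOrderService:
    """Toy implementation, only to show the contract can be satisfied."""

    def __init__(self, prices: Dict[str, Decimal]) -> None:
        self._prices = prices
        self._status: Dict[str, str] = {}

    def create_order(self, request: CreateOrderRequest) -> OrderPlaced:
        order_id = f"ord-{len(self._status) + 1}"
        total = sum((self._prices[line.sku] * line.quantity for line in request.lines), Decimal("0"))
        self._status[order_id] = "PLACED"
        return OrderPlaced(order_id, request.customer_id, total)

    def get_order_status(self, order_id: str) -> str:
        return self._status.get(order_id, "UNKNOWN")


svc: OrderService = InMemoryOrderService({"sku-1": Decimal("9.99")})
event = svc.create_order(CreateOrderRequest("cust-42", (OrderLine("sku-1", 2),)))
print(event.order_id, event.total)  # ord-1 19.98
```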
Document Template
Document header
- Document title: "[System/Feature Name] High-Level Design"
- Authors, reviewers, approvers
- Version, status (Draft / In Review / Approved), date
- Related documents (PRD, RFC, JIRA epic link)
Requirements
- Functional: List the top 5–10 capabilities the system must have.
- Non-functional: Availability (e.g., 99.9% SLA), latency budgets (e.g., p99 < 200 ms), throughput (e.g., 10k RPS at peak), storage volumes, data retention, and compliance requirements.
- Constraints: Existing tech stack, budget limits, timeline, team size.
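Stating NFRs as numbers makes them testable against real measurements. The Python sketch below checks a latency budget such as "p99 < 200 ms". It assumes latency samples in milliseconds and uses the nearest-rank percentile method, which is one of several common percentile definitions. The sample data is made up for illustration.

```python
import math
from typing import List


def percentile_nearest_rank(samples_ms: List[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples_ms or not 0 < pct <= 100:
        raise ValueError("need at least one sample and 0 < pct <= 100")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_latency_budget(samples_ms: List[float], pct: float, budget_ms: float) -> bool:
    """True if the measured percentile is strictly below the budget (e.g., p99 < 200 ms)."""
    return percentile_nearest_rank(samples_ms, pct) < budget_ms


samples = [120.0] * 98 + [180.0, 450.0]  # 100 hypothetical request latencies
print(percentile_nearest_rank(samples, 99))      # 180.0
print(meets_latency_budget(samples, 99, 200.0))  # True
```

Notice that the single 450 ms outlier does not break a p99 target over 100 requests, but it would break a p100 (maximum) target.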
Diagram Types in HLD
| Diagram Type | Purpose | When to Include | Tools |
|---|---|---|---|
| Architecture Diagram (C4 Context / Container) | Shows major components, services, and their connections. The single most important diagram. | Always; every HLD must have one | draw.io, Lucidchart, Excalidraw, C4 model |
| Data Flow Diagram (DFD) | Shows how data moves through the system — inputs, processes, storage, outputs | Any system with complex data transformations or pipelines | draw.io, Miro, Lucidchart |
| Sequence Diagram | Shows interactions over time between components for a specific use case | Complex multi-service flows (auth, checkout, notifications) | PlantUML, Mermaid, sequencediagram.org |
| Deployment Diagram | Shows physical/cloud infrastructure — VPCs, regions, pods, load balancers | Cloud or distributed deployments | draw.io, AWS Architecture Diagrams |
| Entity-Relationship (high level) | Shows major data entities and their relationships — not table schemas | Data-heavy systems, multi-service data ownership questions | draw.io, dbdiagram.io (conceptual level) |
| State Diagram | Shows lifecycle states of a key domain object (e.g., an Order) | When a domain object has complex state transitions | PlantUML, draw.io, Mermaid |
Step-by-Step: How to Create an HLD
Step 1: Understand and document requirements
Before drawing anything, extract both functional and non-functional requirements. Interview stakeholders, review the PRD, and identify the three most critical user journeys. Write down explicit capacity targets: peak RPS, storage volumes, latency SLAs, and availability targets (e.g., 99.9% availability allows about 8.76 hours of downtime per year). NFRs drive architecture more than functional requirements do.
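Converting an availability target into a downtime budget is simple arithmetic: allowed downtime = (1 − availability) × period. The Python helper below does this calculation. It assumes a 365-day year (8,760 hours) and a 30-day month, and it ignores leap years and any maintenance-window exclusions.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 h; assumption: non-leap year


def downtime_budget_hours(availability_pct: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Maximum allowed downtime (in hours) for an availability target over a period."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    return (1 - availability_pct / 100) * period_hours


for target in (99.0, 99.9, 99.99):
    yearly_h = downtime_budget_hours(target)
    monthly_min = downtime_budget_hours(target, period_hours=30 * 24) * 60
    print(f"{target}%: {yearly_h:.2f} h/year, {monthly_min:.1f} min per 30-day month")
```

For 99.9%, this gives 8.76 hours per year, or about 43 minutes per 30-day month. The monthly figure is often the more useful one when planning deployments and on-call response times.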
Step 2: Identify and bound major components
List every major capability the system needs (auth, product catalog, orders, payments, notifications, etc.). Group related capabilities into bounded contexts or services. Apply the Single Responsibility Principle: each component should have one clear job. Name things by what they do, not how they do it.
Step 3: Choose the right architectural pattern
Match the pattern to the scale and the team. A startup with 5 engineers rarely needs microservices. Options include Monolith (fastest to build), Modular Monolith (structured internally, but deployed as one unit), Microservices (independent scaling, team autonomy), Event-Driven (async decoupling), and CQRS (read/write separation). Document the pattern chosen and the alternatives rejected.
Step 4: Draw the architecture diagram first
Start with a whiteboard or Excalidraw. Put clients at the top, databases at the bottom, and services in the middle. Draw arrows representing data flow and label each with its protocol. Use the C4 model: start at the Context level (the system, its users, and external systems), then drill into the Container level (services, databases, queues). Get this diagram reviewed before writing any prose.
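Keeping the container view as data makes the "label every arrow" rule checkable. It also makes the diagram diffable in version control. The Python sketch below uses hypothetical component names. It flags unlabeled or dangling arrows, then emits Mermaid flowchart text (Mermaid is one of the tools listed later in this handbook). Async edges are drawn dotted so that sync and async links look distinct.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    protocol: str  # e.g. "HTTPS", "gRPC", "Kafka"
    is_async: bool = False


def validate(components: Dict[str, str], edges: List[Edge]) -> List[str]:
    """Return problems: arrows without a protocol label or pointing at unknown components."""
    problems = []
    for e in edges:
        if not e.protocol.strip():
            problems.append(f"{e.source} -> {e.target}: missing protocol label")
        for end in (e.source, e.target):
            if end not in components:
                problems.append(f"{e.source} -> {e.target}: unknown component '{end}'")
    return problems


def to_mermaid(components: Dict[str, str], edges: List[Edge]) -> str:
    """Render a top-down Mermaid flowchart; async edges are dotted, sync edges solid."""
    lines = ["flowchart TD"]
    lines += [f'    {cid}["{label}"]' for cid, label in components.items()]
    for e in edges:
        if e.is_async:
            lines.append(f"    {e.source} -. {e.protocol} .-> {e.target}")
        else:
            lines.append(f"    {e.source} -->|{e.protocol}| {e.target}")
    return "\n".join(lines)


components = {
    "client": "Web Client",
    "gateway": "API Gateway",
    "orders": "Order Service",
    "orders_db": "Orders DB (PostgreSQL)",
    "bus": "Event Bus (Kafka)",
}
edges = [
    Edge("client", "gateway", "HTTPS"),
    Edge("gateway", "orders", "gRPC"),
    Edge("orders", "orders_db", "SQL/TCP"),
    Edge("orders", "bus", "Kafka", is_async=True),
]
assert validate(components, edges) == []  # every arrow labeled, every endpoint known
print(to_mermaid(components, edges))
```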
Step 5: Document data flows for critical paths
Pick the 2–3 most important use cases (usually user registration, the primary write path, and the primary read path). Trace data from the entry point through every component to final storage or response. Use sequence diagrams or numbered flow diagrams. These reveal hidden complexity and missing components early.
Step 6: Define data ownership and storage choices
For each service, define what data it owns and which database type is appropriate: SQL for relational/transactional data, NoSQL for document/flexible-schema data, a cache for ephemeral/hot data, and object storage for blobs. Never share a database between services; use APIs or events instead. Document the replication, backup, and recovery strategy.
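The "never share a database" rule can be checked mechanically from an access map. The Python sketch below assumes a hypothetical mapping from each service to the data stores it reads or writes directly. Any store touched by more than one service is flagged as a shared database that should sit behind its owner's API or events.

```python
from collections import defaultdict
from typing import Dict, List, Set


def shared_stores(access: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map each data store used directly by more than one service to those services (sorted).

    `access` maps service name -> set of data stores it reads or writes directly.
    """
    users: Dict[str, Set[str]] = defaultdict(set)
    for service, stores in access.items():
        for store in stores:
            users[store].add(service)
    return {store: sorted(svcs) for store, svcs in users.items() if len(svcs) > 1}


access = {
    "order-service": {"orders-postgres"},
    "payment-service": {"payments-postgres"},
    "reporting-service": {"orders-postgres"},  # violation: reads the Order Service's DB directly
}
print(shared_stores(access))  # {'orders-postgres': ['order-service', 'reporting-service']}
```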
Step 7: Address non-functional requirements explicitly
For each NFR, describe the mechanism. How will you achieve 99.9% uptime? (Multi-region deployment, health checks, auto-restart.) How will you handle a 10x traffic spike? (Horizontal scaling, queue-based load leveling.) How will you detect failures? (Distributed tracing, alerting on SLOs.) NFRs without mechanisms are just wishful thinking.
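A circuit breaker is one concrete mechanism answering "what happens when a dependency is down?" It appears among the reliability topics an HLD should cover. Below is a minimal, single-threaded Python sketch. It assumes a consecutive-failure threshold and a fixed cool-down period. Production libraries also limit half-open trial calls, handle concurrency, and emit metrics. The failing payment call is hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the dependency while the circuit is open."""


class CircuitBreaker:
    """Consecutive-failure breaker: closed -> open -> half-open -> closed (or open again)."""

    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_after_s:
            return "half-open"  # cool-down elapsed: the next call is a trial
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("dependency marked unhealthy; failing fast")
        try:
            result = fn()
        except Exception:
            self._failures += 1
            # Re-open on a failed trial call, or open once the threshold is reached.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0
        self._opened_at = None  # a success closes the circuit
        return result


now = [0.0]  # fake clock so the example is deterministic
breaker = CircuitBreaker(failure_threshold=2, reset_after_s=10.0, clock=lambda: now[0])


def flaky_payment_call() -> str:  # hypothetical failing dependency
    raise ConnectionError("payment provider timeout")


for _ in range(2):
    try:
        breaker.call(flaky_payment_call)
    except ConnectionError:
        pass
print(breaker.state)  # open
now[0] = 11.0
print(breaker.state)  # half-open
print(breaker.call(lambda: "ok"), breaker.state)  # ok closed
```

Failing fast while the circuit is open stops callers from piling up requests on a dependency that is known to be unhealthy. The HLD should still say what the caller does instead, for example queueing the request or returning a degraded response.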
Step 8: Document architectural decisions (ADRs)
For every major decision (database choice, communication protocol, caching strategy, auth mechanism), create an ADR entry: what was decided, which alternatives were considered, and why this choice was made. ADRs are the most valuable long-term output of the HLD process because they prevent the same debates from recurring six months later.
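An ADR entry can be kept as a small structured record and rendered to a Markdown file. The Python sketch below uses the fields of Michael Nygard's format (Title, Status, Context, Decision, Consequences). It also adds the "alternatives considered" list this step asks for. The numbering scheme, the `render` layout, and the example content are illustrative assumptions. The example is loosely based on the Kafka decision in the e-commerce ADR table.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ADR:
    number: int
    title: str
    status: str  # e.g. "Proposed", "Accepted", "Superseded"
    context: str
    decision: str
    consequences: str
    alternatives: List[str] = field(default_factory=list)

    def filename(self) -> str:
        """Zero-padded, slugified name, e.g. for a docs/adr/ directory."""
        slug = "-".join(self.title.lower().split())
        return f"{self.number:04d}-{slug}.md"

    def render(self) -> str:
        alts = "\n".join(f"- {a}" for a in self.alternatives) or "- None recorded"
        return (
            f"# {self.number}. {self.title}\n\n"
            f"## Status\n{self.status}\n\n"
            f"## Context\n{self.context}\n\n"
            f"## Decision\n{self.decision}\n\n"
            f"## Alternatives considered\n{alts}\n\n"
            f"## Consequences\n{self.consequences}\n"
        )


adr = ADR(
    number=3,
    title="Use Kafka as event bus",
    status="Accepted",
    context="Order events need replay for an audit trail and high write throughput.",
    decision="Publish order domain events to Kafka topics.",
    consequences="The team must operate Kafka; ordering is guaranteed only within a partition.",
    alternatives=["RabbitMQ", "AWS SQS/SNS", "Redis Streams"],
)
print(adr.filename())  # 0003-use-kafka-as-event-bus.md
print(adr.render())
```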
Step 9: Conduct a structured review
Share the draft with at least one other senior engineer, a security reviewer, a representative from operations/DevOps, and a product manager. Run a dedicated 60-minute review session. Specifically stress-test failure scenarios: "What happens if Service X is down?" "What happens if the database has a 5-second latency spike?" Capture all feedback and resolve it before marking the document Approved.
Step 10: Version-control and maintain the document
Store the HLD somewhere with version history (Confluence, Notion, or a GitHub repository or wiki). Add a clear version history table. Update the document whenever significant architectural changes are made. Consider the HLD "stale" if it hasn't been reviewed in 6 months and the system is actively evolving.
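The staleness rule is easy to automate, for example in a scheduled docs check. The Python sketch below assumes that "6 months" means 182 days. It also assumes each HLD records a `last_reviewed` date and an "actively evolving" flag. Both are illustrative choices, not a standard.

```python
from datetime import date, timedelta

STALE_AFTER = timedelta(days=182)  # assumption: ~6 months


def is_stale(last_reviewed: date, today: date, actively_evolving: bool) -> bool:
    """Stale if the system is still evolving and the HLD hasn't been reviewed for ~6 months."""
    return actively_evolving and (today - last_reviewed) > STALE_AFTER


print(is_stale(date(2024, 1, 10), date(2024, 9, 1), actively_evolving=True))   # True
print(is_stale(date(2024, 1, 10), date(2024, 9, 1), actively_evolving=False))  # False
```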
Diagram Examples
Pattern 1 — 3-Tier Web Architecture
Pattern 2 — Microservices with Event Bus
Pattern 3 — Data Flow Diagram (User Login)
Full Example: E-Commerce Platform HLD
1. Executive Summary
Build a scalable e-commerce platform supporting product browsing, cart management, checkout, and order tracking. Target audience: 500k registered users, 50k daily active users. The system must support Black Friday traffic peaks of 5,000 RPS. Out of scope: warehouse management, physical shipping logistics, returns processing.
2. Non-Functional Requirements
| Requirement | Target | Notes |
|---|---|---|
| Availability | 99.9% (max ≈8.76h downtime/year) | Excludes planned maintenance windows |
| Read latency (catalog) | p99 < 100ms | Served via CDN + cache |
| Write latency (checkout) | p99 < 500ms | End-to-end including payment auth |
| Peak throughput | 5,000 RPS | Checkout: 500 RPS, Browse: 4,500 RPS |
| Data durability | 99.999999% | Order/payment data — multi-AZ DB |
| RPO / RTO | RPO: 5min / RTO: 30min | For primary DB failure |
3. Architecture Diagram
11. Architectural Decision Records
| Decision | Alternatives Considered | Rationale |
|---|---|---|
| Microservices architecture | Monolith, Modular Monolith | Team independence, independent scaling of hot services (Product, Order). Accepted: higher operational complexity. |
| PostgreSQL for transactional data | MySQL, DynamoDB, MongoDB | ACID guarantees essential for orders/payments. Team has strong Postgres expertise. RDS Multi-AZ for HA. |
| Kafka as event bus | RabbitMQ, AWS SQS/SNS, Redis Streams | Replay capability, per-partition ordering guarantees, and high throughput required. Needed for an audit trail of order events. |
| Redis for caching | Memcached, DynamoDB DAX | Rich data structures, TTL support, cluster mode for horizontal scaling. Session storage + product cache. |
| Elasticsearch for product search | PostgreSQL full-text, Algolia, Typesense | Fuzzy search, faceted filtering, and relevance scoring at scale. Algolia ruled out due to cost at our volume. |
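The ADR table records Redis as the product cache. A common way to use such a cache is the cache-aside pattern: read from the cache, fall back to the database on a miss, then write the result into the cache with a TTL. The Python sketch below uses an in-memory dictionary as a stand-in for Redis. A hypothetical `load_product` function stands in for the database. It illustrates the pattern only and is not the platform's actual caching code.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in for a Redis-style cache with per-key expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[float, dict]] = {}

    def get(self, key: str) -> Optional[dict]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily evict expired entries on read
            return None
        return value

    def set(self, key: str, value: dict, ttl_s: float) -> None:
        self._data[key] = (self._clock() + ttl_s, value)


def get_product(product_id: str, cache: TTLCache,
                load_product: Callable[[str], dict], ttl_s: float = 300.0) -> dict:
    """Cache-aside read: try the cache, else load from the source of truth and populate."""
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    product = load_product(product_id)
    cache.set(key, product, ttl_s)
    return product


db_reads = []


def load_product(product_id: str) -> dict:  # hypothetical database lookup
    db_reads.append(product_id)
    return {"id": product_id, "name": "Desk lamp"}


cache = TTLCache()
get_product("p1", cache, load_product)
get_product("p1", cache, load_product)
print(len(db_reads))  # 1: the second read was served from the cache
```

The TTL bounds how stale a cached catalog entry can get. Writes that must be visible immediately (for example, price changes) usually also delete or overwrite the cache key.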
Common Pitfalls
HLD Review Checklist
Content completeness
- Functional requirements are stated clearly
- Non-functional requirements include specific numbers (latency, throughput, availability)
- At least one architecture diagram with labeled arrows and protocols
- Every major component has a description of its responsibility
- Data flows documented for the 2–3 most critical use cases
- Data ownership is explicit — each service owns its own data
- Security architecture addresses auth, encryption, and network boundaries
- Architectural Decision Records include alternatives and rationale
- Open issues and assumptions are listed with owners
- Failure modes documented — what happens when each dependency fails
- Deployment architecture described — cloud provider, regions, environments
- Observability approach specified — metrics, logs, traces, alerting
- Cost estimation or cost considerations noted
Diagram quality
- All arrows are labeled with protocol (HTTP, gRPC, AMQP, etc.)
- External systems and third-party integrations are shown
- Data stores are included and connected to the correct services
- Sync and async communications use distinct visual styles
- A legend is provided if the diagram uses more than 4 distinct visual types
- Diagram is readable without zooming on a standard screen
Review process
- Reviewed by at least one other senior engineer outside the immediate team
- Security review completed or explicitly scheduled
- DevOps / platform team has reviewed deployment and operational concerns
- Product manager has confirmed the scope aligns with product requirements
- DBA or data engineer has reviewed data model and storage choices
Tools & Resources
Diagramming Tools
| Tool | Best For | Notes |
|---|---|---|
| draw.io / diagrams.net | Architecture, deployment, flow diagrams | Free, works offline, integrates with Confluence/GitHub. Recommended default. |
| Excalidraw | Whiteboard-style quick diagrams, collaborative sketching | Hand-drawn aesthetic, excellent for early-stage design sessions. Free, open source. |
| Lucidchart | Polished enterprise diagrams, cross-team collaboration | Paid. Best import/export options, strong Jira/Confluence integration. |
| Mermaid | Sequence diagrams, flowcharts, state diagrams as code | Text-based, so version-controllable. Natively supported in GitHub Markdown, Notion, and GitLab. |
| PlantUML | Sequence, component, and deployment diagrams as code | More powerful than Mermaid; steeper syntax. Works well in CI pipelines. |
| C4 Model + Structurizr | Layered architectural views (Context → Container → Component) | The C4 methodology is one of the most rigorous frameworks for HLD diagrams. Structurizr, created by C4 author Simon Brown, is its reference tooling. |
Documentation Platforms
| Platform | Use |
|---|---|
| Confluence | Enterprise wiki, deep Jira integration, template library |
| Notion | Flexible docs, good for smaller teams, supports diagrams via embeds |
| GitHub / GitLab Wikis | Version-controlled, markdown-native, Mermaid built in |
| Backstage | Developer portal with TechDocs for versioned architecture docs |
Reference Frameworks
C4 Model
Simon Brown's 4-level framework: Context (the system in its environment), Container (applications and data stores), Component (inside a container), Code (class level). HLD maps to the Context and Container levels.
c4model.com
Architecture Decision Records (ADRs)
Michael Nygard's lightweight format for documenting decisions. Each ADR has a Title, Status, Context, Decision, and Consequences. Store them in /docs/adr/ in the repo.
adr.github.io
AWS Well-Architected Framework
Six pillars: Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, Sustainability. An excellent NFR checklist for cloud-hosted systems.
aws.amazon.com/architecture/well-architected
The Twelve-Factor App
Principles for building maintainable, scalable services: codebase, dependencies, config, backing services, build/release/run, processes, port binding, concurrency, disposability, dev/prod parity, logs, admin processes.
12factor.net