GraphQL Handbook
A practical, decision-first guide to GraphQL — covering what it is, when to adopt it (and when not to), common ARB objections with honest answers, schema design patterns, and production examples.
What Is GraphQL
A query language for your API and a runtime for executing those queries — not a database, not a REST replacement in all cases, not a silver bullet.
GraphQL is a specification (not a framework or library) originally developed by Facebook in 2012 and open-sourced in 2015. It gives clients the power to ask for exactly the data they need — no more, no less — and describes the shape of that data via a strongly-typed schema.
Unlike REST, where the server defines fixed endpoints returning fixed shapes, GraphQL exposes a single endpoint through which clients express their own data requirements declaratively. The server schema acts as a contract between client and server teams.
One HTTP endpoint (typically POST /graphql). The operation type — query, mutation, subscription — is part of the request body, not the URL.
Every field has a type. Types, queries, mutations, and relationships are declared in Schema Definition Language (SDL). The schema is introspectable at runtime.
Clients declare the shape of the response they want. No over-fetching 50 fields when you need 3. No under-fetching that forces multiple round-trips.
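To make the single-endpoint model concrete, here is a minimal sketch of how a client might package a request. The `build_request` helper and the `order`, `status`, and `total` fields are hypothetical; the `query` / `variables` / `operationName` body keys follow the common GraphQL-over-HTTP convention.

```python
import json

# Hypothetical operation: the client asks for exactly three fields of an order.
ORDER_SUMMARY = """
query OrderSummary($id: ID!) {
  order(id: $id) {
    id
    status
    total
  }
}
"""

def build_request(query, variables=None, operation_name=None):
    """Package a GraphQL operation as a single POST to /graphql."""
    body = {"query": query}
    if variables is not None:
        body["variables"] = variables
    if operation_name is not None:
        body["operationName"] = operation_name
    return {
        "method": "POST",
        "path": "/graphql",
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }

request = build_request(ORDER_SUMMARY, {"id": "order-1"}, "OrderSummary")
```

Note that the operation type (`query`) lives in the body; the URL and HTTP method stay the same for queries and mutations alike.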
GraphQL vs REST vs gRPC — at a glance
| Dimension | GraphQL | REST | gRPC |
|---|---|---|---|
| Transport | HTTP/1.1, HTTP/2, WS | HTTP/1.1, HTTP/2 | HTTP/2 (required) |
| Schema / Contract | Strongly typed SDL | OpenAPI (optional) | Protobuf (required) |
| Over/Under-fetching | Eliminated by design | Common problem | Controlled via proto |
| Versioning | Schema evolution (no v2) | /v1, /v2 or headers | Proto evolution |
| Real-time | Subscriptions (WS/SSE) | Polling or SSE | Streaming native |
| Browser native | Yes (HTTP + JSON) | Yes | Requires gRPC-Web proxy |
| Learning curve | Medium (SDL + concepts) | Low | High (protobuf, streaming) |
| Caching | Complex (client-side) | HTTP cache native | Manual |
| Best fit | Multi-client APIs, BFF | Public APIs, simple CRUD | Internal microservices |
Core Concepts
The core building blocks you need to understand before designing or reviewing a GraphQL API.
Schema Definition Language (SDL)
Queries — asking for data
Resolvers — where data actually comes from
Each field in the schema maps to a resolver function. Resolvers can fetch from databases, microservices, caches, or compute values inline. The schema is the contract; resolvers are the implementation.
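The field-to-resolver mapping can be illustrated with a toy executor. This is a sketch under stated assumptions, not how any particular server is implemented: the `RESOLVERS` table, the `execute` function, and the dict-based selection format are all invented for illustration, and the in-memory dicts stand in for real data sources.

```python
# Stand-in data sources; a real resolver might call a database or a service.
ORDERS = {"o1": {"id": "o1", "customer_id": "c1", "total_cents": 1250}}
CUSTOMERS = {"c1": {"id": "c1", "name": "Ada"}}

# One resolver per (type, field). Each receives the parent object and arguments.
RESOLVERS = {
    ("Query", "order"): lambda parent, args: ORDERS.get(args["id"]),
    ("Order", "id"): lambda parent, args: parent["id"],
    ("Order", "total"): lambda parent, args: parent["total_cents"] / 100,  # computed inline
    ("Order", "customer"): lambda parent, args: CUSTOMERS.get(parent["customer_id"]),
    ("Customer", "name"): lambda parent, args: parent["name"],
}

# Return type of each object-valued field, so the executor knows where to recurse.
FIELD_TYPES = {("Query", "order"): "Order", ("Order", "customer"): "Customer"}

def execute(type_name, parent, selection):
    """Resolve a selection given as {field: (args, sub_selection or None)}."""
    result = {}
    for field, (args, sub_selection) in selection.items():
        value = RESOLVERS[(type_name, field)](parent, args)
        if sub_selection is not None and value is not None:
            value = execute(FIELD_TYPES[(type_name, field)], value, sub_selection)
        result[field] = value
    return result

# Equivalent to: { order(id: "o1") { id total customer { name } } }
selection = {
    "order": ({"id": "o1"}, {
        "id": ({}, None),
        "total": ({}, None),
        "customer": ({}, {"name": ({}, None)}),
    })
}
data = execute("Query", None, selection)
```

Only the selected fields are resolved, so `customer_id` and `total_cents` never reach the client even though they exist in the backing store.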
When to Use GraphQL
GraphQL earns its complexity overhead in specific scenarios. Use this checklist before proposing adoption.
Web, mobile, TV, partner portals — each consuming a different shape from the same domain data. GraphQL eliminates the BFF proliferation problem: one schema, clients self-select their payload.
Product teams iterating on UI without waiting for backend changes to add/remove fields. Clients evolve queries independently as long as the schema supports the fields.
GraphQL Federation or schema stitching lets you present a unified graph across Order, Customer, Inventory, and Billing services — without a bespoke aggregation microservice per use case.
Social graphs, product catalogs with variants, org hierarchies, knowledge graphs — any domain where relationships matter and REST leads to N+1 round-trips or brittle ?include= parameters.
The SDL is a machine-readable, version-controlled contract. Frontend and backend teams can agree on schema, generate types, and work in parallel before implementation is complete.
When you need live data updates (order tracking, dashboards, notifications) alongside regular data fetching — GraphQL subscriptions unify the protocol rather than adding a separate WebSocket or SSE layer.
Fitness signals — score your project
| Signal | Weight | Interpretation |
|---|---|---|
| 3+ distinct clients consuming the same domain data | STRONG YES | Core value proposition of GraphQL |
| Frontend teams blocked by backend for field additions | STRONG YES | Schema evolution solves this exactly |
| Multiple REST calls chained to build a single view | STRONG YES | GraphQL collapses to a single round-trip |
| Existing microservices team wants to own sub-graphs | YES | Federation pattern applies well |
| Partner / public API with diverse consumers | CONSIDER | Powerful but introspection exposure risk |
| Purely internal service, single consumer | SKIP | gRPC or REST is simpler and faster |
| Simple CRUD with predictable access patterns | SKIP | REST + OpenAPI is the right tool |
When NOT to Use GraphQL
These are the honest contraindications — and what to reach for instead. Being clear here is how you win ARB trust.
A service that does GET /users/:id, POST /users, PUT /users/:id, DELETE /users/:id has no ambiguity. REST is explicit, cacheable, and universally understood. GraphQL adds resolver infrastructure for zero benefit.
Use instead: REST + OpenAPI / Swagger
GraphQL over HTTP adds parsing overhead, query validation, and resolver chains. For internal microservice-to-microservice calls where you control both sides and need sub-millisecond latency, gRPC is 3–5× faster with less overhead.
Use instead: gRPC / Protocol Buffers
GraphQL's single POST endpoint is opaque to HTTP caches (CDNs, proxies, Varnish). Implementing GET-based persisted queries helps, but it's a workaround. REST with proper Cache-Control semantics is far simpler to cache at the edge.
Use instead: REST, or GraphQL with Automatic Persisted Queries (APQ)
GraphQL's value is the contract. If no one owns the schema, if resolvers call each other recursively, if deprecations aren't enforced — you get a worse REST API. The tool requires process maturity, not just technology adoption.
Use instead: REST until the team establishes API governance
GraphQL has no standard for binary/multipart payloads. The community-spec workaround (multipart/form-data with operations JSON) is clunky and not supported by all clients. REST is the right fit for file-centric APIs.
Use instead: REST (presigned URLs + S3 / blob store)
GraphQL's flexibility (clients shaping responses) provides no value when there is exactly one client. You're adding schema infrastructure, resolver overhead, and tooling cost with no payoff. The problem GraphQL solves simply doesn't exist.
Use instead: REST or gRPC depending on sync vs performance needs
ARB Pushback — Objections & Answers
These are the objections most Architecture Review Boards raise about GraphQL proposals. Honest, documented answers — not spin.
This is a real concern, and it's addressable. Without controls, a deeply nested query can trigger recursive resolver chains that exhaust CPU and memory (a "query complexity" attack or "aliasing attack").
Mitigations that must be in place before production:
- Query depth limiting — reject queries beyond N levels deep (e.g., depth 7). Configurable in Apollo Server, Hot Chocolate, Strawberry.
- Query complexity scoring — assign a cost to each field; reject if total exceeds budget.
- Persisted queries / trusted documents — for known clients (web/mobile apps), only allow a pre-registered whitelist of operation hashes. Arbitrary queries are disabled entirely.
- Disable introspection in production — schema exposure aids attackers. Introspection is for developer tooling only.
- Rate limiting per client / IP — same as any API.
With persisted queries, clients cannot send arbitrary operations at all. The "arbitrary query" concern is a non-issue for first-party clients using this pattern.
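The first two mitigations can be sketched in a few lines. This is an illustration only: the nested-dict selection format, the `COSTS` weights, and the function names are assumptions, and production libraries also account for list sizes, fragments, and aliases, which this sketch ignores.

```python
def query_depth(selection):
    """Depth of a selection set given as nested dicts ({} marks a leaf field)."""
    if not selection:
        return 0
    return 1 + max(query_depth(sub) for sub in selection.values())

def query_cost(selection, field_costs, default_cost=1):
    """Sum a per-field cost over every selected field, including nested ones."""
    total = 0
    for field, sub in selection.items():
        total += field_costs.get(field, default_cost)
        total += query_cost(sub, field_costs, default_cost)
    return total

def validate(selection, max_depth, max_cost, field_costs):
    """Return rejection reasons; an empty list means the query may run."""
    errors = []
    depth = query_depth(selection)
    if depth > max_depth:
        errors.append(f"depth {depth} exceeds limit {max_depth}")
    cost = query_cost(selection, field_costs)
    if cost > max_cost:
        errors.append(f"cost {cost} exceeds budget {max_cost}")
    return errors

# { orders { customer { orders { customer { name } } } } }
nested = {"orders": {"customer": {"orders": {"customer": {"name": {}}}}}}
shallow = {"orders": {"id": {}, "status": {}}}
COSTS = {"orders": 10, "customer": 2}  # hypothetical weights
```

Both checks run during validation, before any resolver executes, which is what makes them cheap enough to apply to every request.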
GraphQL is designed to evolve without versioning. The philosophy is "schema evolution, not version proliferation." New fields are additive and non-breaking. Old fields are @deprecated(reason: "...") and kept alive until usage drops to zero.
Practices that make evolution safe:
- Never remove a field without a deprecation period (monitor usage via field-level tracing).
- Never change a field's type from non-null to nullable or change its semantic meaning.
- Use tooling like GraphQL Inspector in CI to detect breaking changes before merge.
- Federation supports schema registry with compatibility checks (Apollo's Schema Registry / Cosmo).
When a truly breaking change is unavoidable, the recommended pattern is adding a new root field alongside the old one — not a /v2 endpoint — and migrating clients before removing the old field.
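The kind of check a schema-diff tool performs in CI can be approximated as follows. This is a simplified sketch, not GraphQL Inspector's algorithm: schemas are modeled as plain dicts of output fields, `find_breaking_changes` is a hypothetical name, and arguments, enums, and input types are ignored.

```python
def find_breaking_changes(old, new):
    """Compare two schemas given as {type: {field: type_string}} (output fields only)."""
    changes = []
    for type_name, old_fields in old.items():
        new_fields = new.get(type_name)
        if new_fields is None:
            changes.append(f"type {type_name} was removed")
            continue
        for field, old_type in old_fields.items():
            if field not in new_fields:
                changes.append(f"{type_name}.{field} was removed")
                continue
            new_type = new_fields[field]
            if old_type.endswith("!") and new_type == old_type[:-1]:
                # Clients that relied on the non-null guarantee can now receive null.
                changes.append(f"{type_name}.{field} became nullable")
            elif new_type != old_type and new_type != old_type + "!":
                changes.append(f"{type_name}.{field} changed type {old_type} -> {new_type}")
    return changes

old_schema = {"Order": {"id": "ID!", "status": "String!", "total": "Float", "legacyCode": "String"}}
new_schema = {"Order": {"id": "ID!", "status": "String", "total": "Float!", "totalCents": "Int!"}}
breaking = find_breaking_changes(old_schema, new_schema)
```

Here the added `totalCents` field and the tightening of `total` to non-null pass, while the removal of `legacyCode` and the loosening of `status` are flagged.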
Partially true, fully solvable. Standard HTTP GET caching doesn't work for POST /graphql. However:
- Automatic Persisted Queries (APQ): the client sends only a hash of the query. If the server does not recognize the hash, the client retries once with the full query text so the server can register it. Later requests send the hash alone and can use GET, which lets a CDN cache the response.
- Response caching at the resolver level: Apollo Server, Hot Chocolate, and Strawberry support @cacheControl hints per field/type, which cache at the application layer.
- DataLoader: per-request memoization eliminates redundant DB calls within a single operation (not HTTP caching, but equally important for performance).
If CDN-level caching of individual resource responses is the primary requirement, REST is genuinely simpler. GraphQL caching is application-level, not network-level by default.
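The APQ handshake is easier to follow as code. The sketch below is loosely modeled on Apollo's protocol, in which an unknown hash is answered with a `PersistedQueryNotFound` error and the client retries with the full query. The `ApqServer` class and `send` function are hypothetical, and query execution is replaced by a placeholder string.

```python
import hashlib

class ApqServer:
    """Toy server that remembers query text by its SHA-256 hash."""

    def __init__(self):
        self.known_queries = {}

    def handle(self, sha256_hash, query=None):
        if query is not None:
            # Registration request: verify the hash before storing the query.
            if hashlib.sha256(query.encode()).hexdigest() != sha256_hash:
                return {"errors": [{"message": "provided sha does not match query"}]}
            self.known_queries[sha256_hash] = query
        stored = self.known_queries.get(sha256_hash)
        if stored is None:
            return {"errors": [{"message": "PersistedQueryNotFound"}]}
        return {"data": f"executed: {stored}"}  # placeholder for real execution

def send(server, query):
    """Client side: try the hash alone, then fall back to hash plus full query once."""
    digest = hashlib.sha256(query.encode()).hexdigest()
    response = server.handle(digest)
    attempts = 1
    if "errors" in response:
        response = server.handle(digest, query)
        attempts += 1
    return response, attempts

server = ApqServer()
first_response, first_attempts = send(server, "{ products { id name } }")
second_response, second_attempts = send(server, "{ products { id name } }")
```

Only the hash-only request is a candidate for GET and edge caching; the one-time registration request still carries the full query.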
Honest answer: there is real overhead. Teams new to GraphQL should expect 4–8 additional weeks of ramp-up for the first production deployment.
Incremental costs vs REST:
- Schema design requires more upfront thought (but pays back with fewer API change requests later).
- DataLoader patterns must be learned and enforced to prevent N+1 in production.
- Observability requires field-level tracing (Apollo Studio, Cosmo, or OpenTelemetry with custom resolvers).
- Security analysis of query complexity and depth must be built into your gateway config.
This cost is justified when GraphQL's strengths (multi-client, federation, schema contracts) apply. If the project doesn't clearly exhibit those needs, the overhead is pure tax.
Use with caution for truly public APIs. GraphQL works well for partner APIs with known consumers, but for an open developer ecosystem, REST has significant advantages:
- REST is universally understood; GraphQL client libraries vary by platform.
- OpenAPI tooling for REST SDKs, documentation, and testing is more mature.
- Introspection must be disabled in prod (complicates external developer experience).
Best of both: expose a REST public API for the ecosystem; use GraphQL internally or for known partners through a dedicated partner portal with controlled tooling.
Schema Design
The schema is the most important artifact in a GraphQL system. Design mistakes here are expensive to undo.
Relay-compatible Connections (pagination)
Use cursor-based pagination via the Relay Connection spec. Offset-based pagination skips or repeats records when rows are inserted or deleted between page requests. The pattern is widely supported by GraphQL clients and federation routers.
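A minimal sketch of forward pagination in the Connection shape follows. It assumes records sorted by an ascending integer `id`; the cursor encoding and the `paginate` helper are invented for illustration, and backward pagination (`last` / `before`, `hasPreviousPage`, `startCursor`) is omitted.

```python
import base64

def encode_cursor(record_id):
    """Opaque cursor: clients should treat it as a token, not parse it."""
    return base64.b64encode(f"cursor:{record_id}".encode()).decode()

def decode_cursor(cursor):
    return int(base64.b64decode(cursor).decode().split(":", 1)[1])

def paginate(records, first, after=None):
    """Return a Connection-shaped dict for records sorted by ascending id."""
    if after is not None:
        last_seen = decode_cursor(after)
        records = [r for r in records if r["id"] > last_seen]
    page = records[:first]
    edges = [{"cursor": encode_cursor(r["id"]), "node": r} for r in page]
    return {
        "edges": edges,
        "pageInfo": {
            "hasNextPage": len(records) > first,
            "endCursor": edges[-1]["cursor"] if edges else None,
        },
    }

orders = [{"id": i} for i in range(1, 6)]
page_one = paginate(orders, first=2)
# A record inserted at the front would shift offset-based pages; the cursor is unaffected.
orders.insert(0, {"id": 0})
page_two = paginate(orders, first=2, after=page_one["pageInfo"]["endCursor"])
```

With offset 2 and limit 2, the second page after the insert would have started at record 2 again; the cursor version resumes at record 3.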
Mutation payload pattern
Mutations should return rich payload types — not just the affected entity. This allows returning errors inline (without abusing HTTP status codes), metadata, and related objects the client needs to update its local state.
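One way to picture the payload pattern is with plain data classes. The names here (`CancelOrderPayload`, `UserError`, `cancel_order`, the `user_errors` field) are hypothetical, and the in-memory `ORDERS` dict stands in for a real order store.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class UserError:
    path: str      # which input the problem relates to
    message: str   # human-readable explanation for the client

@dataclass
class Order:
    id: str
    status: str

@dataclass
class CancelOrderPayload:
    """Payload type: the affected entity plus any domain errors."""
    order: Optional[Order] = None
    user_errors: list = field(default_factory=list)

ORDERS = {"o1": Order("o1", "PLACED"), "o2": Order("o2", "SHIPPED")}

def cancel_order(order_id):
    order = ORDERS.get(order_id)
    if order is None:
        return CancelOrderPayload(user_errors=[UserError("orderId", "Order not found")])
    if order.status == "SHIPPED":
        # Expected business failure: reported in the payload, not as a transport error.
        return CancelOrderPayload(
            order=order,
            user_errors=[UserError("orderId", "Shipped orders cannot be cancelled")],
        )
    order.status = "CANCELLED"
    return CancelOrderPayload(order=order)
```

Returning the updated `order` lets the client refresh its local cache from the mutation response without issuing a follow-up query.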
Schema design rules
| Rule | Reason |
|---|---|
| Never expose database IDs directly; use opaque ID scalars | Prevents clients from guessing/iterating IDs; allows backend migration |
| Separate input types from output types | Output types may have computed fields and resolvers not valid as inputs |
| Name mutations with verb + noun: createOrder, cancelOrder | Clarity over REST-style resource thinking |
| Non-null (!) everything that will never be null in the domain | Clients get compile-time guarantees; avoids defensive null checks everywhere |
| Add @deprecated(reason) before removing any field; never remove immediately | Prevents client breakage; reason tells clients what to migrate to |
| Avoid generic types like JSON scalar or Map<String, Any> | Loses type safety; use typed union/interface instead |
| Model domain concepts, not database tables | GraphQL is a product API, not a DB reflection; align to business language |
Queries & Mutations
Client operation patterns and resolver implementation best practices.
Fragments — reusable field sets
DataLoader — eliminating N+1
Subscriptions
Real-time event streams — when to use them and operational implications.
Good fit:
- Order status tracking (customer-facing)
- Live dashboards and analytics counters
- Collaborative editing presence indicators
- Notification feeds and alert streams
- IoT sensor data aggregation
Poor fit:
- Polling-based reports (use REST + polling)
- Single-event webhooks (use REST webhooks)
- High-frequency binary streams (use WebRTC or raw WS)
- When WebSocket infra isn't production-ready
- Serverless-only deployments (WS requires persistent connections)
Example: Backend-for-Frontend (BFF) Pattern
The most common GraphQL adoption pattern in enterprise — a GraphQL layer in front of existing REST or gRPC services, purpose-built for your product surfaces.
Rather than forcing each client (web, iOS, Android) to orchestrate calls to Order Service, Customer Service, and Inventory Service independently, a GraphQL BFF acts as an orchestration layer. Clients make one request; the BFF fans out to upstream services and assembles the response.
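The fan-out step might look roughly like this inside a BFF resolver. The three `get_*` coroutines are stubs standing in for calls to upstream services, and `resolve_order_view` and the response shape are assumptions made for the example.

```python
import asyncio

# Stubs standing in for calls to upstream REST or gRPC services.
async def get_order(order_id):
    return {"id": order_id, "customer_id": "c1", "sku": "sku-9"}

async def get_customer(customer_id):
    return {"id": customer_id, "name": "Ada"}

async def get_stock(sku):
    return {"sku": sku, "available": 4}

async def resolve_order_view(order_id):
    """Assemble one client-facing response from three upstream calls."""
    order = await get_order(order_id)
    # Customer and stock lookups are independent, so run them concurrently.
    customer, stock = await asyncio.gather(
        get_customer(order["customer_id"]),
        get_stock(order["sku"]),
    )
    return {
        "order": {
            "id": order["id"],
            "customer": {"name": customer["name"]},
            "inStock": stock["available"] > 0,
        }
    }

view = asyncio.run(resolve_order_view("o1"))
```

The client sees a single request and a single response; the sequencing and concurrency of upstream calls stay a server-side concern.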
Example: GraphQL Federation
Federation allows multiple teams to own sub-graphs that compose into a unified supergraph — without a central BFF team becoming a bottleneck.
Each service publishes its own partial schema. These subgraph schemas are composed into a single supergraph schema, and a router (Apollo Router, Cosmo, or WunderGraph) plans and executes each incoming query across the subgraphs. Teams work independently; the contract is the federation spec.
Security
GraphQL-specific attack surfaces and mandatory mitigations for production.
| Threat | Mitigation | Library |
|---|---|---|
| Query depth attack: deeply nested query exhausts resolvers | Enforce max depth (e.g., 10 levels) | graphql-depth-limit, built-in to Hot Chocolate |
| Query complexity attack: expensive field combinations | Assign costs per field; reject over budget | graphql-cost-analysis, Apollo cost directives |
| Introspection disclosure: schema exposed to attackers | Disable introspection in production; enable only in dev/staging | Apollo Server: introspection: false |
| Unbounded results: query returns millions of rows | Enforce max first/limit arguments at schema level | Custom validation rule or schema directive |
| Alias flooding: duplicate fields with different aliases | Count aliased fields in complexity scoring | Custom validation or complexity library |
| Authorization bypass: field accessed without permission | Field-level auth with @auth directive or middleware | GraphQL Shield, Hot Chocolate's @authorize |
| Arbitrary operations (partner/public) | Persisted queries: only pre-approved operation hashes allowed | Apollo persisted-query safelisting, Relay persisted queries. APQ alone registers any query a client sends, so it is not an allowlist |
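The last row of the table, an allowlist of pre-approved operations, can be sketched as below. The `TRUSTED_DOCUMENTS` manifest, the request shape, and the function names are hypothetical; real routers load the manifest from a registry or a build artifact.

```python
import hashlib

def operation_hash(document):
    return hashlib.sha256(document.encode()).hexdigest()

# Built at release time from the operations the first-party clients ship with.
TRUSTED_DOCUMENTS = {
    operation_hash(doc): doc
    for doc in [
        "query OrderStatus($id: ID!) { order(id: $id) { status } }",
        "mutation CancelOrder($id: ID!) { cancelOrder(id: $id) { order { status } } }",
    ]
}

def resolve_operation(request):
    """Accept only a known hash; any query text the client sends is ignored."""
    document = TRUSTED_DOCUMENTS.get(request.get("hash"))
    if document is None:
        raise PermissionError("operation is not on the allowlist")
    return document
```

Unlike automatic persisted queries, nothing a client sends at runtime can add an entry to the manifest, which is what makes this a security control and not only a bandwidth optimization.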
Performance
Every resolver that loads a related entity must go through DataLoader. Without it, loading 100 orders and their customers issues 101 database queries. With DataLoader, it's 2. Non-negotiable in production.
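The 101-versus-2 arithmetic can be reproduced with a minimal batching loader. This is a sketch of the technique, not the DataLoader library itself: the `DataLoader` class below is a simplified stand-in with no error handling, and `fetch_orders` / `fetch_customers` are fake data-access functions that only record the queries they would have issued.

```python
import asyncio

class DataLoader:
    """Minimal per-request batching loader (a sketch, not the real library)."""

    def __init__(self, batch_fn):
        self.batch_fn = batch_fn   # async: list of keys -> list of values, same order
        self.cache = {}            # key -> Future, memoized for this request
        self.queue = []            # keys waiting for the next batch

    def load(self, key):
        if key not in self.cache:
            loop = asyncio.get_running_loop()
            self.cache[key] = loop.create_future()
            if not self.queue:
                # First key of a new batch: dispatch after already-scheduled resolvers run.
                loop.call_soon(lambda: loop.create_task(self._dispatch()))
            self.queue.append(key)
        return self.cache[key]

    async def _dispatch(self):
        keys, self.queue = self.queue, []
        values = await self.batch_fn(keys)
        for key, value in zip(keys, values):
            self.cache[key].set_result(value)

QUERY_LOG = []

async def fetch_orders():
    QUERY_LOG.append("SELECT orders")
    return [{"id": i, "customer_id": i % 10} for i in range(100)]

async def fetch_customers(ids):
    QUERY_LOG.append(f"SELECT customers WHERE id IN ({len(ids)} ids)")
    return [{"id": i, "name": f"customer-{i}"} for i in ids]

async def resolve_customer(order, loader):
    return await loader.load(order["customer_id"])

async def main():
    loader = DataLoader(fetch_customers)  # one loader per request
    orders = await fetch_orders()
    return await asyncio.gather(*(resolve_customer(o, loader) for o in orders))

customers = asyncio.run(main())
```

All 100 resolvers enqueue their key before the batch is dispatched, so `QUERY_LOG` ends up with two entries: one for the orders and one batched customer lookup. Creating the loader per request keeps the memoized values from leaking between users.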
Use Apollo Studio, Cosmo, or OpenTelemetry spans to see resolver execution time per field. Slow resolvers are immediately visible. Optimize before — not after — load testing.
Use @cacheControl directives to annotate fields with cache TTLs. The server can return a Cache-Control header representing the minimum TTL across all resolved fields in the response.
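The "minimum TTL across all resolved fields" rule amounts to a small reduction. In this sketch the hint dictionaries, the function names, and the choice to emit `no-store` for uncacheable responses are assumptions; check your server's documentation for its exact header behavior.

```python
def overall_cache_policy(field_hints):
    """Combine per-field hints into one policy: lowest max_age, private wins."""
    if not field_hints:
        return None
    max_age = min(hint["max_age"] for hint in field_hints)
    is_private = any(hint.get("scope") == "private" for hint in field_hints)
    return {"max_age": max_age, "scope": "private" if is_private else "public"}

def cache_control_header(policy):
    """Render the policy as a Cache-Control header value."""
    if policy is None or policy["max_age"] == 0:
        return "no-store"  # assumption of this sketch for uncacheable responses
    return f"max-age={policy['max_age']}, {policy['scope']}"

# Hypothetical hints collected while resolving one response.
hints = [
    {"max_age": 300},                     # product catalog data
    {"max_age": 60},                      # inventory level
    {"max_age": 30, "scope": "private"},  # per-user recommendation
]
policy = overall_cache_policy(hints)
header = cache_control_header(policy)
```

A single short-lived or private field therefore caps the cacheability of the whole response, which is worth keeping in mind when deciding which fields to combine in one operation.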
Send a hash instead of the full query string. This reduces request payload size (important on mobile) and enables GET-based requests, which CDNs can cache. When the server accepts only pre-registered hashes, the same mechanism doubles as a security control.
Use @defer on non-critical fields to return the primary payload immediately and stream deferred fields as they resolve. Reduces perceived latency for complex pages.
The router generates a query plan across subgraphs. Fetch subgraph data in parallel where possible. Avoid deeply nested cross-subgraph entity references that force sequential fetches.
Tooling
| Category | Tool | Notes |
|---|---|---|
| Server (.NET) | Hot Chocolate | First-class .NET GraphQL server. Annotation-based + SDL-first. Federation v2 support. |
| Server (Node) | Apollo Server, Yoga | Apollo for enterprise features + Studio. Yoga for lightweight/edge deployments. |
| Client (React) | Apollo Client, urql, Relay | Apollo for full ecosystem. urql for lightweight. Relay for Relay-spec pagination + Facebook scale. |
| Client (.NET) | Strawberry Shake | Generated typed client from schema. Integrates with Hot Chocolate ecosystem. |
| Code generation | GraphQL Code Generator | Generates TypeScript types, React hooks, and resolvers from schema + operations. Run in CI. |
| Schema management | GraphQL Inspector | Breaking change detection in CI. Diff schemas. Validate coverage. |
| IDE tooling | GraphiQL, Apollo Sandbox | In-browser query explorers. Apollo Sandbox works without local server. |
| Federation router | Apollo Router, Cosmo | Apollo Router (Rust): mature, open core. Cosmo: fully open-source alternative with managed option. |
| Schema registry | Apollo GraphOS, Cosmo Platform | Schema versioning, breaking change gating, subgraph composition validation. |
| Observability | Apollo Studio, OpenTelemetry | Field-level usage metrics, error rates, resolver latency heatmaps. |
| Testing | jest + graphql-tag, Testcontainers | Unit-test resolvers in isolation. Integration-test against a real schema + DB. |
Reference Links
Quick decision matrix
| Scenario | Recommendation | Rationale |
|---|---|---|
| Multiple clients, different payload needs | GraphQL | Core value prop — clients shape their own queries |
| Aggregating 3+ microservices for a product surface | GraphQL Federation | Avoids bespoke BFF per surface; teams own sub-graphs |
| Simple internal CRUD microservice | REST or gRPC | Zero over/under-fetching problem; added complexity not justified |
| Public API with unknown consumers | REST preferred | REST has wider tooling, simpler caching, lower learning curve |
| Real-time + query on the same domain | GraphQL + Subscriptions | Unified protocol; no separate WS service needed |
| Internal service, team owns both sides | gRPC | Strictly typed contract, 3–5× faster, no browser concern |
| High CDN cache dependency | REST or APQ | HTTP GET caching native to REST; APQ is a workaround |
| File uploads are a primary use case | REST + presigned URLs | GraphQL has no standard multipart support |