GraphQL Enterprise Handbook
GraphQL 2021 · Apollo / Hot Chocolate · ARB Ready
Enterprise Architecture Reference

A practical, decision-first guide to GraphQL — covering what it is, when to adopt it (and when not to), common ARB objections with honest answers, schema design patterns, and production examples.

Query Language · Type System · Federation · ARB Guide · Subscriptions

🔷 What Is GraphQL

A query language for your API and a runtime for executing those queries — not a database, not a REST replacement in all cases, not a silver bullet.

GraphQL is a specification (not a framework or library) originally developed by Facebook in 2012 and open-sourced in 2015. It gives clients the power to ask for exactly the data they need — no more, no less — and describes the shape of that data via a strongly-typed schema.

Unlike REST, where the server defines fixed endpoints returning fixed shapes, GraphQL exposes a single endpoint through which clients express their own data requirements declaratively. The server schema acts as a contract between client and server teams.

Single Endpoint

One HTTP endpoint (typically POST /graphql). The operation type — query, mutation, subscription — is part of the request body, not the URL.

Typed Schema

Every field has a type. Types, queries, mutations, and relationships are declared in Schema Definition Language (SDL). The schema is introspectable at runtime.

Client-Driven

Clients declare the shape of the response they want. No over-fetching 50 fields when you need 3. No under-fetching that forces multiple round-trips.
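
To make the client-driven idea concrete, here is a minimal sketch of how a selection shapes a response. It is not how a GraphQL server is implemented (a real server validates the selection against the schema and runs resolvers); the `Selection` type, the `project` helper, and the sample record are all hypothetical names invented for this illustration.

```typescript
// A selection is a tree: `true` selects a leaf field, a nested object selects sub-fields.
type Selection = { [field: string]: true | Selection };
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Return only the fields named in the selection, recursing into objects and lists.
function project(value: Json, selection: Selection): Json {
  if (Array.isArray(value)) {
    return value.map((item) => project(item, selection));
  }
  if (value === null || typeof value !== "object") {
    return value;
  }
  const result: { [key: string]: Json } = {};
  for (const field of Object.keys(selection)) {
    if (!(field in value)) continue; // unknown field: skipped in this sketch
    const sub = selection[field];
    const fieldValue = value[field];
    result[field] = sub === true ? fieldValue : project(fieldValue, sub);
  }
  return result;
}

// Hypothetical order record, as an upstream service might return it.
const order: Json = {
  id: "ord_123",
  status: "SHIPPED",
  total: 149.99,
  customer: { id: "cus_1", name: "Ada", email: "ada@example.com" },
  auditLog: ["created", "paid", "shipped"],
};

// Mobile asks for three fields; the dashboard asks for a nested shape.
const mobileView = project(order, { id: true, status: true, total: true });
const dashboardView = project(order, { id: true, customer: { name: true } });
```

Both views come from the same record, and neither client receives `auditLog` unless it asks for it.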

GraphQL vs REST vs gRPC — at a glance

| Dimension | GraphQL | REST | gRPC |
| --- | --- | --- | --- |
| Transport | HTTP/1.1, HTTP/2, WS | HTTP/1.1, HTTP/2 | HTTP/2 (required) |
| Schema / Contract | Strongly typed SDL | OpenAPI (optional) | Protobuf (required) |
| Over/Under-fetching | Eliminated by design | Common problem | Controlled via proto |
| Versioning | Schema evolution (no v2) | /v1, /v2 or headers | Proto evolution |
| Real-time | Subscriptions (WS/SSE) | Polling or SSE | Streaming native |
| Browser native | Yes (HTTP + JSON) | Yes | Requires gRPC-Web proxy |
| Learning curve | Medium (SDL + concepts) | Low | High (protobuf, streaming) |
| Caching | Complex (client-side) | HTTP cache native | Manual |
| Best fit | Multi-client APIs, BFF | Public APIs, simple CRUD | Internal microservices |

🧱 Core Concepts

The five building blocks you need to understand before designing or reviewing a GraphQL API.

Schema Definition Language (SDL)

GraphQL SDL schema.graphql
# Scalar types: String, Int, Float, Boolean, ID (built-in)

# Custom scalars for domain types
scalar DateTime
scalar Money

# Object type: a node in your graph
type Order {
  id: ID!                  # ! = non-null
  placedAt: DateTime!
  status: OrderStatus!
  total: Money!
  customer: Customer!      # relationship, resolved separately
  lineItems: [LineItem!]!  # non-null list of non-null items
}

# Enum
enum OrderStatus {
  PENDING
  PROCESSING
  SHIPPED
  DELIVERED
  CANCELLED
}

# Input type: used for mutation arguments (never reuse output types)
input PlaceOrderInput {
  customerId: ID!
  items: [OrderItemInput!]!
  couponCode: String       # nullable = optional
}

# Root types
type Query {
  order(id: ID!): Order
  orders(filter: OrderFilter, page: PageInput): OrderConnection!
  me: Customer
}

type Mutation {
  placeOrder(input: PlaceOrderInput!): PlaceOrderPayload!
  cancelOrder(id: ID!): CancelOrderPayload!
}

type Subscription {
  orderStatusChanged(orderId: ID!): OrderStatusEvent!
}

Queries — asking for data

✓ Client query — asks exactly what it needs
# Mobile app: minimal payload
query GetOrderSummary($id: ID!) {
  order(id: $id) {
    id
    status
    total
    placedAt
  }
}

# Dashboard: richer shape, same root field
query GetOrderDetail($id: ID!) {
  order(id: $id) {
    id
    status
    total
    customer { name email }
    lineItems { sku quantity unitPrice }
  }
}
✕ REST equivalent — over-fetches for mobile
# REST: GET /orders/:id
# Returns everything regardless of client
{
  "id": "ord_123",
  "status": "SHIPPED",
  "total": 149.99,
  "customer": {
    "id": "...",
    "name": "...",
    "email": "...",
    "address": { /* ...8 fields */ },
    "preferences": { /* ...12 fields */ }
  },
  "lineItems": [ /* ...full objects */ ],
  "auditLog": [ /* mobile doesn't need this */ ]
}

Resolvers — where data actually comes from

Each field in the schema maps to a resolver function. Resolvers can fetch from databases, microservices, caches, or compute values inline. The schema is the contract; resolvers are the implementation.

TypeScript — Apollo Server resolver
import { GraphQLError } from 'graphql';

const resolvers = {
  Query: {
    // Root resolver: fetches a single order
    order: async (_parent, { id }, { dataSources }) => {
      return dataSources.orderAPI.getById(id);
    },
  },
  Order: {
    // Field resolver: lazy-loads customer only when requested
    customer: async (order, _args, { dataSources }) => {
      return dataSources.customerAPI.getById(order.customerId);
    },
  },
  Mutation: {
    placeOrder: async (_parent, { input }, { dataSources, user }) => {
      // Resolvers are the right place for auth checks & validation
      if (!user) {
        // Apollo Server 4 removed AuthenticationError; use GraphQLError with a code
        throw new GraphQLError('Login required', {
          extensions: { code: 'UNAUTHENTICATED' },
        });
      }
      const order = await dataSources.orderAPI.place(input, user.id);
      // errors is a non-null list in PlaceOrderPayload, so return it empty on success
      return { order, errors: [], success: true };
    },
  },
};
⚠️
N+1 problem: Naive resolvers trigger one DB query per item in a list. A query returning 100 orders and loading each customer fires 101 queries: one for the list plus one per order. Use DataLoader (batching + per-request memoization) to collapse the 100 customer lookups into a single batched query, for 2 queries in total. This is the single most important performance concept in GraphQL.
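
The arithmetic behind that warning can be checked with a small synchronous sketch. It does not use the real DataLoader library (which is asynchronous and batches per event-loop tick); `CountingDb` and both loader functions are hypothetical stand-ins that only count how many queries each access pattern issues.

```typescript
type Customer = { id: string; name: string };
type Order = { id: string; customerId: string };

// Hypothetical in-memory data layer that counts every query it receives.
class CountingDb {
  queryCount = 0;
  private customers = new Map<string, Customer>();

  constructor() {
    for (let i = 0; i < 100; i++) {
      this.customers.set(`cus_${i}`, { id: `cus_${i}`, name: `Customer ${i}` });
    }
  }

  listOrders(): Order[] {
    this.queryCount += 1;
    return Array.from({ length: 100 }, (_, i) => ({ id: `ord_${i}`, customerId: `cus_${i}` }));
  }

  findCustomer(id: string): Customer | undefined {
    this.queryCount += 1;
    return this.customers.get(id);
  }

  findCustomers(ids: string[]): Map<string, Customer> {
    this.queryCount += 1;
    const found = new Map<string, Customer>();
    for (const id of ids) {
      const customer = this.customers.get(id);
      if (customer) found.set(id, customer);
    }
    return found;
  }
}

// Naive: one lookup per order, as a per-field resolver would do.
function loadNaive(db: CountingDb): number {
  const orders = db.listOrders();
  orders.forEach((order) => db.findCustomer(order.customerId));
  return db.queryCount;
}

// Batched: collect the keys first, then issue one query for all of them.
function loadBatched(db: CountingDb): number {
  const orders = db.listOrders();
  const ids = Array.from(new Set(orders.map((order) => order.customerId)));
  const customers = db.findCustomers(ids);
  orders.forEach((order) => customers.get(order.customerId));
  return db.queryCount;
}

const naiveQueries = loadNaive(new CountingDb());     // 1 list query + 100 lookups
const batchedQueries = loadBatched(new CountingDb()); // 1 list query + 1 batch
```

The batched path also deduplicates keys, which is the memoization half of what DataLoader provides within a request.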

When to Use GraphQL

GraphQL earns its complexity overhead in specific scenarios. Use this checklist before proposing adoption.

Multiple clients with different needs

Web, mobile, TV, partner portals — each consuming a different shape from the same domain data. GraphQL eliminates the BFF proliferation problem: one schema, clients self-select their payload.

Rapid frontend iteration

Product teams iterating on UI without waiting for backend changes to add/remove fields. Clients evolve queries independently as long as the schema supports the fields.

Aggregating multiple services

GraphQL Federation or schema stitching lets you present a unified graph across Order, Customer, Inventory, and Billing services — without a bespoke aggregation microservice per use case.

Deeply nested or graph-like data

Social graphs, product catalogs with variants, org hierarchies, knowledge graphs — any domain where relationships matter and REST leads to N+1 round-trips or brittle ?include= parameters.

Teams wanting schema-first contracts

The SDL is a machine-readable, version-controlled contract. Frontend and backend teams can agree on schema, generate types, and work in parallel before implementation is complete.

Real-time requirements alongside queries

When you need live data updates (order tracking, dashboards, notifications) alongside regular data fetching — GraphQL subscriptions unify the protocol rather than adding a separate WebSocket or SSE layer.

Fitness signals — score your project

| Signal | Weight | Interpretation |
| --- | --- | --- |
| 3+ distinct clients consuming the same domain data | STRONG YES | Core value proposition of GraphQL |
| Frontend teams blocked by backend for field additions | STRONG YES | Schema evolution solves this exactly |
| Multiple REST calls chained to build a single view | STRONG YES | GraphQL collapses to a single round-trip |
| Existing microservices team wants to own sub-graphs | YES | Federation pattern applies well |
| Partner / public API with diverse consumers | CONSIDER | Powerful but introspection exposure risk |
| Purely internal service, single consumer | SKIP | gRPC or REST is simpler and faster |
| Simple CRUD with predictable access patterns | SKIP | REST + OpenAPI is the right tool |

🚫 When NOT to Use GraphQL

These are the honest contraindications — and what to reach for instead. Being clear here is how you win ARB trust.

Simple CRUD services

A service that does GET /users/:id, POST /users, PUT /users/:id, DELETE /users/:id has no ambiguity. REST is explicit, cacheable, and universally understood. GraphQL adds resolver infrastructure for zero benefit.

Use instead: REST + OpenAPI / Swagger

High-throughput, low-latency internal services

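GraphQL over HTTP adds parsing overhead, query validation, and resolver chains. For internal microservice-to-microservice calls where you control both sides and need sub-millisecond latency, gRPC's binary Protobuf encoding over HTTP/2 carries less overhead. The commonly quoted 3–5× speedup depends on payload size and workload, so benchmark with your own traffic before citing it to an ARB.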

Use instead: gRPC / Protocol Buffers

Public APIs requiring HTTP caching

GraphQL's single POST endpoint is opaque to HTTP caches (CDNs, proxies, Varnish). Implementing GET-based persisted queries helps, but it's a workaround. REST with proper Cache-Control semantics is far simpler to cache at the edge.

Use instead: REST, or GraphQL with Automatic Persisted Queries (APQ)

Teams without schema discipline

GraphQL's value is the contract. If no one owns the schema, if resolvers call each other recursively, if deprecations aren't enforced — you get a worse REST API. The tool requires process maturity, not just technology adoption.

Use instead: REST until the team establishes API governance

File upload heavy workflows

GraphQL has no standard for binary/multipart payloads. The community-spec workaround (multipart/form-data with operations JSON) is clunky and not supported by all clients. REST is the right fit for file-centric APIs.

Use instead: REST (presigned URLs + S3 / blob store)

Single-consumer APIs

GraphQL's flexibility (clients shaping responses) provides no value when there is exactly one client. You're adding schema infrastructure, resolver overhead, and tooling cost with no payoff. The problem GraphQL solves simply doesn't exist.

Use instead: REST or gRPC, depending on whether simplicity or raw performance matters more

The "we might need it someday" trap. GraphQL is often proposed preemptively for flexibility. Flexibility has a cost — schema ownership, DataLoader wiring, federation topology, security analysis of arbitrary queries. Start with REST; migrate to GraphQL when the multi-client problem is actually present, not anticipated.

🏛️ ARB Pushback — Objections & Answers

These are the objections most Architecture Review Boards raise about GraphQL proposals. Honest, documented answers — not spin.

ARB #1
"GraphQL allows clients to run arbitrary queries — this is a security risk we can't accept."

This is a real concern, and it's addressable. Without controls, a deeply nested query can trigger resolver chains that exhaust CPU and memory (a query depth or complexity attack), and a query can repeat one expensive field many times under different aliases (alias flooding).

Mitigations that must be in place before production:

  • Query depth limiting: reject queries beyond N levels deep (e.g., depth 10). Available as a validation rule for Apollo Server (graphql-depth-limit) and built into Hot Chocolate and Strawberry.
  • Query complexity scoring — assign a cost to each field; reject if total exceeds budget.
  • Persisted queries / trusted documents — for known clients (web/mobile apps), only allow a pre-registered whitelist of operation hashes. Arbitrary queries are disabled entirely.
  • Disable introspection in production — schema exposure aids attackers. Introspection is for developer tooling only.
  • Rate limiting per client / IP — same as any API.

With persisted queries, clients cannot send arbitrary operations at all. The "arbitrary query" concern is a non-issue for first-party clients using this pattern.
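
As an illustration of the depth-limiting mitigation listed above, the sketch below measures the nesting depth of a selection tree and rejects anything over a limit. Production servers apply this as a validation rule against the parsed query document; the `FieldNode` shape, the function names, and the limit of 3 are assumptions chosen to keep the example short.

```typescript
// Simplified selection tree: each field may have child selections.
type FieldNode = { name: string; children?: FieldNode[] };

// Depth of a selection set: 1 for a leaf field, plus 1 for each level of nesting.
function selectionDepth(fields: FieldNode[]): number {
  let deepest = 0;
  for (const field of fields) {
    const below = field.children ? selectionDepth(field.children) : 0;
    deepest = Math.max(deepest, 1 + below);
  }
  return deepest;
}

function checkDepth(fields: FieldNode[], maxDepth: number): { ok: boolean; depth: number } {
  const depth = selectionDepth(fields);
  return { ok: depth <= maxDepth, depth };
}

// order { id status }
const shallowQuery: FieldNode[] = [
  { name: "order", children: [{ name: "id" }, { name: "status" }] },
];

// order { customer { orders { customer { name } } } }: walks a cycle in the graph
const cyclicQuery: FieldNode[] = [
  {
    name: "order",
    children: [
      {
        name: "customer",
        children: [
          {
            name: "orders",
            children: [{ name: "customer", children: [{ name: "name" }] }],
          },
        ],
      },
    ],
  },
];

const shallowResult = checkDepth(shallowQuery, 3);
const cyclicResult = checkDepth(cyclicQuery, 3);
```

Any schema with a bidirectional relationship (Order to Customer and back) allows the cyclic shape, which is why a depth limit is needed even when every individual field is cheap.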

ARB #2
"How do we version this API? What happens when the schema changes?"

GraphQL is designed to evolve without versioning. The philosophy is "schema evolution, not version proliferation." New fields are additive and non-breaking. Old fields are @deprecated(reason: "...") and kept alive until usage drops to zero.

Practices that make evolution safe:

  • Never remove a field without a deprecation period (monitor usage via field-level tracing).
  • Never change a field's type from non-null to nullable or change its semantic meaning.
  • Use tooling like GraphQL Inspector in CI to detect breaking changes before merge.
  • Federation supports schema registry with compatibility checks (Apollo's Schema Registry / Cosmo).

When a truly breaking change is unavoidable, the recommended pattern is adding a new root field alongside the old one — not a /v2 endpoint — and migrating clients before removing the old field.
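
The evolution rules above can be sketched as a small diff over two schema snapshots. This is a deliberately reduced model, not a replacement for a tool such as GraphQL Inspector: it assumes output fields only, represents each field as a `"Type.field"` key mapped to its type string, and uses hypothetical field names.

```typescript
// A schema snapshot reduced to "Type.field" -> type string, e.g. "Order.total" -> "Money!".
type SchemaSnapshot = Record<string, string>;

type ChangeKind = "REMOVED" | "BECAME_NULLABLE" | "BECAME_NON_NULL" | "TYPE_CHANGED" | "ADDED";
type Change = { field: string; kind: ChangeKind; breaking: boolean };

// Classify changes to OUTPUT fields between two snapshots.
function diffSchemas(before: SchemaSnapshot, after: SchemaSnapshot): Change[] {
  const changes: Change[] = [];
  for (const field of Object.keys(before)) {
    if (!(field in after)) {
      changes.push({ field, kind: "REMOVED", breaking: true });
      continue;
    }
    const oldType = before[field];
    const newType = after[field];
    if (oldType === newType) continue;
    if (oldType === newType + "!") {
      // Clients that relied on a value always being present may now receive null.
      changes.push({ field, kind: "BECAME_NULLABLE", breaking: true });
    } else if (newType === oldType + "!") {
      // Tightening an output field is safe for existing clients.
      changes.push({ field, kind: "BECAME_NON_NULL", breaking: false });
    } else {
      changes.push({ field, kind: "TYPE_CHANGED", breaking: true });
    }
  }
  for (const field of Object.keys(after)) {
    if (!(field in before)) {
      changes.push({ field, kind: "ADDED", breaking: false });
    }
  }
  return changes;
}

const v1: SchemaSnapshot = {
  "Order.id": "ID!",
  "Order.total": "Money!",
  "Order.legacyCode": "String",
};
const v2: SchemaSnapshot = {
  "Order.id": "ID!",
  "Order.total": "Money",
  "Order.trackingUrl": "String",
};

const changes = diffSchemas(v1, v2);
const breakingFields = changes.filter((c) => c.breaking).map((c) => c.field);
```

Run in CI, a check like this would fail the build on `Order.total` and `Order.legacyCode` while letting the additive `Order.trackingUrl` through.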

ARB #3
"We can't cache GraphQL responses — this will kill our CDN strategy."

Partially true, fully solvable. Standard HTTP GET caching doesn't work for POST /graphql. However:

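  • Automatic Persisted Queries (APQ): the client first sends only a hash of the query. If the server does not recognise it, the client retries with the hash plus the full query text, and the server stores the pair. Later requests send the hash alone and can use GET, which makes the responses cacheable by a CDN.
  • Response caching at the resolver level: Apollo Server and Hot Chocolate support @cacheControl hints per field/type, which drive caching at the application layer. Check your server's documentation for the equivalent mechanism if you use another implementation.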
  • DataLoader — per-request memoization eliminates redundant DB calls within a single operation (not HTTP caching, but equally important for performance).

If CDN-level caching of individual resource responses is the primary requirement, REST is genuinely simpler. GraphQL caching is application-level, not network-level by default.
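
The persisted-query handshake is easier to review when written out. The sketch below models only the server side of an APQ-style exchange with an in-memory store. Two loud assumptions: the `ApqServer` class and its status names are hypothetical, and the hash is a 32-bit FNV-1a stand-in used to avoid dependencies, whereas real APQ implementations use SHA-256.

```typescript
// Stand-in hash (32-bit FNV-1a). Real APQ uses SHA-256; this keeps the sketch dependency-free.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

type ApqRequest = { hash: string; query?: string };
type ApqResponse =
  | { status: "OK"; executedQuery: string }
  | { status: "PERSISTED_QUERY_NOT_FOUND" }
  | { status: "HASH_MISMATCH" };

class ApqServer {
  private store = new Map<string, string>();

  handle(request: ApqRequest): ApqResponse {
    if (request.query !== undefined) {
      // Registration: verify the hash matches the text before storing it.
      if (fnv1a(request.query) !== request.hash) return { status: "HASH_MISMATCH" };
      this.store.set(request.hash, request.query);
      return { status: "OK", executedQuery: request.query };
    }
    // Hash-only request: succeeds only if the query was registered earlier.
    const known = this.store.get(request.hash);
    return known === undefined
      ? { status: "PERSISTED_QUERY_NOT_FOUND" }
      : { status: "OK", executedQuery: known };
  }
}

const server = new ApqServer();
const query = "query GetOrderSummary($id: ID!) { order(id: $id) { id status } }";
const hash = fnv1a(query);

const firstAttempt = server.handle({ hash });         // miss: server has not seen the hash
const registration = server.handle({ hash, query });  // client retries with the full text
const laterRequest = server.handle({ hash });         // hit: hash alone, small enough for GET
```

Note that this flow accepts any query a client chooses to register, so it is a bandwidth and caching optimisation. Restricting clients to pre-approved operations requires a registered allowlist instead.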

ARB #4
"What's the operational overhead? Our team knows REST; this adds unknown cost."

Honest answer: there is real overhead. Teams new to GraphQL should expect 4–8 additional weeks of ramp-up for the first production deployment.

Incremental costs vs REST:

  • Schema design requires more upfront thought (but pays back with fewer API change requests later).
  • DataLoader patterns must be learned and enforced to prevent N+1 in production.
  • Observability requires field-level tracing (Apollo Studio, Cosmo, or OpenTelemetry with custom resolvers).
  • Security analysis of query complexity and depth must be built into your gateway config.

This cost is justified when GraphQL's strengths (multi-client, federation, schema contracts) apply. If the project doesn't clearly exhibit those needs, the overhead is pure tax.

ARB #5
"Our API must be public-facing and consumed by third parties. Is GraphQL appropriate?"

Use with caution for truly public APIs. GraphQL works well for partner APIs with known consumers, but for an open developer ecosystem, REST has significant advantages:

  • REST is universally understood; GraphQL client libraries vary by platform.
  • OpenAPI tooling for REST SDKs, documentation, and testing is more mature.
  • Introspection must be disabled in prod (complicates external developer experience).

Best of both: expose a REST public API for the ecosystem; use GraphQL internally or for known partners through a dedicated partner portal with controlled tooling.

📐 Schema Design

The schema is the most important artifact in a GraphQL system. Design mistakes here are expensive to undo.

Relay-compatible Connections (pagination)

Use cursor-based pagination via the Relay Connection spec. Offset-based pagination skips or repeats records when rows are inserted or deleted between page requests. The pattern is widely supported by GraphQL clients and federation routers.

GraphQL SDL — Connections
# Standard connection pattern: don't invent your own pagination
type OrderConnection {
  edges: [OrderEdge!]!
  pageInfo: PageInfo!
  totalCount: Int!
}

type OrderEdge {
  cursor: String!
  node: Order!
}

type PageInfo {
  hasNextPage: Boolean!
  hasPreviousPage: Boolean!
  startCursor: String
  endCursor: String
}

type Query {
  # first/after for forward, last/before for backward pagination
  orders(
    first: Int
    after: String
    last: Int
    before: String
    filter: OrderFilter
  ): OrderConnection!
}
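
To show why cursors behave better than offsets, here is a minimal sketch of forward pagination over an in-memory list. The `paginate` function, the `order:<id>` cursor format, and the sample data are assumptions for illustration; real APIs usually keep cursors opaque (for example base64-encoded) and page through a database query rather than an array.

```typescript
type Order = { id: string; total: number };
type Edge = { cursor: string; node: Order };
type Connection = {
  edges: Edge[];
  pageInfo: { hasNextPage: boolean; endCursor: string | null };
  totalCount: number;
};

// Cursor format is an assumption of this sketch; production cursors are usually opaque.
const toCursor = (order: Order): string => `order:${order.id}`;

function paginate(all: Order[], first: number, after?: string): Connection {
  let start = 0;
  if (after !== undefined) {
    const afterIndex = all.findIndex((order) => toCursor(order) === after);
    if (afterIndex === -1) throw new Error(`Unknown cursor: ${after}`);
    start = afterIndex + 1; // resume just past the item the cursor points at
  }
  const edges = all.slice(start, start + first).map((node) => ({ cursor: toCursor(node), node }));
  const last = edges[edges.length - 1];
  return {
    edges,
    pageInfo: { hasNextPage: start + first < all.length, endCursor: last ? last.cursor : null },
    totalCount: all.length,
  };
}

const orders: Order[] = [1, 2, 3, 4, 5].map((n) => ({ id: `ord_${n}`, total: n * 10 }));

// Page 1 returns ord_1 and ord_2.
const page1 = paginate(orders, 2);

// ord_1 is deleted before the client asks for page 2.
const afterDelete = orders.filter((order) => order.id !== "ord_1");

// Cursor-based: resumes after ord_2, so the client sees ord_3 and ord_4.
const page2 = paginate(afterDelete, 2, page1.pageInfo.endCursor ?? undefined);
const cursorIds = page2.edges.map((edge) => edge.node.id);

// Offset-based (skip 2, take 2): ord_3 is silently skipped.
const offsetIds = afterDelete.slice(2, 4).map((order) => order.id);
```

The offset page silently loses `ord_3` because the deletion shifted every later row up by one position, while the cursor still identifies where the client actually stopped.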

Mutation payload pattern

Mutations should return rich payload types — not just the affected entity. This allows returning errors inline (without abusing HTTP status codes), metadata, and related objects the client needs to update its local state.

GraphQL SDL — Mutation Payloads
# Return a payload type, not the raw entity
type PlaceOrderPayload {
  order: Order            # null if failed
  errors: [UserError!]!   # business validation errors
  success: Boolean!
}

type UserError {
  field: String           # which field caused the error
  message: String!
  code: ErrorCode!
}

enum ErrorCode {
  INVALID_INPUT
  INSUFFICIENT_STOCK
  PAYMENT_DECLINED
  UNAUTHORIZED
}

# Client can now handle errors inline without try/catch on HTTP codes:
# mutation { placeOrder(...) { success errors { field message } order { id } } }

Schema design rules

| Rule | Reason |
| --- | --- |
| Never expose database IDs directly; use opaque ID scalars | Prevents clients from guessing/iterating IDs; allows backend migration |
| Separate input types from output types | Output types may have computed fields and resolvers not valid as inputs |
| Name mutations with verb + noun: createOrder, cancelOrder | Clarity over REST-style resource thinking |
| Non-null (!) everything that will never be null in the domain | Clients get compile-time guarantees; avoids defensive null checks everywhere |
| Add @deprecated(reason) before removing any field; never remove immediately | Prevents client breakage; reason tells clients what to migrate to |
| Avoid generic types like JSON scalar or Map<String, Any> | Loses type safety; use typed union/interface instead |
| Model domain concepts, not database tables | GraphQL is a product API, not a DB reflection; align to business language |

Queries & Mutations

Client operation patterns and resolver implementation best practices.

Fragments — reusable field sets

GraphQL — Client Operations
# Define reusable field sets with fragments
fragment OrderCard on Order {
  id
  status
  total
  placedAt
}

fragment OrderDetail on Order {
  ...OrderCard
  customer { name email }
  lineItems { sku quantity unitPrice }
}

# Use fragments in multiple operations to keep queries DRY
query ListOrders($after: String) {
  orders(first: 20, after: $after) {
    edges { node { ...OrderCard } }
    pageInfo { hasNextPage endCursor }
  }
}

query GetOrder($id: ID!) {
  order(id: $id) { ...OrderDetail }
}

# Mutation reusing the OrderCard fragment (a fragment spread) on the payload
mutation PlaceOrder($input: PlaceOrderInput!) {
  placeOrder(input: $input) {
    success
    order { ...OrderCard }
    errors { field message code }
  }
}

DataLoader — eliminating N+1

TypeScript — DataLoader pattern
import DataLoader from 'dataloader';

// `db` and `Customer` are assumed to come from your data layer.

// Create loaders once per request (e.g. in the context function) so the
// memoization cache is never shared between users.
export function createLoaders() {
  return {
    // Batch function: receives array of IDs, returns array of results in same order
    customer: new DataLoader<string, Customer>(async (customerIds) => {
      // ONE query for all IDs, not one per ID
      const customers = await db.customers.findMany({
        where: { id: { in: [...customerIds] } },
      });
      // Must return in the same order as the input keys
      const map = new Map<string, Customer>(
        customers.map((c): [string, Customer] => [c.id, c])
      );
      return customerIds.map((id) => map.get(id) ?? new Error(`Customer ${id} not found`));
    }),
  };
}

// In resolver: DataLoader coalesces all calls within one tick into one batch
const resolvers = {
  Order: {
    customer: (order, _args, { loaders }) => {
      return loaders.customer.load(order.customerId); // batched automatically
    },
  },
};

📡 Subscriptions

Real-time event streams — when to use them and operational implications.

Good fit for subscriptions
  • Order status tracking (customer-facing)
  • Live dashboards and analytics counters
  • Collaborative editing presence indicators
  • Notification feeds and alert streams
  • IoT sensor data aggregation
Poor fit — use alternatives
  • Polling-based reports (use REST + polling)
  • Single-event webhooks (use REST webhooks)
  • High-frequency binary streams (use WebRTC or raw WS)
  • When WebSocket infra isn't production-ready
  • Serverless-only deployments (WS requires persistent connections)
TypeScript — Subscription resolver (Apollo)
import { withFilter } from 'graphql-subscriptions';

const resolvers = {
  Subscription: {
    orderStatusChanged: {
      // subscribe: returns an async iterator from your pub/sub system
      subscribe: withFilter(
        // graphql-subscriptions v2 API; v3 renames this method to asyncIterableIterator
        (_parent, _args, { pubsub }) => pubsub.asyncIterator('ORDER_STATUS_CHANGED'),
        // filter: only send events relevant to the subscriber
        (payload, variables) => payload.orderStatusChanged.orderId === variables.orderId
      ),
    },
  },
};

// Publishing from your domain layer (e.g., in OrderService)
await pubsub.publish('ORDER_STATUS_CHANGED', {
  orderStatusChanged: {
    orderId: order.id,
    newStatus: order.status,
    updatedAt: new Date().toISOString(),
  },
});
⚠️
Operational caution: Subscriptions require long-lived WebSocket connections. In horizontally-scaled deployments, your pub/sub system (Redis, Kafka, NATS) must route events to whichever instance holds the subscriber's connection. This is non-trivial infrastructure. Use a Redis-backed PubSub implementation with Apollo, or evaluate managed options (Hasura, Grafbase), before rolling your own.
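
The scaling concern is easier to see against a single-process baseline. The sketch below is a hypothetical in-memory pub/sub with a per-subscriber filter, playing the role that `withFilter` plays in the resolver above. `InMemoryPubSub` and `subscribeToOrder` are invented names, and the design only works when publisher and subscriber live in the same process, which is exactly the limitation a shared broker removes.

```typescript
type OrderStatusEvent = { orderId: string; newStatus: string };
type Listener = (event: OrderStatusEvent) => void;

// Single-process pub/sub: listeners live in this instance's memory only.
class InMemoryPubSub {
  private listeners = new Map<string, Set<Listener>>();

  subscribe(topic: string, listener: Listener): () => void {
    const set = this.listeners.get(topic) ?? new Set<Listener>();
    set.add(listener);
    this.listeners.set(topic, set);
    return () => {
      set.delete(listener); // unsubscribe when the client disconnects
    };
  }

  publish(topic: string, event: OrderStatusEvent): void {
    const set = this.listeners.get(topic);
    if (!set) return;
    set.forEach((listener) => listener(event));
  }
}

// Deliver only the events for the order this subscriber asked about.
function subscribeToOrder(
  pubsub: InMemoryPubSub,
  orderId: string,
  deliver: Listener
): () => void {
  return pubsub.subscribe("ORDER_STATUS_CHANGED", (event) => {
    if (event.orderId === orderId) deliver(event);
  });
}

const pubsub = new InMemoryPubSub();
const received: OrderStatusEvent[] = [];
const unsubscribe = subscribeToOrder(pubsub, "ord_1", (event) => received.push(event));

pubsub.publish("ORDER_STATUS_CHANGED", { orderId: "ord_1", newStatus: "SHIPPED" });
pubsub.publish("ORDER_STATUS_CHANGED", { orderId: "ord_2", newStatus: "CANCELLED" }); // filtered out
unsubscribe();
pubsub.publish("ORDER_STATUS_CHANGED", { orderId: "ord_1", newStatus: "DELIVERED" }); // no listener
```

With two server instances, a publish on instance A would never reach a listener registered on instance B, so the listener map has to be replaced by a broker that every instance subscribes to.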

🖼️ Example: Backend-for-Frontend (BFF) Pattern

The most common GraphQL adoption pattern in enterprise — a GraphQL layer in front of existing REST or gRPC services, purpose-built for your product surfaces.

Rather than forcing each client (web, iOS, Android) to orchestrate calls to Order Service, Customer Service, and Inventory Service independently, a GraphQL BFF acts as an orchestration layer. Clients make one request; the BFF fans out to upstream services and assembles the response.

Clients (Web / iOS / Android) → GraphQL BFF (single endpoint) → upstream services: Order Service + Customer Service + Inventory Service
TypeScript — BFF resolvers calling REST upstream services
import { RESTDataSource } from '@apollo/datasource-rest';

// DataSource wrapping each upstream service
class OrderAPI extends RESTDataSource {
  override baseURL = 'https://order-service.internal/';

  async getById(id: string): Promise<Order> {
    return this.get(`orders/${encodeURIComponent(id)}`);
  }

  async listByCustomer(customerId: string): Promise<Order[]> {
    return this.get('orders', { params: { customerId } });
  }
}

// Resolvers orchestrate across data sources; the client shapes the result
const resolvers = {
  Query: {
    me: async (_parent, _args, { dataSources, user }) => {
      return dataSources.customerAPI.getById(user.id);
    },
  },
  Customer: {
    // Lazy: only runs if client requests customer.orders
    orders: (customer, _args, { dataSources }) =>
      dataSources.orderAPI.listByCustomer(customer.id),
  },
  LineItem: {
    // Lazy: only runs if client requests lineItem.inventoryStatus.
    // The SKU lives on the line item, not on the order.
    inventoryStatus: (lineItem, _args, { dataSources }) =>
      dataSources.inventoryAPI.getStatus(lineItem.sku),
  },
};

🕸️ Example: GraphQL Federation

Federation allows multiple teams to own sub-graphs that compose into a unified supergraph — without a central BFF team becoming a bottleneck.

Each service publishes its own partial schema. The subgraph schemas are composed into a single supergraph schema ahead of time, and a router (Apollo Router or WunderGraph Cosmo) plans each incoming query across the subgraphs at request time. Teams work independently; the contract is the federation spec.

GraphQL SDL — Subgraph A: Orders Service orders-subgraph/schema.graphql
# Orders team owns this subgraph
extend schema
  @link(url: "https://specs.apollo.dev/federation/v2.3", import: ["@key"])

type Order @key(fields: "id") {
  id: ID!
  status: OrderStatus!
  total: Float!
  placedAt: String!
  # Reference to Customer, resolved by Customers subgraph
  customer: Customer!
}

# Stub: orders knows customerId but not Customer fields
type Customer @key(fields: "id", resolvable: false) {
  id: ID!
}

type Query {
  order(id: ID!): Order
  orders(filter: OrderFilter): [Order!]!
}
GraphQL SDL — Subgraph B: Customers Service customers-subgraph/schema.graphql
# Customers team owns this subgraph
extend schema
  @link(url: "https://specs.apollo.dev/federation/v2.3", import: ["@key"])

type Customer @key(fields: "id") {
  id: ID!
  name: String!
  email: String!
  # Returns Order references; the Orders subgraph resolves their fields
  orders: [Order!]!
}

# Stub: customers knows order IDs but not Order fields
type Order @key(fields: "id", resolvable: false) {
  id: ID!
}

# The reference resolver for Customer (implemented in resolver code) receives
# { id } from the router and returns the full Customer. The router calls it to
# "join" the entities while executing a query plan.

type Query {
  me: Customer
  customer(id: ID!): Customer
}
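
The entity "join" the router performs can be sketched with plain function calls. This is a conceptual model only: a real router sends a GraphQL request to each subgraph (using the federation `_entities` field for step 2) and follows a generated query plan. The two subgraph objects, their data, and `fetchOrderWithCustomer` are hypothetical.

```typescript
type CustomerRef = { id: string };
type OrderRecord = { id: string; status: string; customer: CustomerRef };
type CustomerRecord = { id: string; name: string; email: string };

// Orders subgraph: owns orders and knows only the customer's key.
const ordersSubgraph = {
  order(id: string): OrderRecord | undefined {
    const rows: OrderRecord[] = [{ id: "ord_1", status: "SHIPPED", customer: { id: "cus_7" } }];
    return rows.find((row) => row.id === id);
  },
};

// Customers subgraph: turns a { id } reference into the full entity.
const customersSubgraph = {
  resolveReference(ref: CustomerRef): CustomerRecord | undefined {
    const rows: CustomerRecord[] = [{ id: "cus_7", name: "Ada", email: "ada@example.com" }];
    return rows.find((row) => row.id === ref.id);
  },
};

// Router: step 1 fetches the order, step 2 resolves the customer reference, then merges.
function fetchOrderWithCustomer(orderId: string) {
  const order = ordersSubgraph.order(orderId);
  if (!order) return null;
  const customer = customersSubgraph.resolveReference(order.customer);
  return { ...order, customer: customer ?? null };
}

const joined = fetchOrderWithCustomer("ord_1");
```

Step 2 cannot start until step 1 returns the key, which is the sequential dependency that makes deep cross-subgraph nesting expensive.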
ℹ️
Router options: Apollo Router (Rust, open-source core) is the most mature. WunderGraph Cosmo is a newer open-source alternative with a managed platform. Hot Chocolate (.NET) has first-class federation support. For .NET teams, Strawberry Shake + Hot Chocolate federation is the recommended stack.

🛡️ Security

GraphQL-specific attack surfaces and mandatory mitigations for production.

| Threat | Mitigation | Library |
| --- | --- | --- |
| Query depth attack: deeply nested query exhausts resolvers | Enforce max depth (e.g., 10 levels) | graphql-depth-limit, built-in to Hot Chocolate |
| Query complexity attack: expensive field combinations | Assign costs per field; reject over budget | graphql-cost-analysis, Apollo cost directives |
| Introspection disclosure: schema exposed to attackers | Disable introspection in production; enable only in dev/staging | Apollo Server: introspection: false |
| Unbounded results: query returns millions of rows | Enforce max first/limit arguments at schema level | Custom validation rule or schema directive |
| Alias flooding: duplicate fields with different aliases | Count aliased fields in complexity scoring | Custom validation or complexity library |
| Authorization bypass: field accessed without permission | Field-level auth with @auth directive or middleware | GraphQL Shield, Hot Chocolate's @authorize |
| Arbitrary operations (partner/public) | Persisted queries: only pre-approved operation hashes allowed | Registered persisted query lists (Apollo, Relay). APQ alone is not an allowlist because it registers any query on first use |
TypeScript — Apollo Server security configuration
import { ApolloServer } from '@apollo/server';
import depthLimit from 'graphql-depth-limit';
import costAnalysis from 'graphql-cost-analysis';

// typeDefs and resolvers are assumed to be defined elsewhere.
const server = new ApolloServer({
  typeDefs,
  resolvers,
  // Disable introspection in production
  introspection: process.env.NODE_ENV !== 'production',
  validationRules: [
    // Reject queries deeper than 10 levels
    depthLimit(10),
    // Reject queries with cost score > 1000. Per-field costs beyond the
    // default come from the library's cost directives or cost map.
    costAnalysis({
      maximumCost: 1000,
      defaultCost: 1,
    }),
  ],
});

// Body size limit: Apollo Server 4 has no bodyParserConfig option. Set the
// limit on the HTTP layer instead, e.g. express.json({ limit: '50kb' }) placed
// ahead of expressMiddleware(server).
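
To show what a complexity score measures, and why aliases must be counted, here is a small cost calculator over a simplified selection tree. The `CostNode` shape, the constants (field cost 1, list factor 10, budget 1000), and the scoring formula are assumptions for this sketch; each complexity library defines its own model.

```typescript
type CostNode = { name: string; alias?: string; isList?: boolean; children?: CostNode[] };

const FIELD_COST = 1;
const LIST_FACTOR = 10; // assumed average number of items a list field returns
const MAX_COST = 1000;

// Cost of a selection set: each field costs 1, and children of a list field
// are multiplied because they run once per item.
function cost(fields: CostNode[]): number {
  let total = 0;
  for (const field of fields) {
    const childCost = field.children ? cost(field.children) : 0;
    const multiplier = field.isList ? LIST_FACTOR : 1;
    total += FIELD_COST + multiplier * childCost;
  }
  return total;
}

// orders { id total }
const ordersField: CostNode = {
  name: "orders",
  isList: true,
  children: [{ name: "id" }, { name: "total" }],
};

// Alias flooding: the same field requested 100 times as a0: orders {...}, a1: orders {...}, ...
const flooded: CostNode[] = Array.from({ length: 100 }, (_, i) => ({
  ...ordersField,
  alias: `a${i}`,
}));

const singleCost = cost([ordersField]); // 1 + 10 * 2
const floodedCost = cost(flooded);      // 100 * 21
const floodedAllowed = floodedCost <= MAX_COST;
```

Because every aliased copy is scored as its own field, the flooded query exceeds the budget even though its nesting depth is only 2, which a depth limit alone would accept.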

🚀 Performance

DataLoader (mandatory)

Every resolver that loads a related entity must go through DataLoader. Without it, loading 100 orders and their customers issues 101 database queries. With DataLoader, it's 2. Non-negotiable in production.

Field-level tracing

Use Apollo Studio, Cosmo, or OpenTelemetry spans to see resolver execution time per field. Slow resolvers are immediately visible. Optimize before — not after — load testing.

Response caching

Use @cacheControl directives to annotate fields with cache TTLs. The server can return a Cache-Control header representing the minimum TTL across all resolved fields in the response.
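
The "minimum TTL" rule can be written out in a few lines. This sketch assumes a hypothetical `CacheHint` record per resolved field and adds two conventions that are common in such schemes but stated here as assumptions: any PRIVATE hint makes the whole response private, and a zero TTL means no header is emitted.

```typescript
type Scope = "PUBLIC" | "PRIVATE";
type CacheHint = { path: string; maxAge: number; scope?: Scope };

// The response is only as cacheable as its least cacheable field.
function overallPolicy(hints: CacheHint[]): { maxAge: number; scope: Scope } {
  if (hints.length === 0) return { maxAge: 0, scope: "PUBLIC" };
  const maxAge = Math.min(...hints.map((hint) => hint.maxAge));
  const scope: Scope = hints.some((hint) => hint.scope === "PRIVATE") ? "PRIVATE" : "PUBLIC";
  return { maxAge, scope };
}

// Returns the header value, or null when the response must not be cached.
function toCacheControlHeader(hints: CacheHint[]): string | null {
  const policy = overallPolicy(hints);
  if (policy.maxAge <= 0) return null;
  return `max-age=${policy.maxAge}, ${policy.scope.toLowerCase()}`;
}

const orderHints: CacheHint[] = [
  { path: "order", maxAge: 60 },
  { path: "order.customer", maxAge: 30, scope: "PRIVATE" },
  { path: "order.total", maxAge: 300 },
];

const header = toCacheControlHeader(orderHints);

// A single uncacheable field makes the whole response uncacheable.
const uncacheable = toCacheControlHeader([...orderHints, { path: "order.status", maxAge: 0 }]);
```

One practical consequence: adding a single short-lived or per-user field to a popular query lowers the cacheability of the entire response, so such fields are worth fetching in a separate operation.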

Persisted queries

Send a hash instead of the full query string. Reduces request payload size (important on mobile). Enables GET-based requests which CDNs can cache. Security and performance benefit in one.

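Defer and Stream (incremental delivery proposal)

@defer and @stream are not yet part of the ratified GraphQL specification, so confirm that both your server and your client support them. Where they do, use @defer on non-critical fields to return the primary payload immediately and stream deferred fields as they resolve. This reduces perceived latency for complex pages.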

Query planning (Federation)

The router generates a query plan across subgraphs. Fetch subgraph data in parallel where possible. Avoid deeply nested cross-subgraph entity references that force sequential fetches.

Performance baseline checklist before going live: DataLoader on all relational fields ✓ · Query depth + complexity limits configured ✓ · Field-level tracing enabled ✓ · APQ configured ✓ · Load test with realistic query shapes (not synthetic single-field queries) ✓ · N+1 detector run in staging ✓

🔧 Tooling

| Category | Tool | Notes |
| --- | --- | --- |
| Server (.NET) | Hot Chocolate | First-class .NET GraphQL server. Annotation-based + SDL-first. Federation v2 support. |
| Server (Node) | Apollo Server, Yoga | Apollo for enterprise features + Studio. Yoga for lightweight/edge deployments. |
| Client (React) | Apollo Client, urql, Relay | Apollo for full ecosystem. urql for lightweight. Relay for Relay-spec pagination + Facebook scale. |
| Client (.NET) | Strawberry Shake | Generated typed client from schema. Integrates with Hot Chocolate ecosystem. |
| Code generation | GraphQL Code Generator | Generates TypeScript types, React hooks, and resolvers from schema + operations. Run in CI. |
| Schema management | GraphQL Inspector | Breaking change detection in CI. Diff schemas. Validate coverage. |
| IDE tooling | GraphiQL, Apollo Sandbox | In-browser query explorers. Apollo Sandbox works without local server. |
| Federation router | Apollo Router, Cosmo | Apollo Router (Rust): mature, open core. Cosmo: fully open-source alternative with managed option. |
| Schema registry | Apollo GraphOS, Cosmo Platform | Schema versioning, breaking change gating, subgraph composition validation. |
| Observability | Apollo Studio, OpenTelemetry | Field-level usage metrics, error rates, resolver latency heatmaps. |
| Testing | jest + graphql-tag, Testcontainers | Unit-test resolvers in isolation. Integration-test against a real schema + DB. |

🔗 Reference Links

Quick decision matrix

| Scenario | Recommendation | Rationale |
| --- | --- | --- |
| Multiple clients, different payload needs | GraphQL | Core value prop: clients shape their own queries |
| Aggregating 3+ microservices for a product surface | GraphQL Federation | Avoids bespoke BFF per surface; teams own sub-graphs |
| Simple internal CRUD microservice | REST or gRPC | Zero over/under-fetching problem; added complexity not justified |
| Public API with unknown consumers | REST preferred | REST has wider tooling, simpler caching, lower learning curve |
| Real-time + query on the same domain | GraphQL + Subscriptions | Unified protocol; no separate WS service needed |
| Internal service, team owns both sides | gRPC | Strictly typed contract, 3–5× faster, no browser concern |
| High CDN cache dependency | REST or APQ | HTTP GET caching native to REST; APQ is a workaround |
| File uploads are a primary use case | REST + presigned URLs | GraphQL has no standard multipart support |
📌
Bottom line for ARB presentations: GraphQL solves a specific problem — multiple clients with divergent data needs consuming overlapping domain services. When that problem exists, it solves it elegantly. When it doesn't, GraphQL adds operational complexity for no customer-facing benefit. The decision should follow the problem, not the hype.