V1
Architecture Decision Record

Modular Monolith & DevOps Architecture

A complete architectural blueprint for an Angular + .NET Core modular monolith — covering application structure, 20-repo strategy, GitOps, CI/CD pipeline, Helm, ArgoCD, Kubernetes, and production observability. Every decision is explained with its rationale.

Angular Frontend · .NET Core Backend · GitHub · 20 Repos · Helm + ArgoCD · Kubernetes · GitOps

🧱
Modular Monolith Design
Why not microservices, and why not a big ball of mud

◆ WHY MODULAR MONOLITH

A modular monolith delivers microservice-level domain separation without the operational overhead of inter-service networking, distributed tracing, and service mesh. For most teams up to ~30 engineers, it is the pragmatic default. It deploys as one unit, shares one database (with schema-level isolation per module), and can be sliced into microservices later along well-established seams — if that day ever comes.

✅ Modular Monolith Wins
  • One deploy artifact — simpler CI/CD
  • In-process calls — no network latency between modules
  • Shared transaction boundary — ACID across modules
  • Easier debugging — single process, single log stream
  • Lower infra cost — fewer containers, no service mesh
📋 Key Discipline Required
  • Enforce module boundaries via compiler — no cross-module internal references
  • Each module owns its own database schema
  • Cross-module communication via interfaces, not concrete classes
  • Architecture tests (ArchUnitNET) fail the build on violations
  • Module = vertical slice: API → Application → Domain → Infra
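The boundary rules above can be pictured as a check over project references. This is a minimal sketch in Python, not the ArchUnitNET API: the project names, the `<Module>.<Layer>` naming scheme, and the idea of a per-module `Contracts` project are assumptions made for illustration.

```python
# Hypothetical project naming: "<Module>.<Layer>", e.g. "Orders.Domain".
# Assumed convention: other modules may reference only a module's
# "Contracts" project or the shared kernel ("Shared").
ALLOWED_SHARED = {"Shared"}

def find_violations(references: dict) -> list:
    """Return (source, target) pairs that cross a module boundary illegally."""
    violations = []
    for source, targets in references.items():
        source_module = source.split(".")[0]
        for target in sorted(targets):
            target_module, _, target_layer = target.partition(".")
            if target_module == source_module or target_module in ALLOWED_SHARED:
                continue  # same module or shared kernel: always allowed
            if target_layer != "Contracts":
                violations.append((source, target))
    return violations

refs = {
    "Orders.Application": {"Orders.Domain", "Catalog.Contracts", "Shared"},
    "Orders.Infra": {"Catalog.Infra"},  # reaches into another module's internals
}
print(find_violations(refs))
```

A real architecture test would read the references from the compiled assemblies and fail the build when the list is non-empty.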

📦
Module Boundaries
Domain-driven slicing of the .NET Core backend

Each module is a separate C# project within the same solution. Modules communicate only through published interfaces or an internal event bus — never by referencing each other's internals.

🔐 Identity Module
Auth, users, roles, permissions. Exposes ICurrentUser interface to other modules.
📦 Catalog Module
Products, categories, pricing. Own DB schema. Events: ProductUpdated.
🛒 Orders Module
Order lifecycle, cart, checkout. Depends on Catalog via interface only.
💳 Payments Module
Payment processing, gateway abstraction, refunds. Raises domain events.
📬 Notifications Module
Email, SMS, push. Consumes events from Orders/Payments. Zero upstream deps.
📊 Reporting Module
Read-only projections. Can query its own read DB built from events.
⚠️
The shared kernel: Keep a Shared project with only value objects, interfaces, and contracts that multiple modules genuinely need. No business logic lives there. It is a contract library, not a dumping ground.
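To make "communicate only through an internal event bus" concrete, here is a small in-process sketch in Python. The `OrderPlaced` event, the `EventBus` class, and the handler name are hypothetical; the source only names the modules and the pattern.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class OrderPlaced:  # hypothetical event contract, would live in the shared kernel
    order_id: str
    customer_email: str

class EventBus:
    """In-process publish/subscribe: handlers run synchronously in the caller."""
    def __init__(self) -> None:
        self._handlers = defaultdict(list)

    def subscribe(self, event_type: type, handler: Callable) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event: object) -> None:
        for handler in self._handlers[type(event)]:
            handler(event)

# Notifications module: knows the event contract, not the Orders module.
sent = []
def send_order_confirmation(event: OrderPlaced) -> None:
    sent.append(f"confirmation for {event.order_id} to {event.customer_email}")

bus = EventBus()
bus.subscribe(OrderPlaced, send_order_confirmation)
# Orders module: publishes the event and stays unaware of who listens.
bus.publish(OrderPlaced(order_id="o-1", customer_email="a@example.com"))
```

Neither side imports the other; both depend only on the event contract, which is the kind of type the shared kernel is meant to hold.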

🌐
Angular Frontend
Standalone SPA aligned to backend module boundaries

The Angular frontend follows a feature-module structure that mirrors the backend's domain modules. This is not cosmetic — it means each Angular feature team owns its slice end-to-end.

📁 Frontend Structure
  • features/ — one folder per backend module
  • core/ — auth, HTTP interceptors, guards
  • shared/ — UI components, design system
  • shell/ — app router, layout, nav
⚡ Build & Deploy
  • Lazy-loaded feature modules — Angular standalone APIs
  • Built to static assets — served via Nginx container
  • Runtime config via /assets/config.json — no rebuild per env
  • CSP headers enforced at Nginx level

📁
20-Repo Strategy
How to organize 20 GitHub repos without chaos

◆ WHY NOT A MONOREPO

With 20 repos and a modular monolith, a true monorepo (Nx/Turborepo) introduces significant tooling overhead for non-JS stacks. A polyrepo with strong conventions is simpler to reason about, grants cleaner CODEOWNERS boundaries, and each repo maps to a clear deployment artifact or library.

Repo Categories

app-frontend
Angular SPA
app-backend
.NET Core solution (all modules)
lib-design-system
Angular component lib
lib-api-client
Generated API client (OpenAPI)
lib-shared-kernel
Shared C# contracts / NuGet
infra-terraform
Cloud infra (IaC)
infra-kubernetes
Base K8s manifests
gitops-config
GitOps repo — Helm values
svc-notification
Standalone sidecar worker
svc-pdf-renderer
Isolated utility service
svc-job-scheduler
Background job runner
svc-data-import
ETL / data pipeline service
tools-cli
Internal developer tooling
tools-seed-data
Dev/staging data seeding
test-e2e
Playwright E2E test suite
docs-architecture
ADRs, runbooks, architecture docs
ops-monitoring
Grafana dashboards, alert rules
ops-runbooks
Incident runbooks (auto-linked)
platform-gateway
API Gateway config / NGINX
platform-shared-ci
Reusable GitHub Actions workflows
💡
platform-shared-ci is the glue. It contains reusable GitHub Actions workflow templates (build, test, scan, push, deploy) that every other repo calls via uses: org/platform-shared-ci/.github/workflows/build.yml@main. Because callers reference @main, a pipeline change made in one place takes effect in every calling repo on its next workflow run.

🌿
Branch Strategy
Environment-aligned branching: Gitflow with trunk evolution

◆ DECISION: MODIFIED GITFLOW

Pure trunk-based development works for teams with very high test confidence and feature flags. For a 20-repo modular monolith with a QA cycle and staging environment, modified Gitflow gives clearer environment promotion and safer hotfix paths. develop → release → main maps directly to dev → staging → production.

main ──── always deployable · triggers production deploy · protected · requires 2 approvals + passing CI
 │
 ├─ release/1.4.0 ──── short-lived · from develop · only bug fixes allowed · merges into main + develop
 │
develop ──── integration branch · auto-deploys to dev environment · 1 approval required
 │
 ├─ feature/JIRA-123-order-api ──── from develop · merge via PR · short-lived (< 2 days ideal)
 ├─ feature/JIRA-456-catalog-search ──── squash merge preferred · deletes after merge
 ├─ chore/update-dotnet-sdk
 └─ fix/JIRA-789-null-ref-payment

hotfix/JIRA-999-prod-incident ──── from main only · fast-tracked CI · merges into main + develop
Branch | Maps to Environment | Deploy Trigger | Protection Rules
main | Production | Manual approval gate after merge | 2 reviewers, CI must pass, no force-push
release/* | Staging | Auto on push to release branch | 1 reviewer, CI must pass
develop | Dev / Integration | Auto on merge | 1 reviewer, CI must pass
feature/* | None (PR preview optional) | CI only, no deploy | CI must pass before merge
hotfix/* | Production fast-track | Manual gate, expedited | 1 reviewer + senior lead
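The branch-to-environment mapping in the table above is simple enough to express as a lookup. A sketch in Python, assuming glob-style branch patterns; the function and constant names are illustrative.

```python
from fnmatch import fnmatchcase
from typing import Optional

# (branch pattern, environment) pairs taken from the table above.
BRANCH_RULES = [
    ("main", "production"),
    ("release/*", "staging"),
    ("develop", "dev"),
    ("hotfix/*", "production"),  # fast-track path, still behind a manual gate
]

def target_environment(branch: str) -> Optional[str]:
    """Return the environment a branch deploys to, or None for CI-only branches."""
    for pattern, environment in BRANCH_RULES:
        if fnmatchcase(branch, pattern):
            return environment
    return None

print(target_environment("release/1.4.0"))
print(target_environment("feature/JIRA-123-order-api"))
```

Feature, chore, and fix branches fall through to `None`, which matches the "CI only, no deploy" row.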

🗃️
GitOps Repo Decision
Should you have a dedicated gitops-config repo? Yes. Here's why.

◆ DECISION: DEDICATED gitops-config REPO — SEPARATE FROM APPLICATION CODE

The gitops-config repo is the single source of truth for what is deployed where. It contains Helm values files per environment, image tag references, and resource overrides. It does NOT contain application code, Dockerfiles, or business logic.

✅ Why Separate GitOps Repo
  • Clean audit trail — every production change is a git commit
  • ArgoCD watches only this repo — reduces blast radius of app repo access
  • Environment promotion = PR + review, not a kubectl command
  • Different access rights: devs push to app repos; only CD pipeline + SRE push to gitops repo
  • No secrets ever live here — only references (Vault paths, Sealed Secrets)
📁 gitops-config Structure
gitops-config/
 ├─ apps/
 │  ├─ app-frontend/
 │    ├─ dev.values.yaml
 │    ├─ staging.values.yaml
 │    └─ prod.values.yaml
 │  └─ app-backend/ …
 ├─ clusters/
 │  ├─ dev/ staging/ prod/
 └─ argocd-apps/
    └─ ApplicationSet manifests
ℹ️
The promotion flow: CI builds an image and tags it sha-abc123 → CI opens an automated PR to gitops-config updating dev.values.yaml → PR auto-merges for dev, requires review for staging/prod → ArgoCD detects the diff → reconciles the cluster. No human types kubectl commands.
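The "CI updates dev.values.yaml" step in the promotion flow amounts to rewriting one line in a values file. A minimal sketch in Python, assuming a hypothetical values layout with a single `tag:` key under `image:`; a real pipeline would more likely use a YAML-aware tool.

```python
import re

def set_image_tag(values_yaml: str, new_tag: str) -> str:
    """Replace the value of the first 'tag:' key, preserving its indentation."""
    updated, count = re.subn(
        r"^([ \t]*tag:[ \t]*).*$",
        lambda match: match.group(1) + new_tag,
        values_yaml,
        count=1,
        flags=re.MULTILINE,
    )
    if count == 0:
        raise ValueError("no 'tag:' key found in values file")
    return updated

# Hypothetical contents of apps/app-backend/dev.values.yaml
dev_values = (
    "image:\n"
    "  repository: ghcr.io/org/app-backend\n"
    "  tag: sha-0000000\n"
    "replicaCount: 1\n"
)
print(set_image_tag(dev_values, "sha-abc123"))
```

The resulting one-line diff is what the automated PR carries, and a rollback is the same change in reverse.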

⚙️
Pipeline Overview
local dev → pre-commit → CI → image push → GitOps update → ArgoCD sync → K8s → alerts

💻 Stage 0 · Local Dev · Docker Compose
🔒 Stage 1 · Pre-commit · Husky + Hooks
   (git push)
🏗️ Stage 2 · CI Build · GitHub Actions
🧪 Stage 3 · Test + Scan · xUnit / Sonar
📤 Stage 4 · Image Push · GHCR / ECR
🔄 Stage 5 · GitOps PR · gitops-config
🚀 Stage 6 · ArgoCD Sync · ArgoCD
☸️ Stage 7 · Kubernetes · EKS / AKS / GKE
📊 Stage 8 · Observe · Grafana / PagerDuty

💻
Local Development
Fast feedback loop without a Kubernetes cluster on your laptop

◆ TOOL: Docker Compose

Docker Compose spins up the full stack locally: .NET backend, Angular dev server (or built Nginx container), PostgreSQL, Redis, and any sidecar services. Developers should be able to run docker compose up and have a working environment in under 3 minutes.

🐳 What Compose Runs Locally
  • app-backend — hot-reload with dotnet watch
  • app-frontend — Angular dev server with proxy to backend
  • postgres — schema-per-module, seeded with test data
  • redis — distributed cache
  • mailhog — local email catcher
  • seq — local structured log viewer
📋 Developer Contract
  • One .env.local file, never committed
  • tools-seed-data repo provides realistic dev data
  • DB migrations run automatically on startup
  • No need for cloud credentials for basic development
  • Feature flags default to "on" in local env
💡
Starter resource: See the DevOps Playbook at vivek-doshi.github.io/devops-playbook/ for ready-to-use Docker Compose configs and local dev setup scripts aligned with this architecture.

🔒
Pre-commit Hooks
Catch problems before they reach CI, not after

◆ WHY PRE-COMMIT

CI minutes are expensive and slow. A pre-commit hook that runs in 10 seconds catches 80% of the obvious issues that would otherwise waste a 4-minute CI run. The golden rule: if it can run locally fast, it should run locally first.

Hook | Tool | Checks
Format | dotnet format --verify-no-changes · ng lint + Prettier | Code style, whitespace, import order
Secrets scan | detect-secrets / gitleaks | API keys, connection strings, tokens in diff
Commit message | commitlint + conventional commits | e.g. feat(auth): add MFA support [JIRA-123]
Unit test smoke | dotnet test --filter Category=Unit (fast subset) | Run only tests touching changed projects
File size guard | Custom shell check | Reject files > 1MB committed by accident
⚠️
Husky for Node / Angular repos, plain Git hooks for .NET repos. Keep pre-commit hooks fast: under 30 seconds total. Slow hooks get bypassed with --no-verify. If a check needs 2+ minutes, it belongs in CI, not pre-commit.
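The commit-message check is the easiest hook to picture. The sketch below is a Python stand-in for what commitlint would enforce, not commitlint itself; the list of allowed types is an assumption based on common Conventional Commits usage, and the trailing ticket reference follows the example in the table.

```python
import re

# Expected shape: type(scope): subject [TICKET-123]
COMMIT_PATTERN = re.compile(
    r"^(feat|fix|chore|docs|refactor|test|ci)"  # commit type (assumed list)
    r"(\([a-z0-9-]+\))?"                        # optional scope, e.g. (auth)
    r": .+ "                                    # subject text
    r"\[[A-Z]+-\d+\]$"                          # ticket reference, e.g. [JIRA-123]
)

def is_valid_commit_message(first_line: str) -> bool:
    """Check only the first line of the commit message."""
    return COMMIT_PATTERN.match(first_line) is not None

print(is_valid_commit_message("feat(auth): add MFA support [JIRA-123]"))
print(is_valid_commit_message("fixed stuff"))
```

A check like this runs in milliseconds, which is why it belongs in the commit-msg hook rather than in CI.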

🏗️
CI: Build + Test + Scan
GitHub Actions: everything happens here before an image is produced

◆ TOOL: GitHub Actions

Source code is already in GitHub. GitHub Actions is the zero-friction choice — no separate CI server to maintain, native secret management, reusable workflows, and tight integration with PRs, branch protection, and environments. The platform-shared-ci repo provides reusable workflow templates called by all 20 repos.

CI Pipeline Stages (GitHub Actions)

1. Trigger
Every pull request to develop, release/*, or main. Also on push to develop (for auto-deploy to dev). Feature branch pushes run CI without deploy.
2. Restore & Build
Backend: dotnet restore + dotnet build --configuration Release /warnaserror — warnings are errors. Frontend: npm ci + ng build --configuration=production. Both run in parallel jobs.
3. Unit Tests
dotnet test with XPlat Code Coverage. Angular with ng test --watch=false --code-coverage. Coverage threshold enforced: fail CI below 80% on business logic projects. Test results posted as PR comment.
4. Integration Tests
Testcontainers spins up real PostgreSQL + Redis in the CI runner. Tests tagged [Integration] run against a real database. Runs on PRs to release/* and main — skipped on feature-branch PRs to keep feedback fast.
5. Static Analysis
SonarCloud scans .NET + Angular in parallel. StyleCop + Roslynator violations fail the build. dotnet format --verify-no-changes enforced. Angular ESLint checked. Security hotspots block merge on main.
6. Dependency Scan
Dependabot alerts + OWASP Dependency-Check or Snyk scan NuGet and npm packages. Critical CVEs block the pipeline. Results uploaded to GitHub Security tab.
7. Architecture Tests
ArchUnitNET tests verify module boundaries: no cross-module internal references, no domain layer depending on infrastructure, etc. These are ordinary xUnit tests — they just happen to assert structural constraints.
8. Container Build
Docker multi-stage build. Only runs if all prior steps pass. Uses BuildKit layer caching. Image NOT pushed yet — just built and tagged locally to verify it builds successfully. SBOM generated.
Target CI runtime: Feature-branch PRs should complete in under 8 minutes. Merges to main (including integration tests) should complete in under 15 minutes. Use matrix builds and parallel jobs aggressively.
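The rule that integration tests run only on PRs to release/* and main can be sketched as a small stage selector. This is Python pseudologic for illustration; in practice the condition would live in the reusable workflows, and the stage names here are hypothetical.

```python
from fnmatch import fnmatchcase

# Stages that run on every pull request, in pipeline order.
BASE_STAGES = [
    "restore-build",
    "unit-tests",
    "static-analysis",
    "dependency-scan",
    "architecture-tests",
    "container-build",
]

def stages_for_pull_request(target_branch: str) -> list:
    """Return the ordered CI stages for a PR into target_branch."""
    stages = list(BASE_STAGES)
    # Integration tests are skipped on feature-branch PRs to keep feedback fast.
    if target_branch == "main" or fnmatchcase(target_branch, "release/*"):
        stages.insert(stages.index("unit-tests") + 1, "integration-tests")
    return stages

print(stages_for_pull_request("develop"))
print(stages_for_pull_request("main"))
```

Keeping the slow stage conditional is what makes the 8-minute target for feature-branch PRs plausible.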

📤
Image Push
When, where, and how images are published

◆ DECISION: GHCR (GitHub Container Registry)

Since source is in GitHub, GitHub Container Registry (GHCR) is the natural choice — no extra credentials, native GITHUB_TOKEN auth, image visibility tied to repo visibility, and no additional monthly cost on GitHub Enterprise. For regulated environments needing image scanning at registry level, AWS ECR is an alternative with Clair/Inspector integration.

🏷️ Image Tagging Strategy
  • sha-<7-char-git-sha> — primary immutable tag, always
  • develop — latest develop build (mutable)
  • staging — latest staging-ready build
  • v1.4.2 — on git tag, semantic version
  • Never push :latest — it's a footgun in production
🔍 Image Security Before Push
  • Trivy scan on image before push — fail on CRITICAL CVEs
  • Multi-stage Dockerfile — final image is distroless or Alpine
  • Run as non-root user inside container
  • Image SBOM (Software Bill of Materials) attached to image
  • Image signed with cosign after push, then verified at deploy time by K8s admission
🔐
Image signing with cosign: After push, the CI pipeline signs the image using Sigstore's cosign with a keyless flow (OIDC from GitHub Actions). A Kubernetes admission controller (e.g., Kyverno policy) can then enforce that only signed images from your registry are ever scheduled — blocking accidental or malicious image substitution.
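The tagging strategy above can be written as a pure function from commit and ref to tag list. A sketch in Python; mapping release branches to the `staging` tag is an assumption, since the source only says the tag marks the latest staging-ready build.

```python
import re

def image_tags(git_sha: str, ref: str) -> list:
    """Tags for one build. `ref` is a full git ref such as 'refs/heads/develop'."""
    tags = [f"sha-{git_sha[:7]}"]  # primary immutable tag, always present
    if ref == "refs/heads/develop":
        tags.append("develop")  # mutable pointer to the latest develop build
    elif ref.startswith("refs/heads/release/"):
        tags.append("staging")  # assumption: release branches feed staging
    elif re.fullmatch(r"refs/tags/v\d+\.\d+\.\d+", ref):
        tags.append(ref[len("refs/tags/"):])  # semantic version from the git tag
    return tags  # ':latest' is deliberately never produced

print(image_tags("abc1234def5678", "refs/heads/develop"))
print(image_tags("abc1234def5678", "refs/tags/v1.4.2"))
```

Deployments should reference only the `sha-` tag; the mutable tags are conveniences for humans.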

🔄
GitOps Update
The CI pipeline opens a PR against gitops-config; it never deploys directly

◆ THE SEPARATION OF PUSH AND DEPLOY

The application CI pipeline's job ends when the image is pushed. It does NOT run kubectl apply or helm upgrade. Instead, it opens a PR (or auto-merges for dev) to the gitops-config repo, updating the image tag in the relevant values file. This separation is intentional: it decouples what was built from what is deployed, and makes every environment change auditable in git.

Dev branch
CI auto-merges the image tag update PR to gitops-config/apps/app-backend/dev.values.yaml. No human action needed. ArgoCD detects the change and syncs within ~3 minutes.
Staging branch
CI opens a PR to update staging.values.yaml. Requires 1 review from a team lead. Merge triggers ArgoCD sync to staging cluster.
Production
PR to prod.values.yaml requires 2 senior approvals + a deployment window comment. Merge is the approval record. ArgoCD sync follows — with manual sync gate option for major releases.
Rollback
Rollback is a git revert PR on gitops-config — restoring the previous image tag. ArgoCD reconciles. No custom rollback scripts needed. Full audit trail preserved.
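The per-environment merge rules above reduce to a small policy table. A sketch in Python with illustrative names; the deployment-window comment required for production is left out for brevity.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PromotionPolicy:
    values_file: str
    required_approvals: int
    auto_merge: bool

# Approval counts and file names follow the rules described above.
POLICIES = {
    "dev": PromotionPolicy("dev.values.yaml", 0, True),
    "staging": PromotionPolicy("staging.values.yaml", 1, False),
    "prod": PromotionPolicy("prod.values.yaml", 2, False),
}

def can_merge(environment: str, approvals: int) -> bool:
    """True when a gitops-config PR for this environment may be merged."""
    policy = POLICIES[environment]
    return policy.auto_merge or approvals >= policy.required_approvals

print(can_merge("dev", 0))
print(can_merge("prod", 1))
```

In practice these rules would be enforced by branch protection and CODEOWNERS on gitops-config rather than by custom code.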

🚀
ArgoCD: GitOps Continuous Delivery
Why ArgoCD over Flux or push-based CD

◆ WHY ARGOCD

ArgoCD is one of the most widely adopted tools for Kubernetes GitOps. Its UI makes drift visible at a glance. It is pull-based: the cluster pulls desired state from git rather than CI pushing manifests in. This means no CI system ever needs kubectl credentials to the production cluster, which is a major security win. Compared to Flux, ArgoCD's UI and ApplicationSet patterns are better suited for 20-repo organizations.

ArgoCD Setup Decisions

🗂️ ApplicationSet Pattern
Use ApplicationSet with a Git generator to auto-create ArgoCD Application objects from directories in gitops-config/apps/. Adding a new service = adding a folder. No manual ArgoCD configuration needed per service.
🔄 Sync Policy
Dev cluster: auto-sync with self-heal — any drift is corrected automatically. Staging: auto-sync, no self-heal (manual drift inspection required). Prod: manual sync — a human presses Sync after reviewing the diff in ArgoCD UI.
🔑 Secrets Management
Sealed Secrets or External Secrets Operator (pulling from AWS Secrets Manager / Azure Key Vault). No plaintext secrets in gitops-config ever. ArgoCD itself uses OIDC (GitHub/Okta) for user auth — RBAC aligned to GitHub teams.
🏥 Health Checks
ArgoCD understands Kubernetes Deployment rollout health. It will show an app as "Degraded" if a new pod crashes. Custom health checks added for CRDs (e.g., database migration jobs). Notifications to Slack on sync failure or degraded status.
ℹ️
ArgoCD lives inside the cluster it manages (same cluster or management cluster pattern). For production isolation, use a dedicated management cluster running ArgoCD that manages spoke clusters. This prevents the "bootstrap problem" where ArgoCD going down takes prod deployment with it.
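The three sync policies described above differ only in the `syncPolicy` fragment of each ArgoCD Application. A Python sketch that builds that fragment as a plain dict; the function name is illustrative, and fields such as prune are omitted because the source does not specify them.

```python
def sync_policy(environment: str) -> dict:
    """Build the syncPolicy fragment of an ArgoCD Application for one environment."""
    if environment == "prod":
        return {}  # no 'automated' block: a human presses Sync after reviewing the diff
    return {
        "automated": {
            # Only dev corrects drift automatically; staging drift is inspected by hand.
            "selfHeal": environment == "dev",
        }
    }

for env in ("dev", "staging", "prod"):
    print(env, sync_policy(env))
```

With an ApplicationSet, the same decision would typically be driven by a per-environment template parameter instead of code.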

☸️
Kubernetes
Cluster design, namespace strategy, and workload configuration

🏗️ Cluster Strategy
  • Dev cluster — shared, smaller nodes, preemptible/spot instances
  • Staging cluster — mirrors prod sizing, same region, used for load tests
  • Prod cluster — dedicated, multi-AZ, managed (EKS/AKS/GKE)
  • Separate cluster per environment — not namespace-only separation
📁 Namespace Design
  • apps — all application workloads
  • platform — ArgoCD, cert-manager, ingress-nginx
  • monitoring — Prometheus, Grafana, Loki
  • secrets — External Secrets Operator

Standard Workload Configuration per Service

Concern | Decision | Reason
Health checks | Liveness + readiness + startup probes | Startup probe prevents premature kill during .NET startup
Resource limits | CPU request ≠ limit, Memory request = limit | CPU bursting OK; OOM kill is preferable to slow response
HPA | Horizontal Pod Autoscaler on CPU + custom metric (request rate) | Scale out under load before hitting CPU ceiling
PodDisruptionBudget | minAvailable: 1 on prod | Prevents cluster upgrades taking down all pods
Anti-affinity | Prefer spread across nodes and AZs | Single node failure shouldn't take down service
Image pull policy | IfNotPresent with immutable tags | Never pull :latest; it is non-deterministic
Network policy | Default deny, explicit allow per service | Module A cannot talk to Module B's DB pod
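Several of the workload rules above are mechanical enough to lint. The Python sketch below checks a container spec, represented as a plain dict in the shape of a Kubernetes container entry, against three of them; the function name and the example image are hypothetical.

```python
def check_container(container: dict) -> list:
    """Return rule violations for one container spec (a plain dict)."""
    problems = []
    resources = container.get("resources", {})
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    if requests.get("memory") is None or requests.get("memory") != limits.get("memory"):
        problems.append("memory request must be set and equal to the limit")
    # Look at the part after the last '/', so a registry port is not mistaken for a tag.
    image_name = container.get("image", "").rsplit("/", 1)[-1]
    if ":" not in image_name or image_name.endswith(":latest"):
        problems.append("image must use an immutable tag, never :latest")
    for probe in ("livenessProbe", "readinessProbe", "startupProbe"):
        if probe not in container:
            problems.append(f"missing {probe}")
    return problems

good = {
    "image": "ghcr.io/org/app-backend:sha-abc1234",
    "resources": {
        "requests": {"cpu": "250m", "memory": "512Mi"},
        "limits": {"cpu": "1", "memory": "512Mi"},
    },
    "livenessProbe": {}, "readinessProbe": {}, "startupProbe": {},
}
print(check_container(good))
print(check_container({"image": "app-backend:latest"}))
```

In a cluster, the same rules are usually enforced by an admission policy rather than a script, so that non-compliant workloads are rejected at deploy time.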

📊
Observability & Alerts
Metrics → Logs → Traces → Alerts: the four pillars

📈
Prometheus + Grafana
Metrics collection and dashboarding. .NET exposes metrics via the OpenTelemetry Prometheus exporter (for example the OpenTelemetry.Exporter.Prometheus.AspNetCore package). Angular performance via RUM. Grafana dashboards live in the ops-monitoring repo, provisioned as code.
Metrics
📝
Loki + Grafana
Log aggregation. Serilog outputs structured JSON → Loki. Logs correlated to traces via TraceId. Grafana queries both metrics and logs in the same dashboard.
Logs
🔍
Tempo (or Jaeger)
Distributed tracing. OpenTelemetry SDK instruments .NET and Angular. Traces show the full request path across modules and async jobs. Essential for debugging latency spikes.
Traces
🚨
Alertmanager + PagerDuty
Prometheus Alertmanager routes alerts. Severity P1/P2 → PagerDuty on-call. P3/P4 → Slack channel. Alert rules live in ops-monitoring repo and are applied via Helm chart.
Alerts

What to Alert On (Signals, Not Symptoms)

Signal | Threshold | Severity
HTTP 5xx error rate | > 1% over 5 min window | P2: Slack + PagerDuty
HTTP p99 latency | > 2s sustained 10 min | P2: Slack
Pod crash loop | 3+ restarts in 15 min | P1: PagerDuty immediate
ArgoCD sync failed | Any failure on prod | P2: Slack
Certificate expiry | < 30 days remaining | P3: Slack warning
DB connection pool saturation | > 80% pool used | P2: Slack
Disk usage on PVC | > 80% of claim | P3: Slack
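As a worked example of the first row, the 5xx rule is a ratio compared against a threshold over a window. The Python sketch below shows the arithmetic and the default severity routing described for Alertmanager (P1/P2 to PagerDuty, P3/P4 to Slack); in the real stack this would be a Prometheus alert rule, and the function names are illustrative.

```python
def error_rate_alert(total_requests: int, server_errors: int, threshold: float = 0.01) -> bool:
    """True when the 5xx share of requests in the window exceeds the threshold."""
    if total_requests == 0:
        return False  # no traffic in the window, nothing to alert on
    return server_errors / total_requests > threshold

def route(severity: str) -> str:
    """Default routing: P1/P2 page the on-call engineer, P3/P4 go to Slack."""
    return "pagerduty" if severity in ("P1", "P2") else "slack"

# 150 errors out of 10,000 requests in the 5-minute window is 1.5%, above 1%.
print(error_rate_alert(10_000, 150), route("P2"))
```

Alerting on a rate rather than an absolute error count keeps the rule meaningful at both low and high traffic.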

Why Helm
The definitive answer to "why not just raw YAML?"

◆ WHY HELM — NOT KUSTOMIZE, NOT RAW YAML

With 20 repos and 3 environments (dev, staging, prod), raw Kubernetes YAML means maintaining 3× duplicated manifests per service — 60+ manifest sets. Helm solves environment parameterization: one chart template, three values files, one source of truth.

✅ Helm Gives You
  • Templating — one chart, values files per environment. No YAML duplication.
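  • Release management: helm history shows every deploy with rollback support when releases are installed with the Helm CLI. Under ArgoCD, which renders charts with helm template, deploy history and rollback come from gitops-config commits instead.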
  • Dependency management — subchart dependencies versioned like npm packages
  • ArgoCD native — ArgoCD understands Helm natively, renders values before applying
  • Community charts — PostgreSQL, Redis, Ingress-NGINX, Cert-Manager all ship as production-grade Helm charts
📁 Helm Chart Structure
charts/app-backend/
 ├─ Chart.yaml   — name, version, appVersion
 ├─ values.yaml  — sensible defaults
 └─ templates/
    ├─ deployment.yaml
    ├─ service.yaml
    ├─ ingress.yaml
    ├─ hpa.yaml
    ├─ configmap.yaml
    └─ pdb.yaml
⚠️
Helm chart location decision: Application-specific Helm charts live in the application's own repo (e.g., app-backend/charts/). The gitops-config repo references these charts by version and overrides only environment-specific values. This keeps chart logic close to the code it deploys, while keeping environment config separate.
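The "one chart, values files per environment" idea rests on layering environment overrides over chart defaults. The Python sketch below approximates that layering with a recursive merge; it illustrates the concept and is not Helm's implementation, and the value keys are hypothetical.

```python
def merge_values(defaults: dict, overrides: dict) -> dict:
    """Recursively merge overrides into defaults; nested maps merge, other values replace."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = value
    return merged

# values.yaml in the chart (sensible defaults)
chart_defaults = {"replicaCount": 1, "hpa": {"enabled": False, "maxReplicas": 3}}
# prod.values.yaml in gitops-config (only what differs)
prod_overrides = {"replicaCount": 3, "hpa": {"enabled": True}}

print(merge_values(chart_defaults, prod_overrides))
```

The production file stays short because it states only what differs from the defaults, which is the duplication Helm is meant to remove.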
◆ KUSTOMIZE vs HELM

Kustomize is a valid alternative (and is ArgoCD-native too). It is better when you have simple overlays and no need for parameterized logic. Helm wins when you have complex conditionals (HPA only in prod, different ingress annotations per env, feature flag toggles). For this architecture with 3 environments and 20 services, Helm's templating capability justifies the learning curve.

🔧
Tool Decisions Summary
Every choice with its rationale, no cargo-culting

Stage | Tool Chosen | Why This, Not Alternatives
Source control | GitHub | Stated requirement. GitHub Actions native CI, GHCR, branch protection, CODEOWNERS all in one platform.
CI/CD engine | GitHub Actions | Source is GitHub, so zero friction. Reusable workflows (platform-shared-ci) serve as DRY CI templates. No separate Jenkins/GitLab to maintain.
Container registry | GHCR | Native GitHub auth, no extra service. ECR if AWS-native is required with Inspector scanning.
Image security scan | Trivy | Open source, fast, scans OS packages + language deps, integrates with GitHub Actions via action, widely adopted.
Static analysis | SonarCloud + Roslynator | SonarCloud for security hotspots + code smells + PR decoration. Roslynator for .NET-specific patterns at build time. Both are complementary.
GitOps controller | ArgoCD | Best UI for drift visibility, ApplicationSet for multi-app management, mature RBAC, active community. Flux is a valid alternative but ArgoCD UI wins for teams new to GitOps.
K8s packaging | Helm | Environment parameterization, rich community charts, ArgoCD native. Kustomize for simpler single-environment projects.
Secrets | External Secrets Operator | Pulls secrets from AWS Secrets Manager / Azure Key Vault at runtime. Sealed Secrets if no cloud secret store. Never store secrets in gitops-config.
Metrics | Prometheus + Grafana | Industry standard. OpenTelemetry SDK in .NET exports to Prometheus. Grafana dashboards as code in ops-monitoring repo.
Logs | Loki + Grafana | Same Grafana instance queries both metrics and logs. Loki is cost-effective vs Elasticsearch for structured log querying.
Tracing | Tempo | Grafana native, same observability stack. OpenTelemetry traces from .NET → Tempo. Jaeger is an alternative.
Alerting | Alertmanager → PagerDuty + Slack | Prometheus Alertmanager routes by severity. PagerDuty for P1/P2 on-call escalation. Slack for P3/P4 visibility.
Local dev | Docker Compose | No Kubernetes overhead on developer laptops. Full stack in one command. Tilt.dev optional for hot-reload with K8s.
E2E testing | Playwright | Cross-browser, fast, great Angular support, component testing mode. Lives in test-e2e repo, runs against deployed staging.
Image signing | cosign (Sigstore) | Keyless signing via GitHub OIDC. Supply chain security without managing private keys. Verified at admission by Kyverno.

🌍
Environment Strategy
Dev, Staging, Production, and how they differ

Environment | Cluster | Deploy Trigger | ArgoCD Sync | Data | Purpose
Local | Docker Compose | Developer runs manually | N/A | Seeded synthetic | Fast iteration, debugging
Dev | Shared K8s (small) | Merge to develop | Auto + self-heal | Seeded synthetic | Integration smoke tests
Staging | Dedicated K8s (prod-size) | Release branch push + PR approval | Auto, no self-heal | Anonymized prod copy | QA, load testing, UAT
Production | Dedicated K8s (HA, multi-AZ) | Merge to main + manual gitops PR | Manual sync only | Live production data | Serve real users
🚫
Production access: No developer has kubectl exec access to production pods. All production changes go through the gitops-config PR process. Break-glass emergency access exists for SRE leads only, with session recording and automatic ticket creation. Every kubectl command in production is a post-incident finding waiting to happen.