Deliverable 4.4 — Trade-Off Log

Requirement: What are you intentionally not optimizing? What technical debt are you accepting? What should be revisited in 6 months?
Source: Submission.md, Constraints Analysis.md, Tech Stack Analysis.md, Strategy.md


1. Intentionally Not Optimizing

| # | What We Chose | Trade-Off Accepted | Over (Alternative) | Why This Trade-Off |
|---|---|---|---|---|
| T1 | Payment stays in monolith | Tech debt: ACL adapter, legacy maintenance | Migrate Payment early | Constraint (frozen Phase 1) + highest risk (PCI, financial). ACL provides clean bridge. Cost of delay: ACL maintenance ~0.5 MM/month |
| T2 | Azure Container Apps | Less control over networking, pod config | Kubernetes (AKS) | 5 engineers can't manage K8s cluster. Container Apps = managed, auto-scale, zero ops. Cost: less fine-grained control when debugging network issues |
| T3 | Azure SQL everywhere | Not polyglot-optimized per service | Cosmos DB for Comms, Redis for cache, etc. | One DB technology = one skill to maintain. Polyglot = multiple operational burdens for 5 eng. Cost: Comms messages could be faster with document store |
| T4 | Incremental React rewrite | Legacy Payment UI in iframe. UX inconsistency on 1-2 modules | Full React rewrite for all modules | 5 engineers can't rewrite all frontend + all backend simultaneously. Cost: Payment page looks "old" next to new modules |
| T5 | Single region (Active-Passive) | Higher latency for EU/US users (~120-200ms) | Multi-region active-active | 40K users served well with SEA primary + CDN. Active-active = double infra complexity. Cost: ~120ms extra for US users (acceptable for enterprise app) |
| T6 | Contract tests over heavy E2E | Fewer full-flow automated tests | Comprehensive E2E test suite (Playwright) | E2E tests = slow, flaky, high maintenance. Contract tests verify boundaries efficiently. Cost: some integration gaps only caught in staging/canary |
| T7 | Shared DB views during CDC transition | Temporary coupling via shared DB views | Full data decomposition from Day 1 | Per-service DBs come at each service's go-live, but during transition CDC bridges the gap. Cost: short coupling window per module |
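
T1 states its cost of delay as a monthly rate, so the cumulative cost of keeping Payment frozen is a simple product. A minimal sketch, assuming the ~0.5 MM/month figure stays flat over time; the function name and the example month counts are illustrative and not taken from the plan:

```python
# Rate taken from T1: "ACL maintenance ~0.5 MM/month".
ACL_MAINTENANCE_MM_PER_MONTH = 0.5


def acl_cost_of_delay(months_frozen: int,
                      rate: float = ACL_MAINTENANCE_MM_PER_MONTH) -> float:
    """Cumulative ACL maintenance cost in man-months, assuming a flat rate."""
    if months_frozen < 0:
        raise ValueError("months_frozen must be non-negative")
    return rate * months_frozen


# Illustrative horizons: the 6-month revisit and the full 9-month plan.
print(acl_cost_of_delay(6))  # 3.0
print(acl_cost_of_delay(9))  # 4.5
```

If the real maintenance load grows as the monolith ages, this linear estimate is a lower bound.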

2. Technical Debt Accepted

| # | Debt | Why Accepted | Severity | Cost of Delay |
|---|---|---|---|---|
| D1 | Legacy Payment UI in iframe (no React rewrite) | Works, zero risk, users don't notice a difference for basic payment flows | Low | UX inconsistency. Fix when Payment is modernized |
| D2 | Comms templates hardcoded initially | Quick migration. Template engine arrives once Phase 2 is complete | Low | Duplicate templates, manual updates. ~1 week to fix later |
| D3 | Manual IaC for some staging resources | Automation ROI not worth it for a one-time staging setup | Very Low | Manual drift possible, but staging only |
| D4 | Limited load testing before Phase 3 | Load testing is meaningful only when services are stable and connected | Medium | Performance issues discovered later. Mitigated by canary release (gradual traffic) |
| D5 | Reporting queries not fully optimized | Materialized views from events = good enough, not optimal | Medium | Slow queries for complex reports. Fix with production query patterns |
| D6 | No service mesh (Istio/Linkerd) | 5 services + YARP + Polly handle current needs | Low | If services grow > 15, inter-service communication gets complex. Add mesh then |
| D7 | Event schema not fully governed | Schema Registry set up in Phase 0, but enforcement only from Phase 2 | Medium | Schema drift between services. Mitigated by contract tests (Pact) |
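
D7 describes a registry that exists from Phase 0 but only enforces from Phase 2. One way to picture that gap is a check that always reports drift but only rejects events once enforcement is switched on. This is a sketch under assumptions: the `BOOKING_CREATED_V1` schema, its field names, and both function names are hypothetical, and a real Schema Registry would replace the hand-rolled dictionary.

```python
from typing import Any

# Hypothetical schema for illustration: field name -> expected Python type.
BOOKING_CREATED_V1: dict[str, type] = {
    "booking_id": str,
    "user_id": str,
    "amount_cents": int,
}


def schema_violations(event: dict[str, Any], schema: dict[str, type]) -> list[str]:
    """Return readable violations; an empty list means the event conforms."""
    problems = []
    for field, expected in schema.items():
        if field not in event:
            problems.append(f"missing field: {field}")
        elif not isinstance(event[field], expected):
            problems.append(f"wrong type for {field}: expected {expected.__name__}")
    for field in event:
        if field not in schema:
            problems.append(f"unexpected field: {field}")
    return problems


def publish_check(event: dict[str, Any], schema: dict[str, type], enforce: bool) -> bool:
    """Always log drift; block the event only when enforcement is on."""
    problems = schema_violations(event, schema)
    for problem in problems:
        print(f"schema warning: {problem}")
    return not (enforce and problems)


# A producer renamed amount_cents to amount: drift that is tolerated before
# enforcement and rejected after it.
drifted = {"booking_id": "b-1", "user_id": "u-1", "amount": 100}
print(publish_check(drifted, BOOKING_CREATED_V1, enforce=False))  # True
print(publish_check(drifted, BOOKING_CREATED_V1, enforce=True))   # False
```

Running in warn-only mode before Phase 2 at least makes the accumulated drift visible, which bounds the cleanup needed when enforcement starts.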

Debt Priority Matrix

                HIGH COST             LOW COST
                TO FIX LATER          TO FIX LATER
                ──────────            ─────────────
CAUSES ISSUES   │ D5: Report queries  │ D2: Templates
SOON            │ D7: Schema govern.  │ D3: Manual IaC
                │                     │
NO IMMEDIATE    │ D1: Payment iframe  │ D6: No service mesh
ISSUES          │ D4: Load testing    │
                
Fix order: D7 → D5 → D4 → D2 → D1 → D3 → D6
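
The matrix and the fix order are easy to let drift apart as items are added. A small sketch that transcribes both and checks that every debt item appears exactly once in each; the quadrant keys (`soon`/`later`, `high_cost`/`low_cost`) and function names are naming assumptions, while the item placement is copied from the matrix above:

```python
# Quadrants transcribed from the Debt Priority Matrix.
# Key: (when it causes issues, cost to fix later).
MATRIX: dict[tuple[str, str], list[str]] = {
    ("soon", "high_cost"): ["D5", "D7"],
    ("soon", "low_cost"): ["D2", "D3"],
    ("later", "high_cost"): ["D1", "D4"],
    ("later", "low_cost"): ["D6"],
}

# Fix order as stated in the log.
FIX_ORDER = ["D7", "D5", "D4", "D2", "D1", "D3", "D6"]


def quadrant_of(debt_id: str) -> tuple[str, str]:
    """Look up which quadrant a debt item sits in."""
    for quadrant, items in MATRIX.items():
        if debt_id in items:
            return quadrant
    raise KeyError(debt_id)


def is_consistent() -> bool:
    """True when matrix and fix order cover the same items, with no repeats."""
    in_matrix = [d for items in MATRIX.values() for d in items]
    no_repeats = len(set(in_matrix)) == len(in_matrix) and len(set(FIX_ORDER)) == len(FIX_ORDER)
    return no_repeats and set(in_matrix) == set(FIX_ORDER)


print(is_consistent())
for position, debt_id in enumerate(FIX_ORDER, start=1):
    urgency, cost = quadrant_of(debt_id)
    print(f"{position}. {debt_id}: issues {urgency}, {cost} to fix later")
```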

3. Revisit in 6 Months

| # | Item | Current State | Revisit Question | Trigger |
|---|---|---|---|---|
| R1 | Payment modernization | Frozen. ACL bridge only | Start migration planning? Extract to .NET 8 service? | All other services stable + team comfortable + PCI review done |
| R2 | AI multiplier accuracy | 2x projected | Was 2x realistic? Measure actual vs projected velocity | Monthly AI metrics dashboard shows actual output measured |
| R3 | Database decomposition | Per-service DBs for all 5 services | All DBs truly independent? Any lingering shared views? | Check CDC still running anywhere → should be decommissioned |
| R4 | React coverage | 3-4 modules in React 18 | Which legacy pages still not migrated? Need more frontend? | User feedback on UX inconsistency between new/old modules |
| R5 | Multi-region active-active | Active-passive (SEA primary, AU failover) | User growth justifies US/EU region? SLA requires <100ms? | Monitor: latency by region, user distribution changes |
| R6 | Event Sourcing adoption | Events captured but not sourced | Event sourcing for high-value aggregates (Booking)? Need audit trail? Temporal queries? | Business requests analytics |
| R7 | Service Bus → Kafka | Azure Service Bus (managed) | Volume exceeds Service Bus limits? Need streaming? | > 10K events/second sustained (unlikely at 40K users) |
| R8 | Container Apps → AKS | Container Apps (serverless) | Need more control? Team grown? Service count > 15? | Platform limitations hit, team size > 10 engineers |
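
R7 and R8 are the two items whose triggers are numeric, so they can be evaluated mechanically at review time. A minimal sketch, assuming the thresholds are read as strict "greater than" and that either R8 condition alone is enough to reopen the question; the `PlatformMetrics` fields and the sample values are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class PlatformMetrics:
    """Hypothetical snapshot of the numbers the R7 and R8 triggers refer to."""
    sustained_events_per_second: float
    service_count: int
    team_size: int


def triggered_revisits(m: PlatformMetrics) -> list[str]:
    """Return the IDs of revisit items whose numeric trigger has fired."""
    triggered = []
    # R7: "> 10K events/second sustained"
    if m.sustained_events_per_second > 10_000:
        triggered.append("R7")
    # R8: "Service count > 15?" or "team size > 10 engineers"
    if m.service_count > 15 or m.team_size > 10:
        triggered.append("R8")
    return triggered


# Sample values only; 5 services and 5 engineers match the current plan,
# the event rate is made up.
print(triggered_revisits(PlatformMetrics(500, 5, 5)))       # []
print(triggered_revisits(PlatformMetrics(12_000, 5, 12)))   # ['R7', 'R8']
```

The remaining items (R1 to R6) depend on judgment calls such as team comfort or user feedback, so they stay on the review agenda rather than in a check like this.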

6-Month Review Checklist

Month 7 Review Meeting (after Phase 2 complete):
  □ All 5 services live and stable? (metrics: error rate, latency, SLA)
  □ AI 2x multiplier achieved? (measure: actual MM delivered vs projected)
  □ Payment ACL reliable? (measure: ACL error rate, circuit breaker opens)
  □ Team health? (burnout indicators, retention risk)
  □ Technical debt acceptable? (list above, reassess severity)
  □ Scope for next 6 months? (Payment migration, ML features, multi-region)

Month 9 Handoff Document should include:
  □ What was deferred and why
  □ Ready-to-execute plan for Payment modernization
  □ Capacity needed for Phase 4 (post-9-months)
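
The Month 7 item "AI 2x multiplier achieved?" compares actual MM delivered against projected. One way to turn that into a single number, assuming the projected figure was computed with the 2x multiplier already applied; the function name and the sample figures are illustrative, not measurements:

```python
def realized_multiplier(actual_mm: float,
                        projected_mm: float,
                        projected_multiplier: float = 2.0) -> float:
    """Scale the planned multiplier by the ratio of delivered to projected output."""
    if projected_mm <= 0:
        raise ValueError("projected_mm must be positive")
    return projected_multiplier * actual_mm / projected_mm


# Illustrative only: 5 engineers x 7 months x 2 = 70 MM projected,
# with a made-up 56 MM actually delivered.
print(f"{realized_multiplier(actual_mm=56, projected_mm=70):.2f}x")  # 1.60x
```

A result well below 2.0 feeds directly into the capacity estimate for the next 6 months, since later phases were presumably sized on the same assumption.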

4. Trade-Off Philosophy

Core principle: OPTIMIZE FOR DELIVERY UNDER CONSTRAINTS

With 5 engineers + 9 months:
  ✅ Choose "good enough" over "perfect"
  ✅ Choose managed services over self-hosted (less ops)
  ✅ Choose one technology over polyglot (less expertise spread)
  ✅ Choose incremental over complete (ship value early)
  ✅ Choose deferral over scope creep (say no explicitly)

The discipline is NOT doing things:
  "What you don't do defines you as much as what you do."
  Every deferred item = capacity freed for what matters now.