Submission (≤6 Pages)

Technical Assessment — Legacy Platform Modernization

Role: Technical Lead — PhoenixDX Vietnam Hub
Product A: Enterprise travel, event & operations platform | ~40,000 global users
Scope: Legacy .NET monolith → .NET 8 microservices | 5 engineers | 9 months | Zero downtime
Supporting docs: Full analysis repository available at the companion website


1. Target Architecture Overview

                    Clients (React 18 + Legacy UI)
                                │
                ┌───────────────┴───────────────┐
                │      API Gateway (YARP)        │
                │  Strangler Fig weighted routing │
                │  Auth │ Rate Limit │ Trace ID   │
                └──┬──────┬──────┬──────┬──────┬─┘
                   │      │      │      │      │
              ┌────┘   ┌──┘   ┌──┘   ┌──┘   ┌──┘
              ▼        ▼      ▼      ▼      ▼
           Travel   Event  Work-  Comms  Reporting
            Svc      Svc   force   Svc   Svc(CQRS)
           .NET8    .NET8   Svc   .NET8   .NET8
              │        │   .NET8    │      │
           [DB]     [DB]    │    [DB]     │
                           [DB]    ┌──────┘
                                   │
  ┌────────────────────────┐  ┌────┴──────────┐
  │ Legacy Monolith        │  │ Reporting DB  │
  │ Payment (frozen Ph.1)  │  │ (CDC replicas)│
  │ ◄── ACL bridge ──────  │  └───────────────┘
  │ Monolith DB ──CDC──────┼──────────┘
  └────────────────────────┘
  ┌────────────────────────────────────────────┐
  │  Azure Service Bus (Event-Driven Backbone) │
  └────────────────────────────────────────────┘
  ┌────────────────────────────────────────────┐
  │  OpenTelemetry + Serilog │ Bicep (IaC)     │
  │  Azure Container Apps │ GitHub Actions CI   │
  └────────────────────────────────────────────┘

Service Boundaries — 6 bounded contexts, 1:1 mapping to DDD domains:

| Service | Owns | Communication |
|---|---|---|
| Travel Booking | bookings, itineraries, suppliers | Sync REST + Payment ACL + async events |
| Event Management | events, venues, attendees | Sync REST + Payment ACL + async events |
| Workforce | staff, allocations, shifts | Subscribes to travel/event events |
| Communications | notifications, templates | Subscribes to ALL domain events |
| Reporting (CQRS) | read models, dashboards | CDC from all DBs, event projections |
| Payment (legacy) | payments, invoices | Stays in monolith — ACL bridge only |

Communication rules: (1) Client → sync REST via YARP Gateway. (2) State changes → async event to Service Bus. (3) No cross-service direct DB access — ever.
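Rule (2) implies every state change is published as a versioned event on the Service Bus backbone. A minimal sketch of such an envelope in Python — field names and the event type are illustrative, not the actual schema, which is defined per bounded context and versioned from v1.0 upward:

```python
import json
import uuid
from datetime import datetime, timezone

def make_event(event_type: str, payload: dict, version: str = "1.0") -> dict:
    """Build a versioned event envelope (illustrative field names)."""
    return {
        "id": str(uuid.uuid4()),            # unique per event: dedup / idempotency key
        "type": event_type,                 # e.g. "travel.booking.confirmed"
        "version": version,                 # consumers branch on this, never on payload shape
        "occurredAt": datetime.now(timezone.utc).isoformat(),
        "data": payload,                    # domain-specific body
    }

event = make_event("travel.booking.confirmed", {"bookingId": "BK-1042"})
print(json.dumps(event, indent=2))
```

Carrying an explicit `version` field from day one is what lets consumers keep working when a producer evolves its schema.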


2. Migration Strategy

4-Phase Timeline

M1       M2       M3       M4       M5       M6       M7       M8       M9
├────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┤
Phase 0 ██  AI Foundation + Infra, Comms pilot (staging)
Phase 1      ████████████████████  Travel(M3) + Event(M4) go-live
Phase 2                            ████████████████████  Workforce(M6)
                                                         Comms+Report(M7)
Phase 3                                                          ████████
                                                           Harden: perf, DR
| Phase | Duration | Key Deliverables | Go-Live |
|---|---|---|---|
| 0: AI Foundation | M1 | AI toolchain, CI/CD, IaC (Bicep), YARP Gateway, Comms pilot | (staging pilot only) |
| 1: Core | M2–4 | Travel + Event extracted, Payment ACL, per-service DBs, CDC | Travel (M3), Event (M4) |
| 2: Scale | M5–7 | Workforce, Comms (prod), Reporting (CQRS), React 18 | Workforce (M6), Comms+Report (M7) |
| 3: Harden | M8–9 | Load testing (40K users), security audit, DR validation | All 5 hardened |

Capacity: 5 engineers × 9 months = 45 raw engineer-months (MM). Phase-by-phase with a variable AI multiplier: P0 (2.0 MM, ×1.0) + P1 (18.0 MM, ×2.0) + P2 (19.0 MM, ×2.0) + P3 (6.5 MM, ×1.0) ≈ 44 effective MM.

Zero-Downtime Strategy

Strangler Fig + YARP: Route traffic by URL path — migrate one module at a time, legacy serves unmigrated routes. Per-module cutover: Shadow mode (compare responses) → Canary (5%→25%→50%→100% over 7–11 days) → Full cutover. Auto-rollback if error rate > 0.5%. Rollback = YARP weight change (< 30 seconds).
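The canary progression and rollback rule above can be sketched as a small state function. The steps (5→25→50→100) and the 0.5% threshold come from the text; everything else, including the function name, is illustrative:

```python
CANARY_STEPS = (5, 25, 50, 100)      # % of traffic routed to the new service
ERROR_THRESHOLD = 0.005              # auto-rollback above 0.5% error rate

def next_canary_weight(current: int, error_rate: float) -> int:
    """Return the next gateway weight for the new service.

    A breach of the error threshold rolls the weight back to 0
    (all traffic to legacy); otherwise advance to the next step.
    """
    if error_rate > ERROR_THRESHOLD:
        return 0                      # kill switch: legacy serves 100% again
    for step in CANARY_STEPS:
        if step > current:
            return step               # promote to the next canary stage
    return current                    # already at full cutover
```

Because the rollback path is just a weight change at the gateway, it carries no deployment cost — which is what makes the "< 30 seconds" claim plausible.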

Backward Compatibility

Both systems run simultaneously. New services call legacy Payment via ACL (adapter pattern). CDC keeps data in sync — no dual-write. Event schemas versioned (v1.0+). React shell loads new modules alongside legacy UI. When Payment is eventually modernized → swap ACL target, zero changes to consuming services.
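The ACL is a plain adapter: new services speak their own domain language and the ACL translates into legacy Payment terms. A sketch in Python — the legacy method and field names (`charge_invoice`, invoice numbers, integer cents) are hypothetical stand-ins:

```python
class LegacyPaymentClient:
    """Stand-in for the monolith's payment API (names hypothetical)."""

    def charge_invoice(self, invoice_no: str, cents: int) -> dict:
        return {"invoice": invoice_no, "cents": cents, "status": "CHARGED"}

class PaymentAcl:
    """Anti-corruption layer: keeps legacy vocabulary out of new services."""

    def __init__(self, legacy: LegacyPaymentClient):
        self._legacy = legacy

    def capture(self, booking_id: str, amount: float) -> bool:
        # Translate new-domain terms (booking, decimal amount) into
        # legacy terms (invoice number, integer cents).
        result = self._legacy.charge_invoice(
            invoice_no=f"BK-{booking_id}",
            cents=round(amount * 100),
        )
        return result["status"] == "CHARGED"

acl = PaymentAcl(LegacyPaymentClient())
print(acl.capture("1042", 19.99))
```

When Payment is eventually modernized, only the ACL's internals change; `capture` — the contract consumed by Travel and Event — stays the same, which is exactly the zero-change swap described above.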


3. Failure Modeling

| # | Scenario | Likelihood / Impact | Mitigation |
|---|---|---|---|
| F1 | CDC data inconsistency — new DB diverges from legacy during cutover | M / H | Checksum verification every 6 h; dual-read validation; auto-pause on mismatch; 7-day parallel soak |
| F2 | Legacy Payment outage blocks new services via ACL | M / H | Circuit breaker (Polly): fail fast after 3 retries; queue in Service Bus; graceful degradation to "pending payment" |
| F3 | AI-generated code has business logic errors (e.g., pricing) | H / H | Mandatory human review for business logic; contract tests (Pact) vs legacy; shadow + compare before traffic switch; Payment: zero AI-only merges |
| F4 | Cascading failure during canary — timeout causes retry storm | L / H | Auto-rollback (error rate > 0.5%); bulkhead isolation; rate limiting at Gateway; kill switch to legacy in < 30 s |
| F5 | Key engineer leaves mid-migration (bus factor) | M / M | Primary + secondary owner per service; AI-generated docs (Phase 0); weekly walkthroughs; all decisions in ADRs |
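F2's mitigation (the document names Polly for .NET) amounts to a circuit breaker wrapped around the ACL with a fallback response. A minimal, language-agnostic sketch in Python, assuming a 3-failure trip and a timed open state — parameter names are illustrative:

```python
import time

class CircuitBreaker:
    """Trip after `max_failures` consecutive errors; fail fast while open."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after    # seconds before a half-open probe
        self.failures = 0
        self.opened_at = None             # None = circuit closed

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                return fallback()         # open: fail fast, no call to legacy
            self.opened_at = None         # half-open: allow one probe through
            self.failures = 0
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()   # trip the breaker
            return fallback()
        self.failures = 0                 # success resets the failure count
        return result
```

The fallback here would return the "pending payment" state from the table, while the failed request is queued in Service Bus for later reconciliation.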

4. Trade-Off Log

Intentionally not optimizing:

| Decision | Trade-Off | Revisit |
|---|---|---|
| Payment stays in monolith | ACL maintenance overhead | Post-9-months, when other services are stable |
| Incremental React (3–4 modules) | Payment UI in iframe, UX gap | Month 10+, or hire a frontend engineer |
| Container Apps over AKS | Less networking control | If services > 15 or team > 10 |
| Single region (active-passive) | ~120 ms latency for EU/US users | If user growth justifies multi-region |
| Contract tests over heavy E2E | Some gaps found only in staging | Phase 3+: expand E2E |

Technical debt accepted: Legacy Payment iframe (low) · Hardcoded Comms templates (low) · Manual staging IaC (very low) · Limited load testing pre-Phase 3 (medium — mitigated by canary) · Event schema governance deferred to Phase 2 (medium — mitigated by Pact)

Revisit in 6 months: Payment migration · AI 2x multiplier accuracy · DB decomposition completion · React coverage · Multi-region evaluation · Event sourcing for high-value domains


5. Assumptions

| # | Assumption | Impact If Wrong |
|---|---|---|
| A1 | Team has senior .NET experience | Phase 0 extends 2–4 weeks |
| A2 | Legacy has some docs / discoverable APIs | AI analysis takes longer; missed business rules |
| A3 | Azure is the approved cloud provider | Architecture rework |
| A4 | "Payment frozen" = code stays in monolith, API still callable via ACL | If the API itself is frozen → bookings blocked |
| A5 | AI tools (Cursor Pro, Claude Code) purchasable | Multiplier drops 2x → 1.2x; capacity ~32 MM |
| A6 | Legacy runs during the full 9-month migration | Forced shutdown → scope shrinks dramatically |
| A7 | 40K users across timezones — no maintenance window | Strangler Fig mandatory |
| A8 | Team co-located or same timezone | Async overhead (~10% capacity loss) |
| A9 | No regulatory requirements beyond standard security | Add 2–4 weeks compliance work per service |
| A10 | Legacy database is SQL Server (CDC-compatible) | Different CDC tooling needed |
Validation: A3–A6 validated Week 1 (stakeholders). A1–A2 validated Week 2 (pair programming + AI scan).


6. AI Usage Declaration

Tool: GitHub Copilot — VS Code Agent Mode (Claude Opus 4.6). Sole AI tool used throughout the entire assessment.

Workflow: Conversational agent collaboration — human provides direction and constraints, AI generates content, human reviews and corrects. All 33 documents, cross-referencing, consistency fixes, gap analysis, and website built through sequential Copilot conversations.

| Section | AI/Human | Human Contribution |
|---|---|---|
| Architecture (4.1) | 85/15 | Directed architecture decisions (Strangler Fig, YARP, per-service DB); reviewed and corrected event flows |
| Migration timeline (4.2) | 80/20 | Set constraints (5 eng, 9 mo, 4 phases); validated capacity math; caught cross-doc inconsistency |
| Failure modeling (4.3) | 85/15 | Reviewed 8 scenarios, approved final 5, adjusted likelihood ratings |
| Trade-offs (4.4) | 75/25 | Engineering judgment on each decision (YARP over Ocelot, Container Apps over AKS) |
| Assumptions (4.5) | 80/20 | Identified key gaps from the brief (payment frozen, tool procurement) |
| This declaration (4.6) | 90/10 | Directed "be honest about how we actually worked" |

Overall: ~85% AI (generation, code, analysis, formatting) / ~15% Human (direction, decisions, constraints, review, correction). That 15% is what matters — the Tech Lead decides what to build and when the AI is wrong.

Preventing Blind AI Usage

          ┌──────────┐  Payment: 2 human reviewers
          │ SECURITY │  Zero AI-only merge
          │   GATE   │
          └────┬─────┘
        ┌──────┴──────┐  Business logic: human
        │   HUMAN     │  validates requirement trace
        │   REVIEW    │
        └──────┬──────┘
     ┌─────────┴─────────┐  CodeRabbit auto-review
     │  AI AUTO-REVIEW   │  flags issues for human
     └─────────┬─────────┘
  ┌────────────┴────────────┐  Lint + tests + SAST +
  │   CI PIPELINE GATE      │  contract tests. Must pass.
  │   No --no-verify        │  No exceptions.
  └─────────────────────────┘

Every AI-generated line passes ALL 4 gates. Weekly "explain this code" sessions — team must understand what AI wrote. Prompt library versioned in git.