Submission (≤6 Pages)

Technical Assessment — Legacy Platform Modernization

Role: Technical Lead — PhoenixDX Vietnam Hub
Product A: Enterprise travel, event & operations platform | ~40,000 global users
Scope: Legacy .NET monolith → .NET 8 microservices | 5 engineers | 9 months | Zero downtime
Supporting docs: Full analysis repository available at the companion website


1. Target Architecture Overview

                    Clients (React 18 + Legacy UI)
                                │
                ┌───────────────┴───────────────┐
                │      API Gateway (YARP)        │
                │  Strangler Fig weighted routing │
                │  Auth │ Rate Limit │ Trace ID   │
                └──┬──────┬──────┬──────┬──────┬─┘
                   │      │      │      │      │
              ┌────┘   ┌──┘   ┌──┘   ┌──┘   ┌──┘
              ▼        ▼      ▼      ▼      ▼
           Travel   Event  Work-  Comms  Reporting
            Svc      Svc   force   Svc   Svc(CQRS)
           .NET8    .NET8   Svc   .NET8   .NET8
              │        │   .NET8    │      │
           [DB]     [DB]    │    [DB]     │
                           [DB]    ┌──────┘
                                   │
  ┌────────────────────────┐  ┌────┴──────────┐
  │ Legacy Monolith        │  │ Reporting DB  │
  │ Payment (frozen Ph.1)  │  │ (CDC replicas)│
  │ ◄── ACL bridge ──────  │  └───────────────┘
  │ Monolith DB ──CDC──────┼──────────┘
  └────────────────────────┘
  ┌────────────────────────────────────────────┐
  │  Azure Service Bus (Event-Driven Backbone) │
  └────────────────────────────────────────────┘
  ┌────────────────────────────────────────────┐
  │  OpenTelemetry + Serilog │ Bicep (IaC)     │
  │  Azure Container Apps │ GitHub Actions CI   │
  └────────────────────────────────────────────┘

Service Boundaries — 6 bounded contexts, 1:1 mapping to DDD domains:

| Service | Owns | Communication |
|---|---|---|
| Travel Booking | bookings, itineraries, suppliers | Sync REST + Payment ACL + async events |
| Event Management | events, venues, attendees | Sync REST + Payment ACL + async events |
| Workforce | staff, allocations, shifts | Subscribes to travel/event events |
| Communications | notifications, templates | Subscribes to ALL domain events |
| Reporting (CQRS) | read models, dashboards | CDC from all DBs, event projections |
| Payment (legacy) | payments, invoices | Stays in monolith — ACL bridge only |

Communication rules: (1) Client → sync REST via YARP Gateway. (2) State changes → async event to Service Bus. (3) No cross-service direct DB access — ever.
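Rule (2) implies every state change is published as a versioned event on the Service Bus backbone. A minimal sketch of such an envelope in Python — field names and the event type are illustrative, not the actual schema, which is defined per bounded context and versioned from v1.0 upward:

```python
import json
import uuid
from datetime import datetime, timezone

def make_event(event_type: str, payload: dict, version: str = "1.0") -> dict:
    """Build a versioned event envelope (illustrative field names)."""
    return {
        "id": str(uuid.uuid4()),            # unique per event: dedup / idempotency key
        "type": event_type,                 # e.g. "travel.booking.confirmed"
        "version": version,                 # consumers branch on this, never on payload shape
        "occurredAt": datetime.now(timezone.utc).isoformat(),
        "data": payload,                    # domain-specific body
    }

event = make_event("travel.booking.confirmed", {"bookingId": "BK-1042"})
print(json.dumps(event, indent=2))
```

Carrying an explicit `version` field from day one is what lets consumers keep working when a producer evolves its schema.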


2. Migration Strategy

4-Phase Timeline

M1       M2       M3       M4       M5       M6       M7       M8       M9
├────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┤
Phase 0 ██  AI Foundation + Infra, Comms pilot (staging)
Phase 1      ████████████████████  Travel(M3) + Event(M4) go-live
Phase 2                            ████████████████████  Workforce(M6)
                                                         Comms+Report(M7)
Phase 3                                                          ████████
                                                           Harden: perf, DR
| Phase | Duration | Key Deliverables | Go-Live |
|---|---|---|---|
| 0: AI Foundation | M1 | AI toolchain, CI/CD, IaC (Bicep), YARP Gateway, Comms pilot | (staging pilot only) |
| 1: Core | M2–4 | Travel + Event extracted, Payment ACL, per-service DBs, CDC | Travel (M3), Event (M4) |
| 2: Scale | M5–7 | Workforce, Comms (prod), Reporting (CQRS), React 18 | Workforce (M6), Comms+Report (M7) |
| 3: Harden | M8–9 | Load testing (40K users), security audit, DR validation | All 5 hardened |

Capacity: 5 engineers × 9 months = 45 raw engineer-months (MM). Phase-by-phase with a variable AI multiplier: P0 (2.0 MM, ×1.0) + P1 (18.0 MM, ×2.0) + P2 (19.0 MM, ×2.0) + P3 (6.5 MM, ×1.0) ≈ 44 effective MM.

Zero-Downtime Strategy

Strangler Fig + YARP: Route traffic by URL path — migrate one module at a time, legacy serves unmigrated routes. Per-module cutover: Shadow mode (compare responses) → Canary (5%→25%→50%→100% over 7–11 days) → Full cutover. Auto-rollback if error rate > 0.5%. Rollback = YARP weight change (< 30 seconds).
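The canary progression and rollback rule above can be sketched as a small state function. The steps (5→25→50→100) and the 0.5% threshold come from the text; everything else, including the function name, is illustrative:

```python
CANARY_STEPS = (5, 25, 50, 100)      # % of traffic routed to the new service
ERROR_THRESHOLD = 0.005              # auto-rollback above 0.5% error rate

def next_canary_weight(current: int, error_rate: float) -> int:
    """Return the next gateway weight for the new service.

    A breach of the error threshold rolls the weight back to 0
    (all traffic to legacy); otherwise advance to the next step.
    """
    if error_rate > ERROR_THRESHOLD:
        return 0                      # kill switch: legacy serves 100% again
    for step in CANARY_STEPS:
        if step > current:
            return step               # promote to the next canary stage
    return current                    # already at full cutover
```

Because the rollback path is just a weight change at the gateway, it carries no deployment cost — which is what makes the "< 30 seconds" claim plausible.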

Backward Compatibility

Both systems run simultaneously. New services call legacy Payment via ACL (adapter pattern). CDC keeps data in sync — no dual-write. Event schemas versioned (v1.0+). React shell loads new modules alongside legacy UI. When Payment is eventually modernized → swap ACL target, zero changes to consuming services.
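The ACL is a plain adapter: new services speak their own domain language and the ACL translates into legacy Payment terms. A sketch in Python — the legacy method and field names (`charge_invoice`, invoice numbers, integer cents) are hypothetical stand-ins:

```python
class LegacyPaymentClient:
    """Stand-in for the monolith's payment API (names hypothetical)."""

    def charge_invoice(self, invoice_no: str, cents: int) -> dict:
        return {"invoice": invoice_no, "cents": cents, "status": "CHARGED"}

class PaymentAcl:
    """Anti-corruption layer: keeps legacy vocabulary out of new services."""

    def __init__(self, legacy: LegacyPaymentClient):
        self._legacy = legacy

    def capture(self, booking_id: str, amount: float) -> bool:
        # Translate new-domain terms (booking, decimal amount) into
        # legacy terms (invoice number, integer cents).
        result = self._legacy.charge_invoice(
            invoice_no=f"BK-{booking_id}",
            cents=round(amount * 100),
        )
        return result["status"] == "CHARGED"

acl = PaymentAcl(LegacyPaymentClient())
print(acl.capture("1042", 19.99))
```

When Payment is eventually modernized, only the ACL's internals change; `capture` — the contract consumed by Travel and Event — stays the same, which is exactly the zero-change swap described above.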


3. Failure Modeling

| # | Scenario | Likelihood / Impact | Mitigation |
|---|---|---|---|
| F1 | CDC data inconsistency — new DB diverges from legacy during cutover | M / H | Checksum verification every 6 h; dual-read validation; auto-pause on mismatch; 7-day parallel soak |
| F2 | Legacy Payment outage blocks new services via ACL | M / H | Circuit breaker (Polly): fail fast after 3 retries; queue in Service Bus; graceful degradation to "pending payment" |
| F3 | AI-generated code has business logic errors (e.g., pricing) | H / H | Mandatory human review for business logic; contract tests (Pact) vs legacy; shadow + compare before traffic switch; Payment: zero AI-only merges |
| F4 | Cascading failure during canary — timeout causes retry storm | L / H | Auto-rollback (error rate > 0.5%); bulkhead isolation; rate limiting at Gateway; kill switch to legacy in < 30 s |
| F5 | Key engineer leaves mid-migration (bus factor) | M / M | Primary + secondary owner per service; AI-generated docs (Phase 0); weekly walkthroughs; all decisions in ADRs |
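F2's mitigation (the document names Polly for .NET) amounts to a circuit breaker wrapped around the ACL with a fallback response. A minimal, language-agnostic sketch in Python, assuming a 3-failure trip and a timed open state — parameter names are illustrative:

```python
import time

class CircuitBreaker:
    """Trip after `max_failures` consecutive errors; fail fast while open."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after    # seconds before a half-open probe
        self.failures = 0
        self.opened_at = None             # None = circuit closed

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                return fallback()         # open: fail fast, no call to legacy
            self.opened_at = None         # half-open: allow one probe through
            self.failures = 0
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()   # trip the breaker
            return fallback()
        self.failures = 0                 # success resets the failure count
        return result
```

The fallback here would return the "pending payment" state from the table, while the failed request is queued in Service Bus for later reconciliation.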

4. Trade-Off Log

Intentionally not optimizing:

| Decision | Trade-Off | Revisit |
|---|---|---|
| Payment stays in monolith | ACL maintenance overhead | Post-9-months, when other services are stable |
| Incremental React (3–4 modules) | Payment UI in iframe, UX gap | Month 10+, or hire a frontend engineer |
| Container Apps over AKS | Less networking control | If services > 15 or team > 10 |
| Single region (active-passive) | ~120 ms latency for EU/US users | If user growth justifies multi-region |
| Contract tests over heavy E2E | Some gaps found only in staging | Phase 3+: expand E2E |

Technical debt accepted: Legacy Payment iframe (low) · Hardcoded Comms templates (low) · Manual staging IaC (very low) · Limited load testing pre-Phase 3 (medium — mitigated by canary) · Event schema governance deferred to Phase 2 (medium — mitigated by Pact)

Revisit in 6 months: Payment migration · AI 2x multiplier accuracy · DB decomposition completion · React coverage · Multi-region evaluation · Event sourcing for high-value domains


5. Assumptions

| # | Assumption | Impact If Wrong |
|---|---|---|
| A1 | Team has senior .NET experience | Phase 0 extends 2–4 weeks |
| A2 | Legacy has some docs / discoverable APIs | AI analysis takes longer; missed business rules |
| A3 | Azure is the approved cloud provider | Architecture rework |
| A4 | "Payment frozen" = code stays in monolith, API still callable via ACL | If the API itself is frozen → bookings blocked |
| A5 | AI tools (Cursor Pro, Claude Code) purchasable | Multiplier drops 2x → 1.2x; capacity ~32 MM |
| A6 | Legacy runs during the full 9-month migration | Forced shutdown → scope shrinks dramatically |
| A7 | 40K users across timezones — no maintenance window | Strangler Fig mandatory |
| A8 | Team co-located or same timezone | Async overhead (~10% capacity loss) |
| A9 | No regulatory requirements beyond standard security | Add 2–4 weeks compliance work per service |
| A10 | Legacy database is SQL Server (CDC-compatible) | Different CDC tooling needed |
Validation: A3–A6 validated Week 1 (stakeholders). A1–A2 validated Week 2 (pair programming + AI scan).


6. AI Usage Declaration

Tool: GitHub Copilot — VS Code Agent Mode (Claude Opus 4.6). Sole AI tool used throughout the entire assessment.

Workflow: Conversational agent collaboration — human provides direction and constraints, AI generates content, human reviews and corrects. All 33 documents, cross-referencing, consistency fixes, gap analysis, and website built through sequential Copilot conversations.

| Section | AI/Human | Human Contribution |
|---|---|---|
| Architecture (4.1) | 85/15 | Directed architecture decisions (Strangler Fig, YARP, per-service DB); reviewed and corrected event flows |
| Migration timeline (4.2) | 80/20 | Set constraints (5 eng, 9 mo, 4 phases); validated capacity math; caught cross-doc inconsistency |
| Failure modeling (4.3) | 85/15 | Reviewed 8 scenarios, approved final 5, adjusted likelihood ratings |
| Trade-offs (4.4) | 75/25 | Engineering judgment on each decision (YARP over Ocelot, Container Apps over AKS) |
| Assumptions (4.5) | 80/20 | Identified key gaps from the brief (payment frozen, tool procurement) |
| This declaration (4.6) | 90/10 | Directed "be honest about how we actually worked" |

Overall: ~85% AI (generation, code, analysis, formatting) / ~15% Human (direction, decisions, constraints, review, correction). That 15% is what matters — the Tech Lead decides what to build and when the AI is wrong.

Preventing Blind AI Usage

          ┌──────────┐  Payment: 2 human reviewers
          │ SECURITY │  Zero AI-only merge
          │   GATE   │
          └────┬─────┘
        ┌──────┴──────┐  Business logic: human
        │   HUMAN     │  validates requirement trace
        │   REVIEW    │
        └──────┬──────┘
     ┌─────────┴─────────┐  CodeRabbit auto-review
     │  AI AUTO-REVIEW   │  flags issues for human
     └─────────┬─────────┘
  ┌────────────┴────────────┐  Lint + tests + SAST +
  │   CI PIPELINE GATE      │  contract tests. Must pass.
  │   No --no-verify        │  No exceptions.
  └─────────────────────────┘

Every AI-generated line passes ALL 4 gates. Weekly "explain this code" sessions — team must understand what AI wrote. Prompt library versioned in git.