Technical Assessment — Legacy Platform Modernization
Role: Technical Lead — PhoenixDX Vietnam Hub
Product A: Enterprise travel, event & operations platform | ~40,000 global users
Scope: Legacy .NET monolith → .NET 8 microservices | 5 engineers | 9 months | Zero downtime
1. Target Architecture Overview
Service Boundaries — 6 bounded contexts (DDD)
| Service | Owns | Communication | Type |
|---|---|---|---|
| Travel Booking | bookings, itineraries, suppliers, pricing rules | Sync REST (client-facing) + Payment ACL (sync to legacy) + Async events (BookingCreated, BookingCancelled) | Core |
| Event Management | events, venues, schedules, attendees | Sync REST + Payment ACL (sync to legacy) + Async events (EventCreated, AttendeeRegistered) | Core |
| Workforce + Allocation | staff profiles, allocations, shifts, skills | Subscribes to travel/event events (StaffNeeded, EventStaffed) + Sync REST for staff queries | Supporting |
| Communications | notifications, templates, delivery logs | Subscribes to ALL domain events (BookingCreated → send confirmation, EventReminder → send email, etc.) | Generic |
| Reporting (CQRS) | report definitions, read models, dashboards | CDC from all service databases + Event projections from Service Bus | Supporting |
| Payment (Legacy) | payments, invoices, reconciliation | ACL adapter pattern — new services call a clean interface that translates to legacy API format | Core (Frozen) |
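The Payment row above relies on the anti-corruption-layer (ACL) adapter pattern: new services depend on a clean interface, and an adapter translates calls into the legacy API's format. A minimal Python sketch of the idea (the real implementation would be a C#/.NET adapter; all type and field names here are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class PaymentRequest:
    """Clean request shape the new services depend on (hypothetical)."""
    booking_id: str
    amount_cents: int
    currency: str

class LegacyPaymentClient:
    """Stand-in for the frozen legacy Payment API."""
    def submit(self, payload: dict) -> dict:
        # The legacy API expects its own field names and decimal amounts.
        assert {"RefNo", "Amt", "Ccy"} <= payload.keys()
        return {"Status": "OK", "RefNo": payload["RefNo"]}

class PaymentAcl:
    """Anti-corruption layer: translates clean requests to legacy format."""
    def __init__(self, legacy: LegacyPaymentClient):
        self._legacy = legacy

    def pay(self, req: PaymentRequest) -> bool:
        payload = {
            "RefNo": req.booking_id,          # new-world id -> legacy reference
            "Amt": req.amount_cents / 100.0,  # cents -> legacy decimal amount
            "Ccy": req.currency,
        }
        return self._legacy.submit(payload)["Status"] == "OK"

acl = PaymentAcl(LegacyPaymentClient())
print(acl.pay(PaymentRequest("BK-1001", 12599, "USD")))  # True
```

When Payment is later modernized, only the adapter's target changes; consumers of `PaymentAcl` are untouched, which is exactly the "swap ACL target, zero changes to consumers" property claimed in Section 2.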
2. Migration Strategy
4-Phase Timeline
| Phase | Duration | Key Deliverables | Go-Live | Effective MM |
|---|---|---|---|---|
| Phase 0: AI Foundation | M1 | AI toolchain deployed (Cursor Pro, Claude Code, CodeRabbit); CI/CD pipeline (GitHub Actions); Infrastructure as Code (Bicep templates) | — | 2.0 |
| Phase 1: Core Services | M2–4 | Travel Booking service extracted and live (Month 3); Event Management service extracted and live (Month 4); Payment ACL bridge operational | Travel Booking (M3), Event Management (M4) | 18.0 |
| Phase 2: Scale Out | M5–7 | Workforce + Allocation service live (Month 6); Communications service promoted to production (Month 7); Reporting CQRS service live (Month 7) | Workforce + Allocation (M6), Communications (M7), Reporting (CQRS) (M7) | 19.0 |
| Phase 3: Hardening | M8–9 | Load testing simulating 40,000 concurrent users; Security audit and penetration testing; Disaster recovery validation (failover + restore) | — | 6.5 |
| Total | M1–M9 | 5 engineers × 9 months with AI multiplier ×1.0–2.0 | — | ~46 |
Zero-Downtime Strategy
Strangler Fig + YARP: Route traffic by URL path — migrate one module at a time. Per-module cutover: Shadow (compare) → Canary (5%→25%→50%→100% over 7–11 days) → Full cutover. Auto-rollback if error rate > 0.5%. Rollback = YARP weight change (< 30 seconds).
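The canary ramp and auto-rollback rule can be expressed as a tiny decision function. This is an illustrative Python sketch only; in production the weights would live in YARP cluster configuration and the error-rate signal would come from monitoring:

```python
def next_canary_weight(current: int, error_rate: float,
                       steps=(5, 25, 50, 100), threshold=0.005) -> int:
    """Return the next traffic weight (%) for the new service.

    Mirrors the 5% -> 25% -> 50% -> 100% ramp, with auto-rollback to 0%
    (100% legacy) whenever the error rate exceeds 0.5%.
    """
    if error_rate > threshold:
        return 0  # kill switch: route everything back to legacy
    for step in steps:
        if step > current:
            return step
    return current  # already at full cutover

print(next_canary_weight(5, 0.001))   # 25
print(next_canary_weight(25, 0.009))  # 0  (auto-rollback)
print(next_canary_weight(100, 0.0))   # 100
```

Because rollback is just a weight change on the gateway, it carries no deploy and completes within the stated 30-second budget.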
Backward Compatibility
Both systems run simultaneously. New services call legacy Payment via ACL. CDC keeps data in sync — no dual-write. Event schemas versioned (v1.0+). When Payment modernized → swap ACL target, zero changes to consumers.
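The "event schemas versioned (v1.0+)" rule implies a compatibility check on the consumer side: minor versions are additive and safe, major bumps require a new consumer. A minimal sketch of that policy (illustrative Python; the actual check would live in the .NET event-handling pipeline):

```python
def compatible(consumer_version: str, event_version: str) -> bool:
    """Consumers accept any event sharing their major version (v1.x).

    Minor-version additions are treated as non-breaking; a major bump
    signals a breaking schema change that needs a new consumer.
    """
    c_major = consumer_version.lstrip("v").split(".")[0]
    e_major = event_version.lstrip("v").split(".")[0]
    return c_major == e_major

print(compatible("v1.0", "v1.3"))  # True  (additive change)
print(compatible("v1.0", "v2.0"))  # False (breaking change)
```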
3. Failure Modeling
| # | Scenario | L / I | Mitigation |
|---|---|---|---|
| F1 | CDC sync from legacy DB to new service DB is delayed or misses records. New service serves stale/incorrect data. E.g., Travel shows a booking already cancelled in legacy. | M/H | Automated checksum verification every 6h. Dual-read validation before switching writes. Auto-pause CDC on mismatch. 7-day parallel soak at 100% before decommission. |
| F2 | Legacy monolith crashes → Payment API unavailable. Travel + Event services call ACL → timeout → booking flow blocked entirely. | M/H | Circuit breaker (Polly): fail fast after 3 retries. Queue payment in Service Bus → process when legacy recovers. Graceful degradation: booking as 'pending payment'. |
| F3 | AI agent migrates Travel pricing rules — misses edge case (promo discount stacking). Code passes CI. Users charged wrong prices in production. | H/H | Human review mandatory for ALL business logic. Contract tests (Pact) verify API matches legacy. Shadow+Compare before traffic switch. Payment: zero AI-only merge. |
| F4 | New Event Service at 25% traffic causes timeout. Legacy overloaded with retry storm. Both old and new systems degrade. | L/H | Auto-rollback if error > 0.5%. Bulkhead isolation. Rate limiting at Gateway. Kill switch → 100% legacy in < 30s. |
| F5 | Bus factor = 1 for a service. Engineer leaves mid-migration, taking domain knowledge. | M/M | Primary + secondary engineer per service. AI-generated docs from legacy code (Phase 0). Weekly walkthroughs. All decisions in ADRs. |
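The F2 mitigation (circuit breaker plus queued fallback) combines fail-fast with graceful degradation. A minimal Python sketch of the pattern; the real implementation would use Polly resilience policies and Azure Service Bus, and these class and function names are illustrative only:

```python
import collections

class CircuitBreaker:
    """Fail fast after max_failures consecutive errors (F2 sketch)."""
    def __init__(self, max_failures: int = 3):
        self.max_failures = max_failures
        self.failures = 0

    @property
    def open(self) -> bool:
        return self.failures >= self.max_failures

    def call(self, fn, fallback):
        if self.open:
            return fallback()        # circuit open: skip the legacy call
        try:
            result = fn()
            self.failures = 0        # success resets the breaker
            return result
        except ConnectionError:
            self.failures += 1
            return fallback()

pending = collections.deque()        # stands in for a Service Bus queue

def legacy_down():
    raise ConnectionError("legacy Payment API unavailable")

def queue_payment():
    pending.append("payment")        # booking continues as 'pending payment'
    return "pending"

cb = CircuitBreaker()
for _ in range(4):
    status = cb.call(legacy_down, queue_payment)
print(status, cb.open, len(pending))  # pending True 4
```

Once the legacy system recovers, a background worker drains the queue; bookings are never blocked outright, only deferred.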
4. Trade-Off Log
Decisions we are intentionally not optimizing yet:
| # | Decision | Alternative Forgone | Revisit |
|---|---|---|---|
| T1 | Payment stays in monolith | Migrate Payment early | Post-9-months when all other services stable |
| T2 | Azure Container Apps over AKS | Kubernetes (AKS) | If services > 15 or team > 10 engineers |
| T3 | Azure SQL everywhere | Cosmos DB, Redis, etc. | If specific service needs document store or cache |
| T4 | Incremental React (3-4 modules) | Full React rewrite | Month 10+, or hire frontend engineer |
| T5 | Single region (active-passive) | Multi-region active-active | If user growth justifies multi-region |
| T6 | Contract tests over heavy E2E | Comprehensive E2E suite (Playwright) | Phase 3+ expand E2E coverage |
| T7 | Shared DB views during CDC transition | Full data decomposition from Day 1 | Month 7+ when all services own their data |
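T6 prefers contract tests over a heavy E2E suite: each consumer records the response shape it depends on, and the provider is verified against those expectations. A minimal consumer-driven contract check in illustrative Python (the real stack would use Pact; paths, keys, and values here are hypothetical):

```python
# Contract: what the consumer expects from the new Travel Booking API.
contract = {
    "GET /bookings/{id}": {"status": 200, "body_keys": {"id", "status", "total"}},
}

def provider_response(path: str) -> dict:
    """Hypothetical stand-in for the new service under verification."""
    return {"status": 200,
            "body": {"id": "BK-1", "status": "confirmed", "total": 125.99}}

def verify(contract: dict, provider) -> bool:
    """Fail if the provider violates any consumer expectation."""
    for path, expected in contract.items():
        resp = provider(path)
        assert resp["status"] == expected["status"], path
        assert expected["body_keys"] <= resp["body"].keys(), path
    return True

print(verify(contract, provider_response))  # True
```

Contract tests run in seconds in CI, which is why broad Playwright E2E coverage can safely wait until Phase 3.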
5. Assumptions
| # | Assumption | Impact If Wrong |
|---|---|---|
| A1 | Team has senior .NET experience — no major ramp-up needed | Phase 0 extends 2-4 weeks for training |
| A2 | Legacy codebase has some documentation or discoverable APIs | AI analysis takes longer, risk of missed business rules |
| A3 | Azure is the approved cloud provider | Complete architecture rework if AWS/GCP mandated |
| A4 | 'Payment frozen' = code stays in monolith, API still callable via ACL | If API frozen too → bookings blocked entirely |
| A5 | AI tools (Cursor Pro, Claude Code) can be purchased — no procurement blocker | Multiplier drops from 2x → 1.2x, capacity ~32 MM |
| A6 | Legacy monolith continues running during full 9-month migration | If forced shutdown → scope shrinks dramatically |
| A7 | 40K users across timezones — no safe maintenance window | If single timezone → could simplify cutover |
| A8 | Team co-located or same timezone (Vietnam) | Add async overhead (~10% capacity loss) |
| A9 | Azure Service Bus acceptable for messaging | Minor: swap messaging broker, patterns stay same |
| A10 | No regulatory requirements beyond standard enterprise security | Add 2-4 weeks compliance work per service |
| A11 | Legacy database is SQL Server (CDC compatible) | Different CDC tooling needed |
| A12 | No mobile app in scope — web-only modernization | Need React Native track + additional frontend engineer |
6. AI Usage Declaration
Tool
GitHub Copilot in VS Code Agent Mode (Claude Opus 4.6) was the sole AI tool used throughout; all work happened inside a single VS Code workspace via conversational agent interactions.
How We Actually Worked — Step by Step
| Step | What Happened | Human Role | AI Role |
|---|---|---|---|
| 1 | Requirements extraction | Provided assessment brief, directed scope | Parsed requirements, created Requirement.md |
| 2 | Strategy & analysis | Guided focus areas, set constraints | Generated Strategy.md, Analysis.md |
| 3 | Deliverable documents (4.1–4.6) | Directed each doc, reviewed output, requested corrections | Drafted all 6 deliverables with architecture, timelines, failure models |
| 4 | Supporting analysis docs | Identified what was missing, prioritized | Created 15+ supporting docs (tech stack, planning, security, testing, etc.) |
| 5 | Cross-document consistency | Spotted capacity math mismatch, directed sync | Audited all 33 files, fixed inconsistencies across 7 documents |
| 6 | Gap analysis | Asked 'what's missing?' | Identified 5 gaps, created Cost Analysis, Security, Testing, Observability, API Design docs |
| 7 | Language conversion (VI→EN) | Decided to standardize in English | Converted all 33 files from Vietnamese to professional English |
| 8 | Website creation | Directed 'build a website for this' | Built full Next.js + SQLite site (flat doc viewer) |
| 9 | Website restructure | Said 'make it entity-based, add Mermaid' | Rebuilt into structured pages: Dashboard, Architecture, Phases, Services, Risks, Tech, Team, Docs |
Per-Section AI/Human Split — Honest Numbers
| Section | AI/Human | What The Human Actually Did |
|---|---|---|
| Architecture (4.1) | 85/15 | Directed: 'use Strangler Fig, YARP, per-service DB'. Reviewed output, corrected event flow. AI generated diagrams, wrote all service boundary details |
| Migration timeline (4.2) | 80/20 | Set constraints: '5 eng, 9 months, 4 phases'. Validated capacity math. AI generated phase structure, deliverables, and cutover procedures |
| Failure modeling (4.3) | 85/15 | Reviewed 8 scenarios, approved final 5. Adjusted likelihood ratings. AI generated all scenarios and mitigations |
| Trade-offs (4.4) | 75/25 | Engineering judgment on each decision (e.g., YARP over Ocelot, Container Apps over AKS). AI structured and wrote justifications |
| Assumptions (4.5) | 80/20 | Identified key gaps from brief (payment frozen, tool procurement). AI organized into 12 assumptions with impact analysis |
| This declaration (4.6) | 90/10 | Said 'this section is inaccurate, fix it honestly'. AI rewrote with actual workflow |
| Website + presentation | 95/5 | Directed structure: 'make entity-based, use Mermaid'. AI wrote all code |
The human share looks small, but it is where the value lies: which architecture pattern to use, which trade-offs to accept, which constraints are non-negotiable, and when to say "this is wrong, fix it". AI generates; the Tech Lead decides.
Preventing Blind AI Usage — 4-Gate Governance
Every AI-generated line passes all four gates. Weekly "explain this code" sessions ensure the team understands what AI wrote, and the prompt library is versioned in git.