
Technical Assessment — Legacy Platform Modernization

Role: Technical Lead — PhoenixDX Vietnam Hub

Product A: Enterprise travel, event & operations platform | ~40,000 global users

Scope: Legacy .NET monolith → .NET 8 microservices | 5 engineers | 9 months | Zero downtime


1. Target Architecture Overview

Service Boundaries — 6 bounded contexts (DDD)

| Service | Owns | Communication | Type |
|---|---|---|---|
| Travel Booking | bookings, itineraries, suppliers, pricing rules | Sync REST (client-facing) + Payment ACL (sync to legacy) + Async events (BookingCreated, BookingCancelled) | Core |
| Event Management | events, venues, schedules, attendees | Sync REST + Payment ACL (sync to legacy) + Async events (EventCreated, AttendeeRegistered) | Core |
| Workforce + Allocation | staff profiles, allocations, shifts, skills | Subscribes to travel/event events (StaffNeeded, EventStaffed) + Sync REST for staff queries | Supporting |
| Communications | notifications, templates, delivery logs | Subscribes to ALL domain events (BookingCreated → send confirmation, EventReminder → send email, etc.) | Generic |
| Reporting (CQRS) | report definitions, read models, dashboards | CDC from all service databases + Event projections from Service Bus | Supporting |
| Payment (Legacy) | payments, invoices, reconciliation | ACL adapter pattern — new services call a clean interface that translates to legacy API format | Core (Frozen) |
Communication rules: (1) Client → sync REST via YARP Gateway. (2) State changes → async event to Service Bus. (3) No cross-service direct DB access — ever.
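Rule (2) implies a versioned event contract shared by publishers and subscribers. A minimal sketch follows (Python for brevity, though the real services are .NET; the event name `BookingCreated` comes from the table above, while the envelope fields are assumptions):

```python
import json
import uuid
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class BookingCreated:
    """Versioned integration event published to the Service Bus on booking creation."""
    booking_id: str
    customer_id: str
    total_amount: str  # decimal carried as string to avoid float rounding
    schema_version: str = "1.0"  # bump the minor version for additive changes only
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    occurred_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def to_message_body(event: BookingCreated) -> str:
    """Serialize for the bus; consumers dispatch on (type, schema_version)."""
    return json.dumps({"type": type(event).__name__, **asdict(event)})

body = json.loads(to_message_body(BookingCreated("b-101", "c-7", "199.00")))
assert body["type"] == "BookingCreated"
assert body["schema_version"] == "1.0"
```

Because consumers pin handlers to the (type, schema_version) pair, additive v1.x changes never break existing subscribers.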

2. Migration Strategy

4-Phase Timeline

| Phase | Duration | Key Deliverables | Go-Live | Effective MM |
|---|---|---|---|---|
| Phase 0: AI Foundation | M1 | AI toolchain deployed (Cursor Pro, Claude Code, CodeRabbit); CI/CD pipeline (GitHub Actions); Infrastructure as Code (Bicep templates) | — | 2.0 |
| Phase 1: Core Services | M2–4 | Travel Booking service extracted and live (Month 3); Event Management service extracted and live (Month 4); Payment ACL bridge operational | Travel Booking (M3), Event Management (M4) | 18.0 |
| Phase 2: Scale Out | M5–7 | Workforce + Allocation service live (Month 6); Communications service promoted to production (Month 7); Reporting CQRS service live (Month 7) | Workforce + Allocation (M6), Communications (M7), Reporting (CQRS) (M7) | 19.0 |
| Phase 3: Hardening | M8–9 | Load testing simulating 40,000 concurrent users; Security audit and penetration testing; Disaster recovery validation (failover + restore) | — | 6.5 |

Total: 5 eng × 9 mo (AI multiplier ×1.0–2.0) ≈ 46 MM

Zero-Downtime Strategy

Strangler Fig + YARP: Route traffic by URL path — migrate one module at a time. Per-module cutover: Shadow (compare) → Canary (5%→25%→50%→100% over 7–11 days) → Full cutover. Auto-rollback if error rate > 0.5%. Rollback = YARP weight change (< 30 seconds).
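The per-module cutover loop above can be sketched as follows. The weight steps and the 0.5% threshold come from this section; `set_new_weight` and `error_rate` are hypothetical stand-ins for the YARP weighted-destination config update and the metrics backend:

```python
# Hedged sketch of the canary progression with auto-rollback.
CANARY_STEPS = [5, 25, 50, 100]   # percent of traffic routed to the new service
ERROR_BUDGET = 0.005              # auto-rollback above a 0.5% error rate

def run_canary(set_new_weight, error_rate, soak_checks=3):
    """Advance the traffic weight step by step; roll back to legacy on breach."""
    for weight in CANARY_STEPS:
        set_new_weight(weight)
        for _ in range(soak_checks):      # observe during the soak window
            if error_rate() > ERROR_BUDGET:
                set_new_weight(0)         # kill switch: 100% back to legacy
                return "rolled_back"
    return "cutover_complete"
```

In practice each soak check would span hours to days (the 7–11 day window above), and `set_new_weight(0)` corresponds to the sub-30-second YARP weight change.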

Backward Compatibility

Both systems run simultaneously. New services call legacy Payment via ACL. CDC keeps data in sync — no dual-write. Event schemas versioned (v1.0+). When Payment modernized → swap ACL target, zero changes to consumers.
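The ACL can be sketched as a classic adapter, shown here in Python for brevity even though the real services are .NET. The clean `PaymentGateway` interface and the legacy field names (`RefNo`, `AmtCents`, `CurrCode`) are illustrative assumptions, not the real wire format:

```python
from abc import ABC, abstractmethod

class PaymentGateway(ABC):
    """Clean interface the new services code against."""
    @abstractmethod
    def charge(self, booking_id: str, amount: float, currency: str) -> dict: ...

class LegacyPaymentAdapter(PaymentGateway):
    """ACL: translates the clean call into the frozen legacy request shape."""
    def __init__(self, legacy_client):
        self._legacy = legacy_client

    def charge(self, booking_id, amount, currency):
        legacy_request = {                      # assumed legacy wire format
            "RefNo": booking_id,
            "AmtCents": int(round(amount * 100)),
            "CurrCode": currency.upper(),
        }
        resp = self._legacy.post_payment(legacy_request)
        return {"status": resp["Result"], "reference": resp["RefNo"]}
```

When Payment is modernized, only the adapter is replaced; every consumer of `PaymentGateway` is untouched, which is exactly the zero-change swap described above.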

3. Failure Modeling

| # | Scenario | Likelihood / Impact | Mitigation |
|---|---|---|---|
| F1 | CDC sync from legacy DB to new service DB is delayed or misses records. New service serves stale/incorrect data. E.g., Travel shows a booking already cancelled in legacy. | M/H | Automated checksum verification every 6h. Dual-read validation before switching writes. Auto-pause CDC on mismatch. 7-day parallel soak at 100% before decommission. |
| F2 | Legacy monolith crashes → Payment API unavailable. Travel + Event services call ACL → timeout → booking flow blocked entirely. | M/H | Circuit breaker (Polly): fail fast after 3 retries. Queue payment in Service Bus → process when legacy recovers. Graceful degradation: booking as 'pending payment'. |
| F3 | AI agent migrates Travel pricing rules but misses an edge case (promo discount stacking). Code passes CI. Users charged wrong prices in production. | H/H | Human review mandatory for ALL business logic. Contract tests (Pact) verify API matches legacy. Shadow+Compare before traffic switch. Payment: zero AI-only merges. |
| F4 | New Event service at 25% traffic causes timeouts. Legacy overloaded with retry storm. Both old and new systems degrade. | L/H | Auto-rollback if error rate > 0.5%. Bulkhead isolation. Rate limiting at Gateway. Kill switch → 100% legacy in < 30s. |
| F5 | Bus factor = 1 for a service. Engineer leaves mid-migration, taking domain knowledge. | M/M | Primary + secondary engineer per service. AI-generated docs from legacy code (Phase 0). Weekly walkthroughs. All decisions in ADRs. |
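The checksum verification in the F1 mitigation might look like the following sketch. The row shapes and the `pause_cdc` hook are assumptions; in production this would read both the legacy and replica databases:

```python
import hashlib

def table_checksum(rows):
    """Order-independent digest over the rows of one table snapshot."""
    digests = sorted(
        hashlib.sha256(repr(sorted(row.items())).encode()).hexdigest()
        for row in rows
    )
    return hashlib.sha256("".join(digests).encode()).hexdigest()

def verify_cdc(legacy_rows, replica_rows, pause_cdc):
    """Compare snapshots; auto-pause replication on mismatch, per the F1 runbook."""
    if table_checksum(legacy_rows) != table_checksum(replica_rows):
        pause_cdc()
        return False
    return True
```

Sorting both the column items and the per-row digests makes the comparison insensitive to row order and column order, so only genuine data drift trips the pause.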

4. Trade-Off Log

Intentionally not optimizing:

| # | Decision | Alternative Foregone | Revisit |
|---|---|---|---|
| T1 | Payment stays in monolith | Migrate Payment early | Post-9 months, once all other services are stable |
| T2 | Azure Container Apps over AKS | Kubernetes (AKS) | If services > 15 or team > 10 engineers |
| T3 | Azure SQL everywhere | Cosmos DB, Redis, etc. | If a specific service needs a document store or cache |
| T4 | Incremental React (3–4 modules) | Full React rewrite | Month 10+, or hire a frontend engineer |
| T5 | Single region (active-passive) | Multi-region active-active | If user growth justifies multi-region |
| T6 | Contract tests over heavy E2E | Comprehensive E2E suite (Playwright) | Phase 3+: expand E2E coverage |
| T7 | Shared DB views during CDC transition | Full data decomposition from Day 1 | Month 7+, when all services own their data |
Technical debt accepted: Legacy Payment iframe (low) · Hardcoded Comms templates (low) · Manual staging IaC (very low) · Limited load testing pre-Phase 3 (medium — mitigated by canary) · Event schema governance deferred to Phase 2 (medium — mitigated by Pact)
Revisit in 6 months: Payment migration · AI 2x multiplier accuracy · DB decomposition completion · React coverage · Multi-region evaluation · Event sourcing for high-value domains

5. Assumptions

| # | Assumption | Impact If Wrong |
|---|---|---|
| A1 | Team has senior .NET experience — no major ramp-up needed | Phase 0 extends 2–4 weeks for training |
| A2 | Legacy codebase has some documentation or discoverable APIs | AI analysis takes longer; risk of missed business rules |
| A3 | Azure is the approved cloud provider | Complete architecture rework if AWS/GCP mandated |
| A4 | 'Payment frozen' = code stays in monolith, API still callable via ACL | If API frozen too → bookings blocked entirely |
| A5 | AI tools (Cursor Pro, Claude Code) can be purchased — no procurement blocker | Multiplier drops from 2x → 1.2x; capacity ~32 MM |
| A6 | Legacy monolith continues running during full 9-month migration | If forced shutdown → scope shrinks dramatically |
| A7 | 40K users across timezones — no safe maintenance window | If single timezone → could simplify cutover |
| A8 | Team co-located or same timezone (Vietnam) | Add async overhead (~10% capacity loss) |
| A9 | Azure Service Bus acceptable for messaging | Minor: swap messaging broker, patterns stay the same |
| A10 | No regulatory requirements beyond standard enterprise security | Add 2–4 weeks compliance work per service |
| A11 | Legacy database is SQL Server (CDC compatible) | Different CDC tooling needed |
| A12 | No mobile app in scope — web-only modernization | Need React Native track + additional frontend engineer |
Validation: A3–A6 validated Week 1 (stakeholders). A1–A2 validated Week 2 (pair programming + AI scan).

6. AI Usage Declaration

Transparency principle: This entire assessment was produced through a conversational collaboration with a single AI tool. Rather than hide this, we demonstrate exactly how — because a Tech Lead who governs AI usage must first be honest about their own.

Tool

GitHub Copilot in VS Code Agent Mode (Claude Opus 4.6) was the sole AI tool used throughout; no other AI tools were involved. All work happened inside a single VS Code workspace via conversational agent interactions.

How We Actually Worked — Step by Step

| Step | What Happened | Human Role | AI Role |
|---|---|---|---|
| 1 | Requirements extraction | Provided assessment brief, directed scope | Parsed requirements, created Requirement.md |
| 2 | Strategy & analysis | Guided focus areas, set constraints | Generated Strategy.md, Analysis.md |
| 3 | Deliverable documents (4.1–4.6) | Directed each doc, reviewed output, requested corrections | Drafted all 6 deliverables with architecture, timelines, failure models |
| 4 | Supporting analysis docs | Identified what was missing, prioritized | Created 15+ supporting docs (tech stack, planning, security, testing, etc.) |
| 5 | Cross-document consistency | Spotted capacity math mismatch, directed sync | Audited all 33 files, fixed inconsistencies across 7 documents |
| 6 | Gap analysis | Asked 'what's missing?' | Identified 5 gaps, created Cost Analysis, Security, Testing, Observability, API Design docs |
| 7 | Language conversion (VI→EN) | Decided to standardize in English | Converted all 33 files from Vietnamese to professional English |
| 8 | Website creation | Directed 'build a website for this' | Built full Next.js + SQLite site (flat doc viewer) |
| 9 | Website restructure | Said 'make it entity-based, add Mermaid' | Rebuilt into structured pages: Dashboard, Architecture, Phases, Services, Risks, Tech, Team, Docs |

Per-Section AI/Human Split — Honest Numbers

| Section | AI/Human | What The Human Actually Did |
|---|---|---|
| Architecture (4.1) | 85/15 | Directed: 'use Strangler Fig, YARP, per-service DB'. Reviewed output, corrected event flow. AI generated diagrams, wrote all service boundary details |
| Migration timeline (4.2) | 80/20 | Set constraints: '5 eng, 9 months, 4 phases'. Validated capacity math. AI generated phase structure, deliverables, and cutover procedures |
| Failure modeling (4.3) | 85/15 | Reviewed 8 scenarios, approved final 5. Adjusted likelihood ratings. AI generated all scenarios and mitigations |
| Trade-offs (4.4) | 75/25 | Engineering judgment on each decision (e.g., YARP over Ocelot, Container Apps over AKS). AI structured and wrote justifications |
| Assumptions (4.5) | 80/20 | Identified key gaps from brief (payment frozen, tool procurement). AI organized into 12 assumptions with impact analysis |
| This declaration (4.6) | 90/10 | Said 'this section is inaccurate, fix it honestly'. AI rewrote with actual workflow |
| Website + presentation | 95/5 | Directed structure: 'make entity-based, use Mermaid'. AI wrote all code |
Overall honest split: ~85% AI (content generation, code, formatting, analysis, cross-referencing) / ~15% Human (direction, decisions, constraint setting, review, correction).
But that 15% is what matters: which architecture pattern, which trade-offs to accept, what constraints are non-negotiable, when to say "this is wrong, fix it". AI generates — the Tech Lead decides.

Preventing Blind AI Usage — 4-Gate Governance

Every AI-generated line passes all four gates before merge. Weekly "explain this code" sessions ensure the team can explain what the AI wrote. The prompt library is versioned in git.

Key insight — why this matters for a Tech Lead: This assessment proves AI can generate 85% of enterprise-grade technical documentation when properly directed. The Tech Lead's value is not in typing — it's in knowing what to build, which constraints matter, and when the AI is wrong. That's exactly what we'd govern across a 5-engineer team: maximize AI output, ensure human judgment at every decision point.