Technical Assessment — Legacy Platform Modernization
Role: Technical Lead — PhoenixDX Vietnam Hub
Product A: Enterprise travel, event & operations platform | ~40,000 global users
Scope: Legacy .NET monolith → .NET 8 microservices | 5 engineers | 9 months | Zero downtime
1. Target Architecture Overview
Service Boundaries — 6 bounded contexts (DDD)
| Service | Owns | Communication | Type |
|---|---|---|---|
| Travel Booking | bookings, itineraries, suppliers, pricing rules | Sync REST (client-facing) + Payment ACL (sync to legacy) + Async events (BookingCreated, BookingCancelled) | Core |
| Event Management | events, venues, schedules, attendees | Sync REST + Payment ACL (sync to legacy) + Async events (EventCreated, AttendeeRegistered) | Core |
| Workforce + Allocation | staff profiles, allocations, shifts, skills | Subscribes to travel/event events (StaffNeeded, EventStaffed) + Sync REST for staff queries | Supporting |
| Communications | notifications, templates, delivery logs | Subscribes to ALL domain events (BookingCreated → send confirmation, EventReminder → send email, etc.) | Generic |
| Reporting (CQRS) | report definitions, read models, dashboards | CDC from all service databases + Event projections from Service Bus | Supporting |
| Payment (Legacy) | payments, invoices, reconciliation | ACL adapter pattern — new services call a clean interface that translates to legacy API format | Core (Frozen) |
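The Payment row above relies on the anti-corruption-layer (ACL) adapter pattern: new services depend on a clean interface, and an adapter translates calls into the legacy API's format. A minimal Python sketch of the idea (the real implementation would be a C#/.NET adapter; all type and field names here are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class PaymentRequest:
    """Clean request shape the new services depend on (hypothetical)."""
    booking_id: str
    amount_cents: int
    currency: str

class LegacyPaymentClient:
    """Stand-in for the frozen legacy Payment API."""
    def submit(self, payload: dict) -> dict:
        # The legacy API expects its own field names and decimal amounts.
        assert {"RefNo", "Amt", "Ccy"} <= payload.keys()
        return {"Status": "OK", "RefNo": payload["RefNo"]}

class PaymentAcl:
    """Anti-corruption layer: translates clean requests to legacy format."""
    def __init__(self, legacy: LegacyPaymentClient):
        self._legacy = legacy

    def pay(self, req: PaymentRequest) -> bool:
        payload = {
            "RefNo": req.booking_id,          # new-world id -> legacy reference
            "Amt": req.amount_cents / 100.0,  # cents -> legacy decimal amount
            "Ccy": req.currency,
        }
        return self._legacy.submit(payload)["Status"] == "OK"

acl = PaymentAcl(LegacyPaymentClient())
print(acl.pay(PaymentRequest("BK-1001", 12599, "USD")))  # True
```

When Payment is later modernized, only the adapter's target changes; consumers of `PaymentAcl` are untouched, which is exactly the "swap ACL target, zero changes to consumers" property claimed in Section 2.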
2. Migration Strategy
4-Phase Timeline
| Phase | Duration | Key Deliverables | Go-Live | Effective MM |
|---|---|---|---|---|
| Phase 0: AI Foundation | M1 | AI toolchain deployed (Cursor Pro, Claude Code, CodeRabbit); CI/CD pipeline (GitHub Actions); Infrastructure as Code (Bicep templates) | — | 2.0 |
| Phase 1: Core Services | M2–4 | Travel Booking service extracted and live (Month 3); Event Management service extracted and live (Month 4); Payment ACL bridge operational | Travel Booking (M3), Event Management (M4) | 18.0 |
| Phase 2: Scale Out | M5–7 | Workforce + Allocation service live (Month 6); Communications service promoted to production (Month 7); Reporting CQRS service live (Month 7) | Workforce + Allocation (M6), Communications (M7), Reporting (CQRS) (M7) | 19.0 |
| Phase 3: Hardening | M8–9 | Load testing simulating 40,000 concurrent users; Security audit and penetration testing; Disaster recovery validation (failover + restore) | — | 6.5 |
| Total | M1–M9 | 5 engineers × 9 months with AI multiplier ×1.0–2.0 | — | ~46 |
Zero-Downtime Strategy
Strangler Fig + YARP: Route traffic by URL path — migrate one module at a time. Per-module cutover: Shadow (compare) → Canary (5%→25%→50%→100% over 7–11 days) → Full cutover. Auto-rollback if error rate > 0.5%. Rollback = YARP weight change (< 30 seconds).
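The canary ramp and auto-rollback rule can be expressed as a tiny decision function. This is an illustrative Python sketch only; in production the weights would live in YARP cluster configuration and the error-rate signal would come from monitoring:

```python
def next_canary_weight(current: int, error_rate: float,
                       steps=(5, 25, 50, 100), threshold=0.005) -> int:
    """Return the next traffic weight (%) for the new service.

    Mirrors the 5% -> 25% -> 50% -> 100% ramp, with auto-rollback to 0%
    (100% legacy) whenever the error rate exceeds 0.5%.
    """
    if error_rate > threshold:
        return 0  # kill switch: route everything back to legacy
    for step in steps:
        if step > current:
            return step
    return current  # already at full cutover

print(next_canary_weight(5, 0.001))   # 25
print(next_canary_weight(25, 0.009))  # 0  (auto-rollback)
print(next_canary_weight(100, 0.0))   # 100
```

Because rollback is just a weight change on the gateway, it carries no deploy and completes within the stated 30-second budget.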
Backward Compatibility
Both systems run simultaneously. New services call legacy Payment via ACL. CDC keeps data in sync — no dual-write. Event schemas versioned (v1.0+). When Payment modernized → swap ACL target, zero changes to consumers.
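The "event schemas versioned (v1.0+)" rule implies a compatibility check on the consumer side: minor versions are additive and safe, major bumps require a new consumer. A minimal sketch of that policy (illustrative Python; the actual check would live in the .NET event-handling pipeline):

```python
def compatible(consumer_version: str, event_version: str) -> bool:
    """Consumers accept any event sharing their major version (v1.x).

    Minor-version additions are treated as non-breaking; a major bump
    signals a breaking schema change that needs a new consumer.
    """
    c_major = consumer_version.lstrip("v").split(".")[0]
    e_major = event_version.lstrip("v").split(".")[0]
    return c_major == e_major

print(compatible("v1.0", "v1.3"))  # True  (additive change)
print(compatible("v1.0", "v2.0"))  # False (breaking change)
```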
3. Failure Modeling
| # | Scenario | L / I | Mitigation |
|---|---|---|---|
| F1 | CDC sync from legacy DB to new service DB is delayed or misses records. New service serves stale/incorrect data. E.g., Travel shows a booking already cancelled in legacy. | M/H | Automated checksum verification every 6h. Dual-read validation before switching writes. Auto-pause CDC on mismatch. 7-day parallel soak at 100% before decommission. |
| F2 | Legacy monolith crashes → Payment API unavailable. Travel + Event services call ACL → timeout → booking flow blocked entirely. | M/H | Circuit breaker (Polly): fail fast after 3 retries. Queue payment in Service Bus → process when legacy recovers. Graceful degradation: booking as 'pending payment'. |
| F3 | AI agent migrates Travel pricing rules — misses edge case (promo discount stacking). Code passes CI. Users charged wrong prices in production. | H/H | Human review mandatory for ALL business logic. Contract tests (Pact) verify API matches legacy. Shadow+Compare before traffic switch. Payment: zero AI-only merge. |
| F4 | New Event Service at 25% traffic causes timeout. Legacy overloaded with retry storm. Both old and new systems degrade. | L/H | Auto-rollback if error > 0.5%. Bulkhead isolation. Rate limiting at Gateway. Kill switch → 100% legacy in < 30s. |
| F5 | Bus factor = 1 for a service. Engineer leaves mid-migration, taking domain knowledge. | M/M | Primary + secondary engineer per service. AI-generated docs from legacy code (Phase 0). Weekly walkthroughs. All decisions in ADRs. |
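The F2 mitigation (circuit breaker plus queued fallback) combines fail-fast with graceful degradation. A minimal Python sketch of the pattern; the real implementation would use Polly resilience policies and Azure Service Bus, and these class and function names are illustrative only:

```python
import collections

class CircuitBreaker:
    """Fail fast after max_failures consecutive errors (F2 sketch)."""
    def __init__(self, max_failures: int = 3):
        self.max_failures = max_failures
        self.failures = 0

    @property
    def open(self) -> bool:
        return self.failures >= self.max_failures

    def call(self, fn, fallback):
        if self.open:
            return fallback()        # circuit open: skip the legacy call
        try:
            result = fn()
            self.failures = 0        # success resets the breaker
            return result
        except ConnectionError:
            self.failures += 1
            return fallback()

pending = collections.deque()        # stands in for a Service Bus queue

def legacy_down():
    raise ConnectionError("legacy Payment API unavailable")

def queue_payment():
    pending.append("payment")        # booking continues as 'pending payment'
    return "pending"

cb = CircuitBreaker()
for _ in range(4):
    status = cb.call(legacy_down, queue_payment)
print(status, cb.open, len(pending))  # pending True 4
```

Once the legacy system recovers, a background worker drains the queue; bookings are never blocked outright, only deferred.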
4. Trade-Off Log
Decisions we are intentionally not optimizing yet:
| # | Decision | Alternative Forgone | Revisit |
|---|---|---|---|
| T1 | Payment stays in monolith | Migrate Payment early | Post-9-months when all other services stable |
| T2 | Azure Container Apps over AKS | Kubernetes (AKS) | If services > 15 or team > 10 engineers |
| T3 | Azure SQL everywhere | Cosmos DB, Redis, etc. | If specific service needs document store or cache |
| T4 | Incremental React (3-4 modules) | Full React rewrite | Month 10+, or hire frontend engineer |
| T5 | Single region (active-passive) | Multi-region active-active | If user growth justifies multi-region |
| T6 | Contract tests over heavy E2E | Comprehensive E2E suite (Playwright) | Phase 3+ expand E2E coverage |
| T7 | Shared DB views during CDC transition | Full data decomposition from Day 1 | Month 7+ when all services own their data |
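T6 prefers contract tests over a heavy E2E suite: each consumer records the response shape it depends on, and the provider is verified against those expectations. A minimal consumer-driven contract check in illustrative Python (the real stack would use Pact; paths, keys, and values here are hypothetical):

```python
# Contract: what the consumer expects from the new Travel Booking API.
contract = {
    "GET /bookings/{id}": {"status": 200, "body_keys": {"id", "status", "total"}},
}

def provider_response(path: str) -> dict:
    """Hypothetical stand-in for the new service under verification."""
    return {"status": 200,
            "body": {"id": "BK-1", "status": "confirmed", "total": 125.99}}

def verify(contract: dict, provider) -> bool:
    """Fail if the provider violates any consumer expectation."""
    for path, expected in contract.items():
        resp = provider(path)
        assert resp["status"] == expected["status"], path
        assert expected["body_keys"] <= resp["body"].keys(), path
    return True

print(verify(contract, provider_response))  # True
```

Contract tests run in seconds in CI, which is why broad Playwright E2E coverage can safely wait until Phase 3.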
5. Assumptions
| # | Assumption | Impact If Wrong |
|---|---|---|
| A1 | Team has senior .NET experience — no major ramp-up needed | Phase 0 extends 2-4 weeks for training |
| A2 | Legacy codebase has some documentation or discoverable APIs | AI analysis takes longer, risk of missed business rules |
| A3 | Azure is the approved cloud provider | Complete architecture rework if AWS/GCP mandated |
| A4 | 'Payment frozen' = code stays in monolith, API still callable via ACL | If API frozen too → bookings blocked entirely |
| A5 | AI tools (Cursor Pro, Claude Code) can be purchased — no procurement blocker | Multiplier drops from 2x → 1.2x, capacity ~32 MM |
| A6 | Legacy monolith continues running during full 9-month migration | If forced shutdown → scope shrinks dramatically |
| A7 | 40K users across timezones — no safe maintenance window | If single timezone → could simplify cutover |
| A8 | Team co-located or same timezone (Vietnam) | Add async overhead (~10% capacity loss) |
| A9 | Azure Service Bus acceptable for messaging | Minor: swap messaging broker, patterns stay same |
| A10 | No regulatory requirements beyond standard enterprise security | Add 2-4 weeks compliance work per service |
| A11 | Legacy database is SQL Server (CDC compatible) | Different CDC tooling needed |
| A12 | No mobile app in scope — web-only modernization | Need React Native track + additional frontend engineer |
6. AI Usage Declaration
Tool
GitHub Copilot in VS Code Agent Mode (Claude Opus 4.6) was the sole AI tool used throughout; all work happened inside a single VS Code workspace via conversational agent interactions.
How We Actually Worked — Step by Step
| Step | What Happened | Human Role | AI Role |
|---|---|---|---|
| 1 | Requirements extraction | Provided assessment brief, directed scope | Parsed requirements, created Requirement.md |
| 2 | Strategy & analysis | Guided focus areas, set constraints | Generated Strategy.md, Analysis.md |
| 3 | Deliverable documents (4.1–4.6) | Directed each doc, reviewed output, requested corrections | Drafted all 6 deliverables with architecture, timelines, failure models |
| 4 | Supporting analysis docs | Identified what was missing, prioritized | Created 15+ supporting docs (tech stack, planning, security, testing, etc.) |
| 5 | Cross-document consistency | Spotted capacity math mismatch, directed sync | Audited all 33 files, fixed inconsistencies across 7 documents |
| 6 | Gap analysis | Asked 'what's missing?' | Identified 5 gaps, created Cost Analysis, Security, Testing, Observability, API Design docs |
| 7 | Language conversion (VI→EN) | Decided to standardize in English | Converted all 33 files from Vietnamese to professional English |
| 8 | Website creation | Directed 'build a website for this' | Built full Next.js + SQLite site (flat doc viewer) |
| 9 | Website restructure | Said 'make it entity-based, add Mermaid' | Rebuilt into structured pages: Dashboard, Architecture, Phases, Services, Risks, Tech, Team, Docs |
Per-Section AI/Human Split — Honest Numbers
| Section | AI/Human | What The Human Actually Did |
|---|---|---|
| Architecture (4.1) | 85/15 | Directed: 'use Strangler Fig, YARP, per-service DB'. Reviewed output, corrected event flow. AI generated diagrams, wrote all service boundary details |
| Migration timeline (4.2) | 80/20 | Set constraints: '5 eng, 9 months, 4 phases'. Validated capacity math. AI generated phase structure, deliverables, and cutover procedures |
| Failure modeling (4.3) | 85/15 | Reviewed 8 scenarios, approved final 5. Adjusted likelihood ratings. AI generated all scenarios and mitigations |
| Trade-offs (4.4) | 75/25 | Engineering judgment on each decision (e.g., YARP over Ocelot, Container Apps over AKS). AI structured and wrote justifications |
| Assumptions (4.5) | 80/20 | Identified key gaps from brief (payment frozen, tool procurement). AI organized into 12 assumptions with impact analysis |
| This declaration (4.6) | 90/10 | Said 'this section is inaccurate, fix it honestly'. AI rewrote with actual workflow |
| Website + presentation | 95/5 | Directed structure: 'make entity-based, use Mermaid'. AI wrote all code |
The human share looks small, but it is where the value lies: which architecture pattern to use, which trade-offs to accept, which constraints are non-negotiable, and when to say "this is wrong, fix it". AI generates; the Tech Lead decides.
Preventing Blind AI Usage — 4-Gate Governance
Every AI-generated line passes all four gates. Weekly "explain this code" sessions ensure the team understands what AI wrote, and the prompt library is versioned in git.