Constraints Analysis

A deep dive into the five project constraints: what each one really means, how they interact, and the capacity math behind them

1. Original Problem Statement

Constraint	Detail
Users	40,000 active global users
Availability	Zero downtime during migration
Payment	Payment flow cannot change in Phase 1
Team Size	5 engineers only
Timeline	9 months total

2. Constraint-by-Constraint Analysis — What They Really Mean

2.1 "40,000 active global users"

Surface: User count.
Real signal:

Aspect	Analysis
"40,000"	Not a startup (100 users) but not Facebook either (1B). Medium-scale enterprise. What matters more than the absolute number is the behavior pattern
"active"	These are active users, not registered. 40K active = potentially 200K+ registered. Daily active likely 5K-15K
"global"	Multiple timezones → no maintenance window. "There is always someone using the system." Traffic follows the sun

Traffic estimation:

40K active users
├── Peak concurrent: ~10% = 4,000 concurrent sessions
├── Requests per session: ~20 pages/actions per session
├── Peak RPS: 4,000 × 20 / 3600 ≈ 22 req/s sustained
│   └── Burst: 5-10x → 110-220 req/s peak
├── Daily API calls: ~2-5 million
└── Database transactions: ~500K-1M/day

Scale verdict: MODERATE
  • No need for Kafka-level streaming
  • Azure Service Bus Standard tier is sufficient
  • Per-service Azure SQL handles this comfortably  
  • Azure Container Apps auto-scale handles burst
  • CDN + cache can offload an estimated 60-80% of raw traffic
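
The traffic estimate above can be reproduced with a few lines of arithmetic. The sketch below is only a back-of-envelope check: every ratio is an assumption taken from the tree above, and the one-hour session length is an assumption inferred from the division by 3600.

```python
# Back-of-envelope traffic estimate. All ratios are assumptions from the
# estimate above, not measurements.
ACTIVE_USERS = 40_000
PEAK_CONCURRENCY_RATIO = 0.10   # ~10% of active users online at peak
ACTIONS_PER_SESSION = 20        # pages/actions per session
SESSION_SECONDS = 3600          # assumption: a session spans about one hour
BURST_FACTORS = (5, 10)         # short spikes relative to sustained load


def estimate_traffic():
    """Return peak concurrency, sustained RPS, burst RPS and a daily ceiling."""
    concurrent = ACTIVE_USERS * PEAK_CONCURRENCY_RATIO
    sustained_rps = concurrent * ACTIONS_PER_SESSION / SESSION_SECONDS
    burst_rps = tuple(sustained_rps * factor for factor in BURST_FACTORS)
    # Ceiling on daily user actions if the sustained peak rate held for 24 hours.
    daily_actions_at_peak = sustained_rps * 86_400
    return {
        "concurrent": concurrent,
        "sustained_rps": sustained_rps,
        "burst_rps": burst_rps,
        "daily_actions_at_peak": daily_actions_at_peak,
    }


if __name__ == "__main__":
    for name, value in estimate_traffic().items():
        print(name, value)
```

Holding the peak rate for a full day gives roughly 1.9 million user actions, so the 2-5 million daily API calls figure is plausible only if each action triggers more than one API call on average. That ratio is worth measuring against the monolith's access logs before sizing anything.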

"Global" implications for architecture:

Global users = multi-region consideration

Option A: Single region + CDN (recommended Phase 1-2)
  Azure Southeast Asia (Singapore) — nearest to Vietnam
  Azure Front Door → cache static + route to nearest edge
  Latency: ~50-200ms globally (acceptable for enterprise app)

Option B: Multi-region active-active (Phase 3+, if needed)
  Primary: Southeast Asia
  Secondary: West Europe or East US
  Data replication: Azure SQL geo-replication
  
Trade-off: Option A is sufficient for 40K users. Multi-region is only 
  justified if SLA requires <100ms globally or regulatory compliance.
  With 5 engineers + 9 months → Option A is the correct choice.

2.2 "Zero downtime during migration"

This is the HARDEST constraint.

Zero downtime does NOT mean "deploy fast then restart"

Zero downtime MEANS:
  ┌─────────────────────────────────────────────────┐
  │ 1. At every point during 9 months, 40K users     │
  │    MUST be able to access the system normally      │
  │                                                    │
  │ 2. No "scheduled maintenance window"               │
  │    (global users = no safe window)                 │
  │                                                    │
  │ 3. Migration happens "invisible" to users          │
  │    Today: request → monolith                       │
  │    Tomorrow: request → new service (user unaware)  │
  │                                                    │
  │ 4. Rollback MUST be instant                        │
  │    If new service fails → route back to monolith   │
  │    in seconds, not minutes                         │
  └─────────────────────────────────────────────────┘

Patterns required by this constraint:

Pattern	Purpose	Implementation
Strangler Fig	Shift traffic per module, no big bang	YARP proxy routing: path-based → old or new
Feature Flags	Toggle new vs old per module, per user	LaunchDarkly or Azure App Configuration
Blue-Green Deploy	2 versions running in parallel, switch routing	Container Apps revisions, traffic splitting
Canary Release	Route 5% traffic → new service, monitor, scale	YARP weighted routing rules
Shadow Mode	New service receives a copy of traffic; outputs are compared	Traffic mirroring: monolith serves the response, new service's output is only compared and never returned to the user
Circuit Breaker	Auto-fallback if new service unhealthy	Polly library (.NET)
Database CDC	Sync data between old DB and new DBs	Debezium or Azure SQL CDC → Service Bus
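
To make the Strangler Fig, Canary and rollback rows concrete, here is a minimal routing sketch. It is written in Python purely for illustration and is not YARP configuration; the class names, backend addresses and path prefixes are all hypothetical. It shows the three behaviours the table relies on: path-based routing per module, a sticky percentage split, and rollback as a configuration change rather than a redeploy.

```python
import hashlib
from dataclasses import dataclass

MONOLITH = "http://monolith.internal"      # hypothetical address


@dataclass
class Route:
    prefix: str           # path prefix owned by one module, e.g. "/travel"
    new_backend: str      # hypothetical address of the extracted service
    canary_percent: int   # 0 = all traffic to monolith, 100 = fully migrated


class StranglerRouter:
    def __init__(self, routes):
        # Longest prefix first so "/travel/booking" wins over "/travel".
        self.routes = sorted(routes, key=lambda r: len(r.prefix), reverse=True)

    @staticmethod
    def _bucket(user_id: str) -> int:
        # Stable 0-99 bucket per user, so one user does not flip between
        # old and new backends from request to request.
        digest = hashlib.sha256(user_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % 100

    def pick_backend(self, path: str, user_id: str) -> str:
        for route in self.routes:
            if path == route.prefix or path.startswith(route.prefix + "/"):
                if self._bucket(user_id) < route.canary_percent:
                    return route.new_backend
                return MONOLITH
        return MONOLITH   # anything not yet extracted stays on the monolith

    def rollback(self, prefix: str) -> None:
        # Instant rollback: drop the weight to zero, no redeploy needed.
        for route in self.routes:
            if route.prefix == prefix:
                route.canary_percent = 0
```

Hashing the user ID rather than picking randomly per request is a design choice: it keeps each user on one side of the split, which makes canary bug reports easier to reproduce.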

Interaction with other constraints:

Zero downtime + 5 engineers = MUST simplify

If 50 engineers: parallel migration, complex infra is OK
If 5 engineers: one module at a time, simple patterns, automated rollback

→ Strangler Fig + YARP + Feature Flags is the minimum viable set
→ Blue-Green with Container Apps (built-in), not custom
→ Shadow Mode only for critical paths (Travel booking)

2.3 "Payment flow cannot change in Phase 1"

Why freeze Payment?

Payment = HIGHEST RISK module

Risks if you touch Payment:
  1. PCI DSS compliance → audit, certification
  2. Financial transactions → money loss if bugs slip through
  3. Regulatory → legal liability
  4. User trust → payment failure = user churn
  5. Complexity → payment gateway integrations, reconciliation

  Key signal: "Do you understand that NOT doing something is also an engineering decision?"

"Phase 1" definition:

Scope	Timeline	Meaning
Phase 0	Month 1	AI setup, infra foundation
Phase 1	Month 2-4	Payment FROZEN here
Phase 2	Month 5-7	Payment stays in monolith, but planning can begin
Phase 3	Month 8-9	Payment migration CAN start if team is confident

ACL Pattern for Payment:

┌──────────────────────────────────────────────────────────┐
│                                                           │
│  New Travel Service ──→ Anti-Corruption Layer ──→ Legacy  │
│  (needs payment)         (adapter/facade)       Monolith  │
│                                                 Payment   │
│                                                           │
│  ACL does:                                                │
│  1. Translate new service's PaymentRequest                │
│     → legacy Payment API format                           │
│  2. Handle legacy exceptions → standard errors            │
│  3. Log/trace calls for observability                     │
│  4. Rate limit to protect legacy system                   │
│  5. Circuit break if legacy is slow                       │
│                                                           │
│  ACL does NOT:                                            │
│  • Change payment logic                                   │
│  • Store payment data in new DB                           │
│  • Process payments differently                           │
│                                                           │
│  This is a BRIDGE, not a migration.                       │
└──────────────────────────────────────────────────────────┘

Hidden implication: Every other service (Travel, Event) that needs payment must go through the ACL. This is one additional component to build and maintain. Effort for the initial ACL build ≈ 1-2 weeks (Section 4 budgets 2-3 MM for the ACL in total).
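
The ACL responsibilities listed in the box can be sketched as a thin adapter. This is an illustrative Python sketch, not the project's implementation: the `PaymentRequest` shape, the legacy field names (`REF_NO`, `AMT`, `CCY`) and the breaker thresholds are all invented for the example. It covers translation, error mapping and circuit breaking; logging and rate limiting are left out to keep it short.

```python
import time
from dataclasses import dataclass


@dataclass
class PaymentRequest:            # hypothetical shape used by the new services
    booking_id: str
    amount_cents: int
    currency: str


class PaymentUnavailable(Exception):
    """The single standard error new services see, whatever legacy raised."""


class LegacyPaymentAcl:
    def __init__(self, legacy_call, failure_threshold=3,
                 cooldown_seconds=30.0, clock=time.monotonic):
        self._legacy_call = legacy_call   # callable that reaches the monolith
        self._failure_threshold = failure_threshold
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at = None            # None means the breaker is closed

    def _to_legacy(self, req: PaymentRequest) -> dict:
        # Translate to the legacy format (field names are made up here).
        amount = f"{req.amount_cents // 100}.{req.amount_cents % 100:02d}"
        return {"REF_NO": req.booking_id, "AMT": amount,
                "CCY": req.currency.upper()}

    def charge(self, req: PaymentRequest) -> dict:
        # Circuit break: fail fast while the breaker is open.
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._cooldown:
                raise PaymentUnavailable("circuit open")
            self._opened_at = None        # cooldown over: allow one trial call
        try:
            result = self._legacy_call(self._to_legacy(req))
        except Exception as exc:
            # Map any legacy exception to the one standard error.
            self._failures += 1
            if self._failures >= self._failure_threshold:
                self._opened_at = self._clock()
            raise PaymentUnavailable(str(exc)) from exc
        self._failures = 0
        return result
```

Note what the adapter does not do: it holds no payment state and makes no payment decisions, which is what keeps it a bridge rather than a migration.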

2.4 "5 engineers only"

This is the MOST LIMITING constraint.

5 engineers × 9 months = 45 man-months RAW

Overhead deduction:
  - Meetings, planning, reviews: ~15%
  - Learning new tech/patterns: ~10% (higher in Phase 0-1)
  - Sick leave, vacation: ~5%
  - Context switching: ~5%
  
Effective: 45 × 0.65 ≈ 29 man-months TRADITIONAL

With a 2x AI multiplier on coding tasks (AI-heavy, agentic workflow):
  Effective: 29 × 1.5 (conservative) ≈ 44 man-months
  
  Explanation:
  • 2x does NOT mean 2 × 45 = 90
  • 2x applies to coding tasks (~60% of time)
  • Non-coding tasks (meetings, design) do not get the 2x multiplier
  • Realistic: ~44-50 effective man-months
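
The capacity figures above can be checked with a short calculation. The sketch below assumes a simple time-weighted blend (60% of time sped up 2x, 40% unchanged), which is one plausible reading of the explanation and not the only one; all inputs are the percentages stated above.

```python
# Effective capacity under the overhead and AI assumptions stated above.
RAW_MM = 5 * 9                      # 5 engineers x 9 months
OVERHEAD = {
    "meetings_planning_reviews": 0.15,
    "learning": 0.10,
    "leave": 0.05,
    "context_switching": 0.05,
}
CODING_SHARE = 0.60                 # share of time spent on coding tasks
AI_CODING_SPEEDUP = 2.0             # applies to coding tasks only
CONSERVATIVE_MULTIPLIER = 1.5       # the figure used in the text


def effective_capacity(raw_mm=RAW_MM):
    traditional = raw_mm * (1 - sum(OVERHEAD.values()))
    # Time-weighted blend: only the coding share benefits from the speedup.
    blended = CODING_SHARE * AI_CODING_SPEEDUP + (1 - CODING_SHARE)
    return {
        "traditional_mm": traditional,
        "blended_multiplier": blended,
        "conservative_mm": traditional * CONSERVATIVE_MULTIPLIER,
        "blended_mm": traditional * blended,
    }


if __name__ == "__main__":
    for name, value in effective_capacity().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions the blended multiplier comes out at 1.6, so the 1.5 used in the text sits slightly below it, and both results fall inside the 44-50 range.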

Team allocation (recommended):

Engineer	Role	Focus
D1 (Lead/Architect)	Tech Lead	Architecture, AI pipeline setup, code review, cross-cutting
D2 (Senior)	Backend Lead	Travel Service (hardest module), mentoring D4-D5
D3 (Senior)	Platform	Infra (Bicep), CI/CD, API Gateway, observability, shared libraries
D4 (Mid)	Full-stack	Event Service, then Workforce, frontend
D5 (Mid)	Full-stack	Communications (pilot), then Reporting, frontend

Brooks's Law warning:

"Adding manpower to a late software project makes it later" (Fred Brooks, The Mythical Man-Month)

Implication: 5 engineers is FIXED. Cannot add people at Month 6.
Every decision must pass the "Feasible for 5?" test:

  ✅ Azure Container Apps (not AKS) — less ops
  ✅ Bicep over Terraform — simpler, Azure-only
  ✅ Single SPA not micro-frontends — less infra
  ✅ Azure SQL everywhere — one DB technology
  ✅ MassTransit over raw SDK — less boilerplate
  
  ❌ Kubernetes — too much ops
  ❌ Kafka — too much ops  
  ❌ Micro-frontends — too much infra
  ❌ Polyglot databases — too much expertise spread
  ❌ Custom service mesh — unnecessary at this scale

2.5 "9 months total"

Timeline analysis:

9 months = 39 working weeks ≈ 195 working days

Phase breakdown:
  Phase 0 (M1):     4 weeks  — AI setup, infra, pilot
  Phase 1 (M2-4):  13 weeks  — Travel + Event extraction
  Phase 2 (M5-7):  13 weeks  — Workforce + Comms + Reporting  
  Phase 3 (M8-9):   9 weeks  — Stabilize, optimize, handover

Key milestones:
  M1 end:  AI pipeline working, infra ready, Comms pilot extracted
  M3 end:  Travel Service live (canary 10%)
  M4 end:  Event Service live, Travel 100%
  M6 end:  Workforce live
  M7 end:  Comms + Reporting live
  M9 end:  Monolith reduced to Payment + legacy shell

What does NOT fit within 9 months?

Item	Status	Reason
Full Kubernetes migration	❌ Defer	Ops overhead, Container Apps is sufficient
Payment extraction	❌ Defer	Frozen in Phase 1; planning can begin in Phase 2. Migration starts in Phase 3 only if the team is confident; otherwise execute post-9-months
Multi-region active-active	❌ Defer	Single region + CDN is sufficient for 40K
ML models in production	❌ Defer	AI-ready data foundation: YES. Production ML: NO
Mobile app	❌ Out of scope	Not in project scope
Full legacy decommission	❌ Defer	Monolith will keep running for Payment. Kill post-payment migration

3. Constraint Interaction Matrix — How They Affect Each Other

        40K Users  Zero DT   Payment    5 Eng     9 Months
        ─────────  ────────  ─────────  ────────  ────────
40K     ·          AMPLIFY   neutral    PRESSURE  neutral
Zero DT AMPLIFY    ·         SIMPLIFY   PRESSURE  PRESSURE
Payment neutral    SIMPLIFY  ·          RELIEF    RELIEF
5 Eng   PRESSURE   PRESSURE  RELIEF     ·         AMPLIFY
9 Mon   neutral    PRESSURE  RELIEF     AMPLIFY   ·

Key interaction explanations:

Interaction	Meaning
40K × Zero DT = AMPLIFY	40K global users → NO maintenance window. Zero downtime must be absolute, not "around 2am should be fine"
Zero DT × 5 Eng = PRESSURE	Strangler Fig + canary + rollback requires ops investment. 5 engineers must automate everything, no manual rollback
Payment frozen × 5 Eng = RELIEF	1 fewer module → 5 engineers can focus on remaining 5 modules. Frozen = scope reduction
Payment frozen × 9 Months = RELIEF	Reduced scope → more breathing room on timeline. This is a trade-off the constraints give you — leverage it!
5 Eng × 9 Months = AMPLIFY	Not enough people + not enough time = must sacrifice scope or quality. Choose to sacrifice scope (defer features)
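
A matrix like this is easy to get out of sync when one cell is edited and its mirror is not. The sketch below encodes the matrix as data and checks that it is symmetric; the dictionary layout and helper names are illustrative choices, and the cell values are copied from the matrix above.

```python
from itertools import combinations

CONSTRAINTS = ["40K", "Zero DT", "Payment", "5 Eng", "9 Mon"]

# Row -> column -> effect, copied from the interaction matrix above.
MATRIX = {
    "40K":     {"Zero DT": "AMPLIFY", "Payment": "neutral",
                "5 Eng": "PRESSURE", "9 Mon": "neutral"},
    "Zero DT": {"40K": "AMPLIFY", "Payment": "SIMPLIFY",
                "5 Eng": "PRESSURE", "9 Mon": "PRESSURE"},
    "Payment": {"40K": "neutral", "Zero DT": "SIMPLIFY",
                "5 Eng": "RELIEF", "9 Mon": "RELIEF"},
    "5 Eng":   {"40K": "PRESSURE", "Zero DT": "PRESSURE",
                "Payment": "RELIEF", "9 Mon": "AMPLIFY"},
    "9 Mon":   {"40K": "neutral", "Zero DT": "PRESSURE",
                "Payment": "RELIEF", "5 Eng": "AMPLIFY"},
}


def asymmetric_pairs(matrix):
    """Pairs whose two cells disagree; an empty list means symmetric."""
    return [(a, b) for a, b in combinations(CONSTRAINTS, 2)
            if matrix[a][b] != matrix[b][a]]


def pairs_with(matrix, effect):
    """All unordered constraint pairs tagged with the given effect."""
    return [(a, b) for a, b in combinations(CONSTRAINTS, 2)
            if matrix[a][b] == effect]


if __name__ == "__main__":
    print("asymmetric:", asymmetric_pairs(MATRIX))
    print("pressure:", pairs_with(MATRIX, "PRESSURE"))
    print("relief:", pairs_with(MATRIX, "RELIEF"))
```

Listing the PRESSURE pairs shows that team size appears in two of the three, which matches the claim in Section 2.4 that it is the most limiting constraint.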

4. Capacity Math — Man-Month Breakdown

Total raw: 5 engineers × 9 months = 45 man-months

Overhead deduction (~40%):
  Sprint ceremonies + reviews:    -7  MM
  Learning curve (Phase 0-1):     -4  MM
  Context switching + meetings:   -5  MM
  Leave + buffer:                 -2  MM
  ─────────────────────────────────────
  Net available:                  27  MM traditional

Phase-by-phase (variable AI multiplier):
  P0 (M1):    5.0 raw - 3.0 overhead = 2.0 × 1.0 =  2.0 MM
  P1 (M2-4): 15.0 raw - 6.0 overhead = 9.0 × 2.0 = 18.0 MM
  P2 (M5-7): 15.0 raw - 5.5 overhead = 9.5 × 2.0 = 19.0 MM
  P3 (M8-9): 10.0 raw - 3.5 overhead = 6.5 × 1.0 =  6.5 MM
  ─────────────────────────────────────────────────────────
  Total effective: 45.5, rounded down to ~44 man-months (conservative)
  Equivalent to: ~7.5 traditional engineers for 9 months
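
The phase-by-phase table can be recomputed directly, which helps when an overhead or multiplier assumption changes. The figures in this sketch are copied from the table above; the tuple layout and function names are illustrative.

```python
# (phase, raw MM, overhead MM, AI multiplier), copied from the table above.
PHASES = [
    ("P0 (M1)",   5.0, 3.0, 1.0),
    ("P1 (M2-4)", 15.0, 6.0, 2.0),
    ("P2 (M5-7)", 15.0, 5.5, 2.0),
    ("P3 (M8-9)", 10.0, 3.5, 1.0),
]


def phase_effective(raw, overhead, multiplier):
    """Net man-months after overhead, scaled by that phase's AI multiplier."""
    return (raw - overhead) * multiplier


def summarize(phases=PHASES):
    rows = [(name, phase_effective(raw, oh, mult))
            for name, raw, oh, mult in phases]
    return {
        "per_phase": rows,
        "total_raw": sum(raw for _, raw, _, _ in phases),
        "total_overhead": sum(oh for _, _, oh, _ in phases),
        "total_effective": sum(mm for _, mm in rows),
    }


if __name__ == "__main__":
    summary = summarize()
    for name, mm in summary["per_phase"]:
        print(f"{name}: {mm:.1f} MM")
    print("overhead:", summary["total_overhead"])
    print("effective:", summary["total_effective"])
```

The per-phase overheads sum to 18 MM, matching the 40% deduction, and the effective total is 45.5 MM before rounding down.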

(Methodology aligned with Analysis v2.md & Planning.md)

Allocation per phase (variable multiplier):
  Phase 0 (M1):    5 MM raw →  2 MM effective (AI not yet set up, ×1.0)
  Phase 1 (M2-4): 15 MM raw → 18 MM effective (AI kicking in, ×2.0)
  Phase 2 (M5-7): 15 MM raw → 19 MM effective (full AI velocity, ×2.0)
  Phase 3 (M8-9): 10 MM raw →  6.5 MM effective (perf/docs, ×1.0)

"Is it enough?"

Module	Estimated Effort	Feasible?
Travel Booking (hardest)	10-12 MM	✅ Phase 1 (3 months, 2 senior engineers + AI)
Event Management	6-8 MM	✅ Phase 1-2 (overlap with Travel tail)
Workforce + Allocation	5-7 MM	✅ Phase 2
Communications (simplest)	3-4 MM	✅ Phase 0 pilot + Phase 2 complete
Reporting (read-only)	3-4 MM	✅ Phase 2
Infra + Platform	6-8 MM	✅ D3 full-time + shared effort
ACL for Payment	2-3 MM	✅ Phase 1
Total	35-46 MM	⚠️ Tight fit at 44 effective

Conclusion: Feasible but no room for error. All scope creep must be blocked aggressively.
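
The "tight fit" verdict follows from summing the estimate ranges and comparing them with the effective capacity. A small sketch of that check, using the figures from the table above and 44 MM as the capacity (names are illustrative):

```python
# (low, high) effort estimates in man-months, copied from the table above.
MODULE_ESTIMATES_MM = {
    "Travel Booking": (10, 12),
    "Event Management": (6, 8),
    "Workforce + Allocation": (5, 7),
    "Communications": (3, 4),
    "Reporting": (3, 4),
    "Infra + Platform": (6, 8),
    "ACL for Payment": (2, 3),
}
EFFECTIVE_CAPACITY_MM = 44


def fit_report(estimates=MODULE_ESTIMATES_MM, capacity=EFFECTIVE_CAPACITY_MM):
    low = sum(lo for lo, _ in estimates.values())
    high = sum(hi for _, hi in estimates.values())
    return {
        "low": low,
        "high": high,
        "midpoint": (low + high) / 2,
        "headroom_at_low": capacity - low,
        "headroom_at_high": capacity - high,   # negative means over capacity
    }


if __name__ == "__main__":
    for name, value in fit_report().items():
        print(name, value)
```

At the low end there are 9 MM of headroom; at the high end the plan is 2 MM over capacity. If every module lands at its upper estimate, something has to be deferred, which is why scope control matters more than any single estimate.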

5. Risk Matrix Derived From Constraints

Risk	Likelihood	Impact	Constraint Source	Mitigation
Migration causes outage	Medium	Critical	Zero DT × 40K	Strangler Fig, canary, instant rollback
Team burnout	High	High	5 Eng × 9 Months	Aggressive scope control, AI automation, sustainable sprint pace
Payment integration breaks	Low	Critical	Payment frozen	ACL isolation, extensive integration tests
Underestimate Travel complexity	Medium	High	9 Months tight	Start Travel first (hardest), AI legacy analysis
AI tooling doesn't deliver 2x	Medium	High	Capacity dependent	Measure velocity weekly, fallback plan = reduce scope
Key engineer leaves	Low	Critical	5 Eng	Cross-training, documentation, no single-person dependency
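
One way to prioritize the risks in the table is a likelihood × impact score. The numeric scales below are an assumption introduced only for this sketch (the table itself uses labels, not numbers), so the ranking should be read as a conversation starter rather than a result.

```python
# Assumed numeric scales; the risk table above only gives labels.
LIKELIHOOD = {"Low": 1, "Medium": 2, "High": 3}
IMPACT = {"High": 3, "Critical": 4}

# (risk, likelihood, impact), copied from the table above.
RISKS = [
    ("Migration causes outage", "Medium", "Critical"),
    ("Team burnout", "High", "High"),
    ("Payment integration breaks", "Low", "Critical"),
    ("Underestimate Travel complexity", "Medium", "High"),
    ("AI tooling doesn't deliver 2x", "Medium", "High"),
    ("Key engineer leaves", "Low", "Critical"),
]


def ranked(risks=RISKS):
    """Return (score, risk) pairs, highest score first; ties keep table order."""
    scored = [(LIKELIHOOD[lik] * IMPACT[imp], name) for name, lik, imp in risks]
    return sorted(scored, key=lambda pair: -pair[0])


if __name__ == "__main__":
    for score, name in ranked():
        print(score, name)
```

With these scales, team burnout ranks above a migration outage, because it is the only risk rated High on both axes.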

6. Assessor Perspective

✅ WANT TO SEE:
  • Clear capacity math (man-months, AI multiplier, overhead)
  • Constraint interactions (40K + zero DT = no maintenance window) 
  • Explicit defer list (doesn't fit 9 months? SAY SO)
  • Payment frozen = scope gift, leverage it
  • Risk-aware: zero downtime with 5 people is hard, acknowledge it

❌ DO NOT WANT TO SEE:
  • "5 engineers is enough because we use AI" — need math, not faith
  • Ignoring zero downtime complexity
  • Promising to deliver all 5 modules + Payment in 9 months
  • Not acknowledging team burnout risk
  • Analyzing constraints in isolation without discussing interactions

7. Summary — What The Constraints Tell Us About The Playing Field

This is an OPTIMIZATION problem under CONSTRAINTS:

  Maximize: number of modules extracted into microservices
  Subject to:
    - Downtime = 0
    - Payment = frozen Phase 1
    - Engineers ≤ 5
    - Time ≤ 9 months
    - Quality ≥ production-grade

Optimal strategy:
  1. Payment frozen → reduced scope → leverage it
  2. AI 2x → increased capacity → exploit fully
  3. Simplest first (Comms) → build the pattern → apply to complex (Travel)
  4. Per-module extraction → Strangler Fig → zero downtime
  5. Defer everything that doesn't fit → say it directly

Constraints are NOT obstacles. Constraints are BOUNDARIES for engineering judgment.
Assessor test: Do you know how to play within boundaries, or do you try to break them?
