
Project Management, Stakeholder Management & Process

Project management, stakeholder management, and engineering process strategy
For a team of 5 engineers, 9 months, legacy .NET → .NET 8 microservices
AI-first approach


1. Project Management Framework

1.1 Why Not Pure Scrum — Hybrid Approach

| Framework           | Fit? | Reasoning                                                                                    |
|---------------------|------|----------------------------------------------------------------------------------------------|
| Pure Scrum          | ❌   | 5 people don't need heavy ceremonies. 4-hour sprint planning for 5 people = waste            |
| Pure Kanban         | ⚠️   | Good for flow, but lacks checkpoints for migration milestones                                |
| Shape Up (modified) | ✅   | 6-week cycles + cooldown. Fits the migration phases. Appetite-based (fixed time, variable scope) |
| SAFe                | ❌   | Overkill for 5 people. SAFe is for 50+ engineers                                             |

Decision: Shape Up modified + Kanban within cycles

┌──────────────────────────────────────────────────────────────┐
│                  PROJECT RHYTHM                               │
│                                                              │
│  9 months = 6 cycles × 6 weeks                              │
│  (aligned with Planning.md phasing)                         │
│                                                              │
│  Cycle 0 (W1-6):   Phase 0 (AI Foundation + Infra + IaC)   │
│  Cycle 1 (W7-12):  Phase 1a (Travel Booking extraction)     │
│  Cycle 2 (W13-18): Phase 1b (Travel go-live + Event start)  │
│  Cycle 3 (W19-24): Phase 2a (Event go-live + Workforce)     │
│  Cycle 4 (W25-30): Phase 2b (Comms + Reporting go-live)     │
│  Cycle 5 (W31-36): Phase 3 (Stabilize + Harden)             │
│                                                              │
│  Each cycle:                                                 │
│  ┌──────────────────────────────────────────────────┐        │
│  │ Week 1:    Shaping (Tech Lead + team define work)│        │
│  │ Week 2-5:  Building (Kanban flow, daily standups)│        │
│  │ Week 6:    Cooldown (retro, tech debt, learning) │        │
│  └──────────────────────────────────────────────────┘        │
└──────────────────────────────────────────────────────────────┘

1.2 Ceremonies (Lean — Respect 5 People Team)

| Ceremony            | Frequency     | Duration | Purpose                                                | Who                    |
|---------------------|---------------|----------|--------------------------------------------------------|------------------------|
| Daily Standup       | Daily         | 10 min   | Blockers only. No status reporting — use the board     | All 5                  |
| Cycle Shaping       | Every 6 weeks | 2 hours  | Define appetite, scope bets, assign pitches            | Tech Lead + team       |
| Weekly Demo         | Weekly        | 30 min   | Show working software to stakeholders                  | Rotating presenter     |
| Cycle Retro         | Every 6 weeks | 1 hour   | What worked, what didn't, AI effectiveness review      | All 5                  |
| Architecture Review | Bi-weekly     | 1 hour   | Review ADRs, service boundaries, tech decisions        | Tech Lead + senior eng |
| AI Workflow Check   | Weekly        | 15 min   | AI metrics review, prompt calibration, governance check| Tech Lead              |

Total ceremony time: roughly 3 hours/week (50 min of standups, 30 min demo, plus the longer sessions prorated) — under 10% of working time. The rest = build.

1.3 Task Management

Tool: Linear / GitHub Projects (kanban board)

Board Columns:
┌──────────┬───────────┬────────────┬───────────┬───────────┬──────┐
│ Backlog  │ Shaped    │ Building   │ Review    │ Staging   │ Done │
│          │ (ready)   │ (WIP ≤5)   │ (PR)      │ (testing) │      │
│          │           │            │           │           │      │
│ Unshaped │ Scoped,   │ In active  │ CodeRabbit│ QA on     │ In   │
│ ideas    │ estimated,│ development│ + human   │ staging   │ prod │
│          │ assigned  │            │ review    │ env       │      │
└──────────┴───────────┴────────────┴───────────┴───────────┴──────┘

WIP Limit: 5 (1 per engineer max). No multitasking.
           If blocked → swarm (help each other unblock).

Labels:
  🏗️ migration    — module extraction work
  🤖 ai-generated — majority AI-generated code
  🔧 infra        — CI/CD, IaC, DevOps
  🧪 testing      — test creation, contract tests
  📊 reporting    — reporting/analytics specific
  💳 payment-acl  — anything touching payment
  🔴 blocker      — blocking other work

1.4 Risk-Based Milestone Tracking

Instead of tracking by features delivered, track by risks eliminated:

Milestones (Risk Reduction):

M1 (Week 4):  "Can we AI-migrate a service?"
              ✅ Communications service deployed to staging via AI pipeline
              Risk eliminated: AI approach proven or disproven

M2 (Week 8):  "Can we run dual traffic?"
              ✅ API Gateway routing live: Travel via new service
              ✅ Legacy still handling Payment
              Risk eliminated: Strangler Fig pattern validated

M3 (Week 14): "Can services talk to each other?"
              ✅ Travel → Payment ACL working
              ✅ Event Bus messages flowing
              Risk eliminated: Inter-service communication proven

M4 (Week 20): "Can we handle the data?"
              ✅ CDC from legacy → Reporting working
              ✅ Per-service DBs for Travel + Event
              Risk eliminated: Data migration proven

M5 (Week 28): "Can we scale?"
              ✅ 5 services running, auto-scaling
              ✅ Load test passing (simulated 40K users)
              Risk eliminated: Production readiness

M6 (Week 36): "Can we operate?"
              ✅ Monitoring, alerting, runbooks in place
              ✅ Payment migration plan ready (next phase)
              Risk eliminated: Operational readiness

1.5 Estimation — Appetite-Based (Shape Up Style)

No story point estimation. Use appetite — "how much time are we willing to spend on this?"

| Appetite    | Duration          | Example                                              |
|-------------|-------------------|------------------------------------------------------|
| Small Batch | ≤ 1 week          | Communications service extraction (AI handles most)  |
| Big Batch   | 2-4 weeks         | Travel Booking full extraction + tests + React pages |
| Epic        | 1 cycle (6 weeks) | Event Management + Reporting + related React modules |

Rule: If a task exceeds 6 weeks → must be broken down further. No "ongoing" tasks.


2. Team Structure & Roles

2.1 Team Topology

┌──────────────────────────────────────────────────────────────┐
│                   TEAM STRUCTURE (5 engineers)                 │
│                                                              │
│  ┌──────────────────────────────────────────────┐            │
│  │ Tech Lead (1)                                │            │
│  │ • Architecture decisions (ADR owner)         │            │
│  │ • AI workflow design + governance             │            │
│  │ • Stakeholder communication                   │            │
│  │ • Code review (final gate for critical code) │            │
│  │ • Shaping sessions lead                       │            │
│  │ • 50% hands-on coding / 50% leadership        │            │
│  └──────────────────────────────────────────────┘            │
│                                                              │
│  ┌──────────────────────────────────────────────┐            │
│  │ Senior Backend Engineer (1)                   │            │
│  │ • Service extraction lead                     │            │
│  │ • .NET 8 migration specialist                 │            │
│  │ • AI agent power user (Claude Code batch)     │            │
│  │ • Database migration + CDC setup              │            │
│  │ • Code review (business logic)                │            │
│  └──────────────────────────────────────────────┘            │
│                                                              │
│  ┌──────────────────────────────────────────────┐            │
│  │ Backend Engineer (1)                          │            │
│  │ • Service implementation                      │            │
│  │ • Event handlers + Service Bus integration    │            │
│  │ • Contract test authoring (Pact)              │            │
│  │ • ACL development + maintenance               │            │
│  └──────────────────────────────────────────────┘            │
│                                                              │
│  ┌──────────────────────────────────────────────┐            │
│  │ Full-stack Engineer (1)                       │            │
│  │ • React 18 frontend development               │            │
│  │ • Shared design system (Storybook)            │            │
│  │ • API integration (frontend ↔ services)       │            │
│  │ • AI-assisted UI component generation          │            │
│  └──────────────────────────────────────────────┘            │
│                                                              │
│  ┌──────────────────────────────────────────────┐            │
│  │ DevOps/Platform Engineer (1)                  │            │
│  │ • CI/CD pipelines (GitHub Actions)            │            │
│  │ • IaC (Bicep)                                 │            │
│  │ • Azure Container Apps management             │            │
│  │ • Observability stack (monitoring, alerting)  │            │
│  │ • Security scanning integration                │            │
│  └──────────────────────────────────────────────┘            │
│                                                              │
│  ─────────────────────────────────────────────────           │
│  AI Force Multiplier: Each engineer uses AI (Cursor Pro +    │
│  Claude Code). Specialized tasks don’t require specialized    │
│  hires. Fullstack eng uses AI for backend when needed. DevOps │
│  eng uses AI for IaC generation. AI fills the "6th engineer". │
└──────────────────────────────────────────────────────────────┘

2.2 Role Rotation Strategy

With 5 people, bus factor is a critical concern. 1 person leaving = 20% capacity lost.

| Strategy                 | How                                                                            | Why                                |
|--------------------------|--------------------------------------------------------------------------------|------------------------------------|
| Pair on critical modules | 2 people know each service; no one solely owns a single service                | Bus factor ≥ 2 for every module    |
| Rotate reviewer          | Code review rotates — everyone reviews everyone else's code                    | Cross-knowledge                    |
| AI code walkthrough      | Week 1: Senior explains Travel code. Week 2: Backend eng explains Event code...| Everyone understands every service |
| DevOps cross-train       | Every backend eng knows how to deploy their own service; no single DevOps dependency | DevOps doesn't become a bottleneck |

2.3 Collaboration Model

Daily:
  09:00  Standup (10 min) — blockers only
  09:10  Start of deep work — NO meetings zone until 12:00
  14:00  Open for ad-hoc pairing, reviews, discussions
  16:00  Async review — PRs, CodeRabbit comments
  17:00  Claude Code overnight runs scheduled (batch migration tasks)

Weekly:
  Monday AM:    Weekly planning (30 min)
  Wednesday AM: Architecture review (1 hour, bi-weekly)
  Friday PM:    Demo to stakeholders (30 min)
  Friday PM:    AI metrics check (15 min)

Cycle (6 weeks):
  Week 1:      Shaping — define next cycle's bets
  Week 6:      Retro + cooldown + tech debt + learning

3. Stakeholder Management

3.1 Stakeholder Map

┌──────────────────────────────────────────────────────────────────────┐
│                      STAKEHOLDER MAP                                  │
│                                                                      │
│  POWER                                                               │
│    ▲                                                                 │
│    │   ┌─────────────────────┐    ┌──────────────────────────┐      │
│ HIGH│   │ C-Level / Sponsor   │    │ Product Owner /           │      │
│    │   │                     │    │ Business Stakeholders     │      │
│    │   │ Cares about:        │    │                          │      │
│    │   │ • Timeline          │    │ Cares about:             │      │
│    │   │ • Budget            │    │ • Features working       │      │
│    │   │ • Risk              │    │ • Zero downtime          │      │
│    │   │ • AI ROI proof      │    │ • User experience        │      │
│    │   │                     │    │ • Migration transparency │      │
│    │   │ Strategy: MANAGE    │    │                          │      │
│    │   │ CLOSELY             │    │ Strategy: KEEP           │      │
│    │   └─────────────────────┘    │ SATISFIED                │      │
│    │                              └──────────────────────────┘      │
│    │   ┌─────────────────────┐    ┌──────────────────────────┐      │
│ LOW│   │ External API         │    │ End Users (40K)          │      │
│    │   │ Consumers            │    │                          │      │
│    │   │                     │    │ Cares about:             │      │
│    │   │ Cares about:        │    │ • System works           │      │
│    │   │ • API compatibility  │    │ • No disruption          │      │
│    │   │ • Breaking changes   │    │ • Performance            │      │
│    │   │ • Documentation     │    │                          │      │
│    │   │                     │    │ Strategy: MONITOR        │      │
│    │   │ Strategy: KEEP      │    │ (communicate through     │      │
│    │   │ INFORMED             │    │  product channels)       │      │
│    │   └─────────────────────┘    └──────────────────────────┘      │
│    │                                                                 │
│    └──────────────────────────────────────────────────►             │
│                                               INTEREST              │
│         LOW                                   HIGH                  │
└──────────────────────────────────────────────────────────────────────┘

3.2 Communication Plan

| Stakeholder            | Channel                    | Frequency           | Content                                                      | Owner                                |
|------------------------|----------------------------|---------------------|--------------------------------------------------------------|--------------------------------------|
| C-Level / Sponsor      | Executive summary (1-page) | Bi-weekly           | Risk status, milestone progress, AI ROI metrics, budget burn | Tech Lead                            |
| Product Owner          | Demo + written update      | Weekly              | Working features, migration progress, upcoming changes       | Tech Lead + rotating eng             |
| Business Users (key)   | Change notification        | Per migration phase | What's changing, what's not, who to contact for issues       | Product Owner (with Tech Lead input) |
| External API Consumers | API deprecation notice     | 30 days ahead       | Breaking changes, migration guides, new endpoints            | Tech Lead                            |
| Engineering Team       | Standup + board            | Daily               | In-progress work, blockers, decisions needed                 | All                                  |
| Security / Compliance  | Audit report               | Monthly             | SAST results, AI governance compliance, payment module status| DevOps + Tech Lead                   |

3.3 Stakeholder Communication Templates

Bi-weekly Executive Summary (1-page):

┌──────────────────────────────────────────────────────────────┐
│ EXECUTIVE SUMMARY — Week [X] of 36                           │
│                                                              │
│ Overall Status: 🟢 On Track / 🟡 At Risk / 🔴 Blocked       │
│                                                              │
│ ┌────────────────────────────────────────────────────────┐   │
│ │ Milestones                                             │   │
│ │ ✅ M1: AI pipeline validated (Week 4)                  │   │
│ │ ✅ M2: Dual traffic running (Week 8)                   │   │
│ │ 🔄 M3: Inter-service communication (in progress)       │   │
│ │ ⬜ M4: Data migration validated                        │   │
│ │ ⬜ M5: Scale test passed                               │   │
│ │ ⬜ M6: Operational readiness                           │   │
│ └────────────────────────────────────────────────────────┘   │
│                                                              │
│ Key Metrics:                                                 │
│ • Services migrated: 2/5                                     │
│ • API endpoints migrated: 47/120 (39%)                       │
│ • Test coverage (new services): 85%                          │
│ • AI-generated code: 68% (team-reviewed)                     │
│ • Zero downtime incidents: 0                                 │
│                                                              │
│ Top Risks:                                                   │
│ 1. [Risk] — [Mitigation] — [Status]                         │
│                                                              │
│ Decisions Needed:                                            │
│ 1. [Decision] — needed by [date]                             │
│                                                              │
│ Next 2 Weeks:                                                │
│ • [Key deliverable 1]                                        │
│ • [Key deliverable 2]                                        │
└──────────────────────────────────────────────────────────────┘

Weekly Demo Format:

30 minutes max:
  5 min:  Context (what we aimed to do this week)
  15 min: Live demo (working software, not slides)
  5 min:  Metrics (AI productivity, service health)
  5 min:  Q&A + next week preview

Rule: If nothing demoable → show monitoring dashboard,
      test results, or architecture diagram update.
      NEVER skip demo — it builds stakeholder confidence.

3.4 Escalation Path

Issue Severity → Response:

P4 (Low):     Engineer fixes → PR → merge
              No escalation needed

P3 (Medium):  Engineer + Tech Lead discuss
              Fix within cycle
              Mention in weekly update

P2 (High):    Tech Lead decides → immediate fix
              Notify Product Owner same day
              Include in exec summary

P1 (Critical): Zero downtime violated / Payment affected / Data loss
              Tech Lead → Sponsor within 1 hour
              War room (all hands)
              Hourly updates until resolved
              Post-mortem within 48 hours

3.5 Managing Expectations — The "No" Framework

With 5 engineers and 9 months, saying "No" (or "Not now") is the most important skill:

| Request Type                     | Response Framework                                                                                            |
|----------------------------------|---------------------------------------------------------------------------------------------------------------|
| "Can we add feature X?"          | "Yes, if we defer [Y]. Here's the trade-off."                                                                 |
| "Can we speed up?"               | "We're at 2x AI capacity. Adding people adds coordination cost. We can re-scope instead."                     |
| "Why isn't Payment modernized?"  | "By design. Constraint: Payment frozen in Phase 1. Plan exists for Phase 2. Here's the ACL keeping it safe."  |
| "Can we skip testing?"           | "No. With 75% AI-generated code, testing IS the quality gate. This is non-negotiable."                        |
| "Competitor launched feature Z"  | "Noted. Added to backlog. Current priority: foundation first. Features after migration."                      |

4. Engineering Process

4.1 Development Lifecycle

┌──────────────────────────────────────────────────────────────────────┐
│                    DEVELOPMENT LIFECYCLE (per task)                    │
│                                                                      │
│  ┌───────────┐    ┌───────────┐    ┌───────────┐    ┌────────────┐  │
│  │ 1. SHAPE  │───►│ 2. BUILD  │───►│ 3. REVIEW │───►│ 4. DEPLOY  │  │
│  └───────────┘    └───────────┘    └───────────┘    └────────────┘  │
│       │                │                │                │           │
│       ▼                ▼                ▼                ▼           │
│  • Define scope   • AI-first dev   • CodeRabbit     • CI pipeline   │
│  • Set appetite   • Cursor Agent     auto-review    • Deploy to     │
│  • Identify risks • Write tests    • Human review     staging       │
│  • Write spec       (TDD with AI)  • Contract test  • Smoke test    │
│  • Assign pair    • Implement        pass           • Manual        │
│                   • AI generates   • Security scan    approval      │
│                     70% of code                     • Prod deploy   │
│                                                       (rolling)     │
└──────────────────────────────────────────────────────────────────────┘

4.2 Definition of Done (DoD)

✅ DEFINITION OF DONE — Every task must meet ALL criteria:

Code:
  □ Feature implemented and builds successfully
  □ AI-generated code reviewed by human (mandatory)
  □ Follows Clean Architecture structure
  □ No TODO/HACK comments left untracked

Testing:
  □ Unit tests pass (≥80% coverage for new code)
  □ Contract tests pass (Pact — for API changes)
  □ Integration tests pass (DB, event bus)
  □ No regression in existing tests

Security:
  □ SAST scan clean (CodeQL)
  □ No secrets in code
  □ Payment-related code: 2 human reviewers approved

Observability:
  □ Structured logging added for key operations
  □ OpenTelemetry trace spans for cross-service calls
  □ Health check endpoint working

Documentation:
  □ API changes reflected in OpenAPI spec
  □ ADR created for architecture decisions
  □ README updated if setup/run instructions changed

Deployment:
  □ Docker image builds successfully
  □ Deployed to staging and tested
  □ Monitoring/alerting configured for new endpoints

4.3 Git Workflow

┌──────────────────────────────────────────────────────────────┐
│                     GIT WORKFLOW                              │
│                                                              │
│  main ─────────────────────────────────────────────►         │
│    │         │              │              │                  │
│    │    merge (squash)  merge (squash)  merge (squash)        │
│    │         ▲              ▲              ▲                  │
│    │         │              │              │                  │
│    ├── feature/travel-booking-extraction ──┘                  │
│    │         │                                               │
│    │    ┌────┴────────────────────────┐                      │
│    │    │ Commits:                    │                      │
│    │    │ feat: scaffold travel svc   │                      │
│    │    │ feat: migrate booking logic │                      │
│    │    │ test: contract tests        │                      │
│    │    │ fix: edge case in pricing   │                      │
│    │    └─────────────────────────────┘                      │
│    │                                                         │
│    ├── feature/event-management-extraction ──────────────┘    │
│    │                                                         │
│    └── feature/infra-cicd-setup ─────────────────────┘       │
│                                                              │
│  Branch Naming:                                              │
│    feature/{module}-{description}                            │
│    fix/{module}-{description}                                │
│    infra/{description}                                       │
│                                                              │
│  Rules:                                                      │
│    • PR required for main (no direct push)                   │
│    • CodeRabbit auto-review on PR create                     │
│    • ≥1 human approval required                              │
│    • Payment-related: ≥2 human approvals                     │
│    • CI must pass (build + test + security scan)             │
│    • Squash merge to main (clean history)                    │
└──────────────────────────────────────────────────────────────┘
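The branch-naming rules above are easy to enforce mechanically in CI. A minimal sketch of such a check — the helper name and exact patterns are illustrative, not an existing script in this repo:

```python
import re

# Allowed branch prefixes per the workflow rules above (illustrative patterns).
BRANCH_PATTERNS = [
    r"^feature/[a-z0-9]+-[a-z0-9-]+$",   # feature/{module}-{description}
    r"^fix/[a-z0-9]+-[a-z0-9-]+$",       # fix/{module}-{description}
    r"^infra/[a-z0-9-]+$",               # infra/{description}
]

def branch_name_ok(name: str) -> bool:
    """Return True if the branch name matches one of the agreed patterns."""
    return any(re.match(p, name) for p in BRANCH_PATTERNS)
```

A CI job would call this on the PR's head branch and fail fast on a mismatch, so naming never needs to come up in human review.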

4.4 Code Review Process

PR Created
    │
    ▼
┌───────────────────────┐
│ Gate 1: Automated      │
│ • CI build + tests     │
│ • CodeRabbit review    │
│ • SAST scan (CodeQL)   │
│ • Contract test check  │
└───────────┬───────────┘
            │ All pass?
            │
    ┌───────┴───────┐
    │ Yes           │ No → Fix and re-push
    ▼               │
┌───────────────────────┐
│ Gate 2: Human Review   │
│                       │
│ Reviewer focuses on:  │
│ • Business logic      │
│   correctness         │
│ • Edge cases          │
│ • Architecture fit    │
│ • AI hallucination    │
│   detection           │
│ • Performance         │
│   implications        │
│                       │
│ NOT focused on:       │
│ • Formatting (linter) │
│ • Simple bugs         │
│   (CodeRabbit caught) │
│ • Test coverage       │
│   (CI enforced)       │
└───────────┬───────────┘
            │ Approved?
            │
    ┌───────┴───────┐
    │ Standard code  │ Payment/Security code
    ▼               ▼
  1 approval     2 approvals
    │               │
    ▼               ▼
  Merge           Merge

4.5 Incident Management

┌──────────────────────────────────────────────────────────────┐
│                 INCIDENT RESPONSE FLOW                        │
│                                                              │
│  Alert Triggered (Azure Monitor / AI Anomaly Detection)      │
│      │                                                       │
│      ▼                                                       │
│  ┌──────────────────┐                                       │
│  │ Triage (5 min)    │                                       │
│  │ Who: On-call eng  │                                       │
│  │ What: Severity?   │                                       │
│  └────────┬─────────┘                                       │
│           │                                                  │
│  ┌────────┼──────────────────┐                              │
│  │        │                  │                              │
│  ▼        ▼                  ▼                              │
│ P3/P4    P2                 P1                              │
│ Low      High               Critical                        │
│          │                  │                              │
│ Fix in   Immediate fix      War room                        │
│ next     Notify Tech Lead   All hands                       │
│ cycle    Same-day update    Hourly updates                   │
│          to PO              Sponsor notified                 │
│                             │                              │
│                             ▼                              │
│                          ┌──────────────────┐              │
│                          │ Post-Mortem       │              │
│                          │ Within 48 hours   │              │
│                          │ • Timeline        │              │
│                          │ • Root cause      │              │
│                          │ • Impact          │              │
│                          │ • Prevention      │              │
│                          │ • Action items    │              │
│                          └──────────────────┘              │
│                                                              │
│  On-Call Rotation:                                           │
│  Week 1: Tech Lead + Senior BE                               │
│  Week 2: BE Engineer + Full-stack                            │
│  Week 3: DevOps + Tech Lead                                  │
│  (Rotate every week. Always 2 people for coverage.)          │
└──────────────────────────────────────────────────────────────┘
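The three-week pairing above can be generated rather than hand-maintained. An illustrative sketch — the roster names and the walk-two-at-a-time scheme are assumptions, not tooling the team has:

```python
# Generate the weekly on-call pair for a 5-person team: walk the roster
# two people at a time, wrapping around, so everyone rotates through and
# coverage is always exactly 2 people.
ENGINEERS = ["TechLead", "SeniorBE", "BackendEng", "Fullstack", "DevOps"]

def on_call_pair(week: int) -> tuple[str, str]:
    """Return the on-call pair for a given week (0-based)."""
    n = len(ENGINEERS)
    return ENGINEERS[(2 * week) % n], ENGINEERS[(2 * week + 1) % n]
```

With 5 people and pairs of 2, the schedule repeats every 5 weeks and no fixed pair recurs back-to-back.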

4.6 Architecture Decision Records (ADR) Process

Trigger: Any decision that affects:
  • Service boundaries
  • Database choice/structure
  • Communication patterns
  • Technology selection
  • Security model
  • AI governance rules

Process:
  1. Engineer drafts ADR (AI-assisted — Claude drafts from context)
  2. Tech Lead reviews within 24 hours
  3. Team review in next Architecture Review meeting (bi-weekly)
  4. Accepted / Rejected / Amended
  5. Stored in repo: /docs/adrs/ADR-NNN-title.md

ADR Template:
  ┌──────────────────────────────────────┐
  │ # ADR-NNN: [Title]                   │
  │ Status: Proposed/Accepted/Deprecated │

  │ Date: [date]                         │
  │ Deciders: [names]                    │
  │                                      │
  │ ## Context                           │
  │ What is the problem?                 │
  │                                      │
  │ ## Decision                          │
  │ What did we decide?                  │
  │                                      │
  │ ## Options Considered                │
  │ | Option | Pros | Cons |             │
  │                                      │
  │ ## Consequences                      │
  │ What are the trade-offs?             │
  │                                      │
  │ ## AI Involvement                    │
  │ Was AI used to draft? What was       │
  │ human-validated?                     │
  └──────────────────────────────────────┘

5. Release Management

5.1 Release Strategy

┌──────────────────────────────────────────────────────────────┐
│                  RELEASE STRATEGY                             │
│                                                              │
│  Type 1: Service Deployment (frequent)                       │
│  ────────────────────────────────                            │
│  • Every merged PR → auto-deploy to staging                  │
│  • Staging → Production: manual approval (Tech Lead/Senior)  │
│  • Rolling update (zero downtime)                            │
│  • Feature flags for incomplete features                     │
│  • Frequency: 2-3 times per week                             │
│                                                              │
│  Type 2: Module Go-Live (per phase)                          │
│  ────────────────────────────────                            │
│  • Full module cutover: traffic routes from legacy → new     │
│  • Requires: all contract tests pass + load test pass        │
│  • Canary deployment: 5% → 25% → 50% → 100% traffic         │
│  • Rollback plan: YARP route back to legacy < 5 minutes      │
│  • Stakeholder notified 1 week before                        │
│  • Frequency: once per phase (roughly monthly)               │
│                                                              │
│  Type 3: Database Migration (rare, high-risk)                │
│  ────────────────────────────────                            │
│  • Per-service DB cutover                                    │
│  • CDC running for weeks before cutover (verify data sync)   │
│  • Blue-green: new DB live + old DB as fallback              │
│  • Data verification scripts mandatory                       │
│  • Frequency: 1-2 total across 9 months                     │
└──────────────────────────────────────────────────────────────┘
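The per-module routing that makes Type 2 cutovers reversible lives in the YARP gateway config. A sketch of roughly what a route and its clusters look like — route/cluster names and addresses here are illustrative, not this project's actual config:

```json
{
  "ReverseProxy": {
    "Routes": {
      "travel-route": {
        "ClusterId": "travel-new",
        "Match": { "Path": "/api/travel/{**catch-all}" }
      }
    },
    "Clusters": {
      "travel-new": {
        "Destinations": { "d1": { "Address": "https://travel-svc.internal/" } }
      },
      "travel-legacy": {
        "Destinations": { "d1": { "Address": "https://legacy-monolith.internal/" } }
      }
    }
  }
}
```

Rollback is then a one-line change — point `ClusterId` back at `travel-legacy` and reload config — which is what keeps the rollback window under 5 minutes.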

5.2 Canary Release Process (Module Go-Live)

Canary Deployment Flow:

Day 1-2:  5% traffic → new service
          Monitor: error rates, latency, business metrics
          ┌─────────────────────────────────────┐
          │ 95% ──► Legacy                      │
          │  5% ──► New Service                 │
          └─────────────────────────────────────┘

Day 3-4:  25% traffic (if Day 1-2 clean)
          ┌─────────────────────────────────────┐
          │ 75% ──► Legacy                      │
          │ 25% ──► New Service                 │
          └─────────────────────────────────────┘

Day 5:    50% traffic
          ┌─────────────────────────────────────┐
          │ 50% ──► Legacy                      │
          │ 50% ──► New Service                 │
          └─────────────────────────────────────┘

Day 6-7:  100% traffic (full cutover)
          ┌─────────────────────────────────────┐
          │  0% ──► Legacy (standby)            │
          │100% ──► New Service                 │
          └─────────────────────────────────────┘

Day 14:   Legacy module decommissioned
          (after 1 week soak at 100%)

At ANY point: error rate > threshold → automatic rollback to legacy
YARP config change = rollback in < 5 minutes
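The "error rate > threshold → automatic rollback" rule can be encoded as a small gate evaluated by the monitoring pipeline at each canary stage. An illustrative sketch — the thresholds and decision tiers are assumptions, not agreed values:

```python
# Decide whether a canary stage may proceed, should hold, or must roll back,
# comparing the new service's error rate against a hard ceiling and against
# the legacy baseline.
def canary_decision(new_error_rate: float,
                    baseline_error_rate: float,
                    max_absolute: float = 0.02,
                    max_relative: float = 2.0) -> str:
    """Return 'rollback', 'hold', or 'proceed' for the current canary stage."""
    if new_error_rate > max_absolute:
        return "rollback"   # hard ceiling breached: route back to legacy
    if baseline_error_rate > 0 and new_error_rate > max_relative * baseline_error_rate:
        return "hold"       # noticeably worse than legacy: pause the ramp
    return "proceed"        # clean: move to the next traffic step
```

The same check runs at 5%, 25%, 50%, and 100%; "hold" pauses the ramp for human judgment while "rollback" triggers the YARP route change automatically.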

5.3 Feature Flags

Tool: Azure App Configuration (feature management) 
      or LaunchDarkly (if budget allows)

Usage:
┌──────────────────────────────────────────────────────────────┐
│  Flag Name                 │ Purpose                         │
│  ────────────────────────  │ ──────────────────────────────  │
│  travel.new-service        │ Route traffic to new Travel svc │
│  event.new-service         │ Route traffic to new Event svc  │
│  react.travel-ui           │ Show new React UI for Travel    │
│  react.event-ui            │ Show new React UI for Events    │
│  ai.smart-routing          │ Enable AI-based API routing     │
│  ai.anomaly-detection      │ Enable AI monitoring alerts     │
│  reporting.cqrs-mode       │ Use CQRS read models vs legacy  │
└──────────────────────────────────────────────────────────────┘

Rules:
  • All new services behind feature flags
  • Flags per-tenant and per-region capable
  • Kill switch: disable new service → fallback to legacy instantly
  • Flags reviewed and cleaned up every cycle
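In Azure App Configuration, each feature flag is stored as a small JSON document of roughly this shape. The percentage filter shown is how a flag can double as a canary dial — exact filter names and parameters should be verified against the Microsoft.FeatureManagement documentation:

```json
{
  "id": "travel.new-service",
  "description": "Route traffic to new Travel svc",
  "enabled": true,
  "conditions": {
    "client_filters": [
      { "name": "Microsoft.Percentage", "parameters": { "Value": 25 } }
    ]
  }
}
```

Setting `enabled` to `false` is the kill switch: traffic falls back to legacy without a deployment.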

6. Quality Assurance Process

6.1 Testing Pyramid

                    ┌───────────┐
                    │  Manual   │  ← Exploratory testing only
                    │  Testing  │     (by team on Friday demo)
                    │  (rare)   │
                    ├───────────┤
                  ┌─┤  E2E      ├─┐  ← Critical paths only
                  │ │  Tests    │ │     (login → book → pay)
                  │ │  (few)    │ │     Cypress/Playwright
                  │ ├───────────┤ │
                ┌─┤ │ Contract  │ ├─┐  ← Every service boundary
                │ │ │ Tests     │ │ │     Pact consumer/provider
                │ │ │ (Pact)    │ │ │     Run in CI every PR
                │ │ ├───────────┤ │ │
              ┌─┤ │ │Integration│ │ ├─┐  ← DB, event bus, external APIs
              │ │ │ │ Tests     │ │ │ │     Docker Compose in CI
              │ │ │ │           │ │ │ │
              │ │ │ ├───────────┤ │ │ │
            ┌─┤ │ │ │  Unit     │ │ │ ├─┐  ← Business logic, domain
            │ │ │ │ │  Tests    │ │ │ │ │     Fast, no external deps
            │ │ │ │ │  (many)   │ │ │ │ │     80%+ coverage target
            └─┴─┴─┴─┴───────────┴─┴─┴─┴─┘

AI Role in Testing:
  • Unit tests: 80% AI-generated (Claude Sonnet 4)
  • Contract tests: 70% AI-generated, human validates contracts
  • Integration tests: 50% AI-generated, human sets up fixtures
  • E2E tests: 30% AI-assisted (Playwright codegen + AI refinement)
  • Manual testing: 0% AI — human exploratory only

6.2 Quality Gates

PR Level:
  □ Build passes
  □ Unit tests pass (≥80% coverage on changed files)
  □ Contract tests pass
  □ SAST clean
  □ CodeRabbit review: no critical findings
  □ Human review: approved

Staging Level:
  □ Integration tests pass
  □ E2E critical paths pass
  □ Performance baseline not degraded (p95 latency)
  □ No new security vulnerabilities

Production Level:
  □ All staging gates pass
  □ Feature flag ready (kill switch)
  □ Monitoring/alerting configured
  □ Rollback plan documented
  □ Tech Lead approval
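The PR-level gate is mechanical enough to express as code. A sketch of the merge decision; the field names are assumptions, not a real CI schema:

```typescript
// One record per PR, populated by the CI pipeline.
interface PrChecks {
  buildPassed: boolean;
  unitTestsPassed: boolean;
  changedFileCoverage: number; // 0..1, measured on changed files only
  contractTestsPassed: boolean;
  sastClean: boolean;
  criticalReviewFindings: number; // from CodeRabbit
  humanApproved: boolean;
}

// A PR may merge only when every gate holds, including the 80% coverage floor.
function prGatePasses(c: PrChecks): boolean {
  return (
    c.buildPassed &&
    c.unitTestsPassed &&
    c.changedFileCoverage >= 0.8 &&
    c.contractTestsPassed &&
    c.sastClean &&
    c.criticalReviewFindings === 0 &&
    c.humanApproved
  );
}
```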

7. Knowledge Management

7.1 Documentation Strategy

┌──────────────────────────────────────────────────────────────┐
│              DOCUMENTATION HIERARCHY                          │
│                                                              │
│  /docs/                                                      │
│  ├── adrs/                    ← Architecture Decision Records │
│  │   ├── ADR-001-strangler-fig.md                            │
│  │   ├── ADR-002-yarp-gateway.md                             │
│  │   └── ...                                                 │
│  ├── runbooks/                ← Operational runbooks          │
│  │   ├── deploy-service.md                                   │
│  │   ├── rollback-procedure.md                               │
│  │   ├── incident-response.md                                │
│  │   └── database-migration.md                               │
│  ├── api/                     ← OpenAPI specs (auto-generated)│
│  │   ├── travel-api.yaml                                     │
│  │   ├── event-api.yaml                                      │
│  │   └── ...                                                 │
│  ├── onboarding/              ← New team member guides        │
│  │   ├── setup-dev-environment.md                            │
│  │   ├── ai-workflow-guide.md                                │
│  │   └── architecture-overview.md                            │
│  └── migration/               ← Migration-specific docs       │
│      ├── legacy-module-inventory.md  (AI-generated)          │
│      ├── data-migration-plan.md                              │
│      └── cutover-checklist.md                                │
│                                                              │
│  Rule: Docs live with code (in repo).                        │
│        No separate wiki — prevents doc drift.                │
│        AI generates first draft, human reviews.              │
└──────────────────────────────────────────────────────────────┘
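The ADR files can follow the common lightweight template (Nygard style). A sketch of what ADR-001-strangler-fig.md might contain; the wording is illustrative:

```markdown
# ADR-001: Use the Strangler Fig pattern for migration

## Status
Accepted

## Context
The legacy .NET monolith must move to .NET 8 microservices with zero downtime
and a 9-month timeline for a team of 5.

## Decision
Extract modules incrementally behind a YARP gateway, routing traffic
module-by-module to new services while the legacy app keeps serving the rest.

## Consequences
+ Zero-downtime, reversible cutovers per module
- Gateway and dual-running infrastructure must be maintained during migration
```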

7.2 Onboarding (New Engineer Joins Mid-project)

Day 1:
  □ Dev environment setup (AI-assisted — Cursor configured)
  □ Read: architecture-overview.md
  □ Read: ai-workflow-guide.md
  □ Access: all repos, CI/CD, Azure, monitoring dashboards

Day 2-3:
  □ Pair with Senior on current task
  □ Run full test suite locally
  □ Deploy to staging (understand pipeline)
  □ Review 3 recent PRs (understand review culture)

Day 4-5:
  □ First task: small bug fix or test improvement
  □ Full PR flow: code → CodeRabbit → human review → merge
  □ First AI-assisted development task (Cursor Agent mode)

Week 2:
  □ Own a small feature end-to-end
  □ Attend architecture review
  □ AI code walkthrough session

Target: Productive contributor by Day 10.
        AI tools reduce onboarding time by ~40%
        (AI explains codebase, generates boilerplate, catches mistakes early)

8. Metrics & Reporting

8.1 Key Metrics Dashboard

┌──────────────────────────────────────────────────────────────┐
│                    PROJECT HEALTH DASHBOARD                    │
│                                                              │
│  DELIVERY METRICS                                            │
│  ┌──────────────────────────────────────────────────┐        │
│  │ Services migrated:       ██████░░░░  3/5 (60%)  │        │
│  │ API endpoints migrated:  ██████░░░░  72/120 (60%)│        │
│  │ React modules live:      ███░░░░░░░  2/5 (40%)  │        │
│  │ Cycle progress:          ████████░░  Cycle 4/6   │        │
│  └──────────────────────────────────────────────────┘        │
│                                                              │
│  QUALITY METRICS                                             │
│  ┌──────────────────────────────────────────────────┐        │
│  │ Test coverage (new code):          85%           │        │
│  │ Contract test pass rate:           100%          │        │
│  │ Production incidents:              0 (P1/P2)     │        │
│  │ Zero downtime maintained:          ✅ Yes         │        │
│  │ CodeRabbit acceptance rate:        92%           │        │
│  └──────────────────────────────────────────────────┘        │
│                                                              │
│  AI METRICS                                                  │
│  ┌──────────────────────────────────────────────────┐        │
│  │ AI-generated code ratio:           68%           │        │
│  │ AI code bug rate vs human:         0.8x (lower!) │        │
│  │ AI PR rejection rate:              12%           │        │
│  │ Avg time per service migration:    3.2 weeks     │        │
│  │ AI tool cost (monthly):            $1,050        │        │
│  │ Effective multiplier (measured):   1.9x          │        │
│  └──────────────────────────────────────────────────┘        │
│                                                              │
│  TEAM HEALTH                                                 │
│  ┌──────────────────────────────────────────────────┐        │
│  │ Sprint velocity trend:             ↗ ↗ → →       │        │
│  │ Team satisfaction (retro):         4.2/5          │        │
│  │ Overtime hours this cycle:         2 (acceptable) │        │
│  │ Bus factor (min 2 per service):    ✅ Met          │        │
│  └──────────────────────────────────────────────────┘        │
└──────────────────────────────────────────────────────────────┘

8.2 Reporting Cadence

Report           | Audience                     | Frequency          | Content
-----------------|------------------------------|--------------------|---------------------------------------------------
Health Dashboard | Team                         | Real-time (board)  | All metrics above
Weekly Demo      | Product Owner + Business     | Weekly             | Working features + metrics
Exec Summary     | C-Level / Sponsor            | Bi-weekly          | 1-page: milestones, risks, decisions needed
AI ROI Report    | Sponsor                      | Monthly            | AI cost vs productivity gain, quality comparison
Cycle Report     | All stakeholders             | Every 6 weeks      | Full cycle review: delivered, deferred, learnings
Post-Mortem      | Team + relevant stakeholders | Per P1/P2 incident | RCA, prevention, action items
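The monthly AI ROI report compares tool spend against the extra capacity implied by the measured multiplier. A sketch of one plausible model; the formula and the loaded-cost figure are assumptions, while the $1,050/month tool cost and 1.9x multiplier come from the dashboard above:

```typescript
// Assumed model: extra capacity value = team cost x (multiplier - 1);
// ROI multiple = extra capacity value / tool cost.
function aiRoiMultiple(
  toolCostPerMonth: number,
  engineers: number,
  loadedCostPerEngineerMonth: number,
  measuredMultiplier: number,
): number {
  const extraCapacityValue =
    engineers * loadedCostPerEngineerMonth * (measuredMultiplier - 1);
  return extraCapacityValue / toolCostPerMonth;
}

// e.g. 5 engineers at an assumed $12,000/month loaded cost, 1.9x multiplier:
// 5 * 12000 * 0.9 / 1050, roughly a 51x return on tool spend
```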

9. Continuous Improvement

9.1 Retrospective Framework (Cycle-end)

Format: Start / Stop / Continue (30 min)
        + AI-specific section (15 min)
        + Action items (15 min) = 1 hour total

Questions:
  START:
    • What should we start doing?
    • What new AI tools/prompts should we try?
    • What process is missing?

  STOP:
    • What's wasting our time?
    • Which AI patterns aren't working?
    • What ceremonies are useless?

  CONTINUE:
    • What's working well?
    • Which AI workflows are highest ROI?
    • What should we NOT change?

  AI-SPECIFIC:
    • Where did AI help most this cycle?
    • Where did AI cause rework? (hallucination tracking)
    • Is our 2x multiplier holding up? What does the actual measurement show?
    • Any prompt library updates needed?
    • AI governance: any near-misses?

Action Items:
  • Max 3 per retro
  • Each has owner + deadline
  • Tracked on board
  • Reviewed next retro

9.2 Learning Budget

Per engineer, per cycle (6 weeks):
  • 4 hours: intentional learning (new tool, new pattern, conference talk)
  • 2 hours: AI experimentation (try new model, new workflow)
  • Cooldown week (Week 6): focus on tech debt, learning, experimentation

Investment: ~6 hours per cycle per person ≈ 2.5% of working time
            (6 of ~240 working hours in a 6-week cycle)
Return: Keeps team sharp, prevents burnout, discovers better approaches

Related Documents