
Project Management, Stakeholder Management & Process

Project management, stakeholder management, and engineering process strategy
For a team of 5 engineers, 9 months, legacy .NET → .NET 8 microservices
AI-first approach


1. Project Management Framework

1.1 Why Not Pure Scrum — Hybrid Approach

| Framework           | Fit? | Reasoning                                                                                    |
|---------------------|------|----------------------------------------------------------------------------------------------|
| Pure Scrum          | ❌   | 5 people don't need heavy ceremonies. 4-hour sprint planning for 5 people = waste            |
| Pure Kanban         | ⚠️   | Good for flow, but lacks checkpoints for migration milestones                                |
| Shape Up (modified) | ✅   | 6-week cycles + cooldown. Fits the migration phases. Appetite-based (fixed time, variable scope) |
| SAFe                | ❌   | Overkill for 5 people. SAFe is for 50+ engineers                                             |

Decision: Shape Up modified + Kanban within cycles

┌──────────────────────────────────────────────────────────────┐
│                  PROJECT RHYTHM                               │
│                                                              │
│  9 months = 6 cycles × 6 weeks                              │
│  (aligned with Planning.md phasing)                         │
│                                                              │
│  Cycle 0 (W1-6):   Phase 0 (AI Foundation + Infra + IaC)   │
│  Cycle 1 (W7-12):  Phase 1a (Travel Booking extraction)     │
│  Cycle 2 (W13-18): Phase 1b (Travel go-live + Event start)  │
│  Cycle 3 (W19-24): Phase 2a (Event go-live + Workforce)     │
│  Cycle 4 (W25-30): Phase 2b (Comms + Reporting go-live)     │
│  Cycle 5 (W31-36): Phase 3 (Stabilize + Harden)             │
│                                                              │
│  Each cycle:                                                 │
│  ┌──────────────────────────────────────────────────┐        │
│  │ Week 1:    Shaping (Tech Lead + team define work)│        │
│  │ Week 2-5:  Building (Kanban flow, daily standups)│        │
│  │ Week 6:    Cooldown (retro, tech debt, learning) │        │
│  └──────────────────────────────────────────────────┘        │
└──────────────────────────────────────────────────────────────┘

1.2 Ceremonies (Lean — Respect 5 People Team)

| Ceremony            | Frequency     | Duration | Purpose                                                | Who                    |
|---------------------|---------------|----------|--------------------------------------------------------|------------------------|
| Daily Standup       | Daily         | 10 min   | Blockers only. No status reporting — use the board     | All 5                  |
| Cycle Shaping       | Every 6 weeks | 2 hours  | Define appetite, scope bets, assign pitches            | Tech Lead + team       |
| Weekly Demo         | Weekly        | 30 min   | Show working software to stakeholders                  | Rotating presenter     |
| Cycle Retro         | Every 6 weeks | 1 hour   | What worked, what didn't, AI effectiveness review      | All 5                  |
| Architecture Review | Bi-weekly     | 1 hour   | Review ADRs, service boundaries, tech decisions        | Tech Lead + senior eng |
| AI Workflow Check   | Weekly        | 15 min   | AI metrics review, prompt calibration, governance check| Tech Lead              |

Total ceremony time: roughly 3 hours/week (50 min of standups, 30 min demo, plus the longer sessions prorated) — under 10% of working time. The rest = build.

1.3 Task Management

Tool: Linear / GitHub Projects (kanban board)

Board Columns:
┌──────────┬───────────┬────────────┬───────────┬───────────┬──────┐
│ Backlog  │ Shaped    │ Building   │ Review    │ Staging   │ Done │
│          │ (ready)   │ (WIP ≤5)   │ (PR)      │ (testing) │      │
│          │           │            │           │           │      │
│ Unshaped │ Scoped,   │ In active  │ CodeRabbit│ QA on     │ In   │
│ ideas    │ estimated,│ development│ + human   │ staging   │ prod │
│          │ assigned  │            │ review    │ env       │      │
└──────────┴───────────┴────────────┴───────────┴───────────┴──────┘

WIP Limit: 5 (1 per engineer max). No multitasking.
           If blocked → swarm (help each other unblock).

Labels:
  🏗️ migration    — module extraction work
  🤖 ai-generated — majority AI-generated code
  🔧 infra        — CI/CD, IaC, DevOps
  🧪 testing      — test creation, contract tests
  📊 reporting    — reporting/analytics specific
  💳 payment-acl  — anything touching payment
  🔴 blocker      — blocking other work

1.4 Risk-Based Milestone Tracking

Instead of tracking by features delivered, track by risks eliminated:

Milestones (Risk Reduction):

M1 (Week 4):  "Can we AI-migrate a service?"
              ✅ Communications service deployed to staging via AI pipeline
              Risk eliminated: AI approach proven or disproven

M2 (Week 8):  "Can we run dual traffic?"
              ✅ API Gateway routing live: Travel via new service
              ✅ Legacy still handling Payment
              Risk eliminated: Strangler Fig pattern validated

M3 (Week 14): "Can services talk to each other?"
              ✅ Travel → Payment ACL working
              ✅ Event Bus messages flowing
              Risk eliminated: Inter-service communication proven

M4 (Week 20): "Can we handle the data?"
              ✅ CDC from legacy → Reporting working
              ✅ Per-service DBs for Travel + Event
              Risk eliminated: Data migration proven

M5 (Week 28): "Can we scale?"
              ✅ 5 services running, auto-scaling
              ✅ Load test passing (simulated 40K users)
              Risk eliminated: Production readiness

M6 (Week 36): "Can we operate?"
              ✅ Monitoring, alerting, runbooks in place
              ✅ Payment migration plan ready (next phase)
              Risk eliminated: Operational readiness

1.5 Estimation — Appetite-Based (Shape Up Style)

No story point estimation. Use appetite — "how much time are we willing to spend on this?"

| Appetite    | Duration          | Example                                              |
|-------------|-------------------|------------------------------------------------------|
| Small Batch | ≤ 1 week          | Communications service extraction (AI handles most)  |
| Big Batch   | 2-4 weeks         | Travel Booking full extraction + tests + React pages |
| Epic        | 1 cycle (6 weeks) | Event Management + Reporting + related React modules |

Rule: If a task exceeds 6 weeks → must be broken down further. No "ongoing" tasks.


2. Team Structure & Roles

2.1 Team Topology

┌──────────────────────────────────────────────────────────────┐
│                   TEAM STRUCTURE (5 engineers)                 │
│                                                              │
│  ┌──────────────────────────────────────────────┐            │
│  │ Tech Lead (1)                                │            │
│  │ • Architecture decisions (ADR owner)         │            │
│  │ • AI workflow design + governance             │            │
│  │ • Stakeholder communication                   │            │
│  │ • Code review (final gate for critical code) │            │
│  │ • Shaping sessions lead                       │            │
│  │ • 50% hands-on coding / 50% leadership        │            │
│  └──────────────────────────────────────────────┘            │
│                                                              │
│  ┌──────────────────────────────────────────────┐            │
│  │ Senior Backend Engineer (1)                   │            │
│  │ • Service extraction lead                     │            │
│  │ • .NET 8 migration specialist                 │            │
│  │ • AI agent power user (Claude Code batch)     │            │
│  │ • Database migration + CDC setup              │            │
│  │ • Code review (business logic)                │            │
│  └──────────────────────────────────────────────┘            │
│                                                              │
│  ┌──────────────────────────────────────────────┐            │
│  │ Backend Engineer (1)                          │            │
│  │ • Service implementation                      │            │
│  │ • Event handlers + Service Bus integration    │            │
│  │ • Contract test authoring (Pact)              │            │
│  │ • ACL development + maintenance               │            │
│  └──────────────────────────────────────────────┘            │
│                                                              │
│  ┌──────────────────────────────────────────────┐            │
│  │ Full-stack Engineer (1)                       │            │
│  │ • React 18 frontend development               │            │
│  │ • Shared design system (Storybook)            │            │
│  │ • API integration (frontend ↔ services)       │            │
│  │ • AI-assisted UI component generation          │            │
│  └──────────────────────────────────────────────┘            │
│                                                              │
│  ┌──────────────────────────────────────────────┐            │
│  │ DevOps/Platform Engineer (1)                  │            │
│  │ • CI/CD pipelines (GitHub Actions)            │            │
│  │ • IaC (Bicep)                                 │            │
│  │ • Azure Container Apps management             │            │
│  │ • Observability stack (monitoring, alerting)  │            │
│  │ • Security scanning integration                │            │
│  └──────────────────────────────────────────────┘            │
│                                                              │
│  ─────────────────────────────────────────────────           │
│  AI Force Multiplier: Each engineer uses AI (Cursor Pro +    │
│  Claude Code). Specialized tasks don’t require specialized    │
│  hires. Fullstack eng uses AI for backend when needed. DevOps │
│  eng uses AI for IaC generation. AI fills the "6th engineer". │
└──────────────────────────────────────────────────────────────┘

2.2 Role Rotation Strategy

With 5 people, bus factor is a critical concern. 1 person leaving = 20% capacity lost.

| Strategy                 | How                                                                            | Why                                |
|--------------------------|--------------------------------------------------------------------------------|------------------------------------|
| Pair on critical modules | 2 people know each service; no one solely owns a single service                | Bus factor ≥ 2 for every module    |
| Rotate reviewer          | Code review rotates — everyone reviews everyone else's code                    | Cross-knowledge                    |
| AI code walkthrough      | Week 1: Senior explains Travel code. Week 2: Backend eng explains Event code...| Everyone understands every service |
| DevOps cross-train       | Every backend eng knows how to deploy their own service; no single DevOps dependency | DevOps doesn't become a bottleneck |

2.3 Collaboration Model

Daily:
  09:00  Standup (10 min) — blockers only
  09:10  Start of deep work — NO meetings zone until 12:00
  14:00  Open for ad-hoc pairing, reviews, discussions
  16:00  Async review — PRs, CodeRabbit comments
  17:00  Claude Code overnight runs scheduled (batch migration tasks)

Weekly:
  Monday AM:    Weekly planning (30 min)
  Wednesday AM: Architecture review (1 hour, bi-weekly)
  Friday PM:    Demo to stakeholders (30 min)
  Friday PM:    AI metrics check (15 min)

Cycle (6 weeks):
  Week 1:      Shaping — define next cycle's bets
  Week 6:      Retro + cooldown + tech debt + learning

3. Stakeholder Management

3.1 Stakeholder Map

┌──────────────────────────────────────────────────────────────────────┐
│                      STAKEHOLDER MAP                                  │
│                                                                      │
│  POWER                                                               │
│    ▲                                                                 │
│    │   ┌─────────────────────┐    ┌──────────────────────────┐      │
│ HIGH│   │ C-Level / Sponsor   │    │ Product Owner /           │      │
│    │   │                     │    │ Business Stakeholders     │      │
│    │   │ Cares about:        │    │                          │      │
│    │   │ • Timeline          │    │ Cares about:             │      │
│    │   │ • Budget            │    │ • Features working       │      │
│    │   │ • Risk              │    │ • Zero downtime          │      │
│    │   │ • AI ROI proof      │    │ • User experience        │      │
│    │   │                     │    │ • Migration transparency │      │
│    │   │ Strategy: MANAGE    │    │                          │      │
│    │   │ CLOSELY             │    │ Strategy: KEEP           │      │
│    │   └─────────────────────┘    │ SATISFIED                │      │
│    │                              └──────────────────────────┘      │
│    │   ┌─────────────────────┐    ┌──────────────────────────┐      │
│ LOW│   │ External API         │    │ End Users (40K)          │      │
│    │   │ Consumers            │    │                          │      │
│    │   │                     │    │ Cares about:             │      │
│    │   │ Cares about:        │    │ • System works           │      │
│    │   │ • API compatibility  │    │ • No disruption          │      │
│    │   │ • Breaking changes   │    │ • Performance            │      │
│    │   │ • Documentation     │    │                          │      │
│    │   │                     │    │ Strategy: MONITOR        │      │
│    │   │ Strategy: KEEP      │    │ (communicate through     │      │
│    │   │ INFORMED             │    │  product channels)       │      │
│    │   └─────────────────────┘    └──────────────────────────┘      │
│    │                                                                 │
│    └──────────────────────────────────────────────────►             │
│                                               INTEREST              │
│         LOW                                   HIGH                  │
└──────────────────────────────────────────────────────────────────────┘

3.2 Communication Plan

| Stakeholder            | Channel                    | Frequency           | Content                                                      | Owner                                |
|------------------------|----------------------------|---------------------|--------------------------------------------------------------|--------------------------------------|
| C-Level / Sponsor      | Executive summary (1-page) | Bi-weekly           | Risk status, milestone progress, AI ROI metrics, budget burn | Tech Lead                            |
| Product Owner          | Demo + written update      | Weekly              | Working features, migration progress, upcoming changes       | Tech Lead + rotating eng             |
| Business Users (key)   | Change notification        | Per migration phase | What's changing, what's not, who to contact for issues       | Product Owner (with Tech Lead input) |
| External API Consumers | API deprecation notice     | 30 days ahead       | Breaking changes, migration guides, new endpoints            | Tech Lead                            |
| Engineering Team       | Standup + board            | Daily               | In-progress work, blockers, decisions needed                 | All                                  |
| Security / Compliance  | Audit report               | Monthly             | SAST results, AI governance compliance, payment module status| DevOps + Tech Lead                   |

3.3 Stakeholder Communication Templates

Bi-weekly Executive Summary (1-page):

┌──────────────────────────────────────────────────────────────┐
│ EXECUTIVE SUMMARY — Week [X] of 36                           │
│                                                              │
│ Overall Status: 🟢 On Track / 🟡 At Risk / 🔴 Blocked       │
│                                                              │
│ ┌────────────────────────────────────────────────────────┐   │
│ │ Milestones                                             │   │
│ │ ✅ M1: AI pipeline validated (Week 4)                  │   │
│ │ ✅ M2: Dual traffic running (Week 8)                   │   │
│ │ 🔄 M3: Inter-service communication (in progress)       │   │
│ │ ⬜ M4: Data migration validated                        │   │
│ │ ⬜ M5: Scale test passed                               │   │
│ │ ⬜ M6: Operational readiness                           │   │
│ └────────────────────────────────────────────────────────┘   │
│                                                              │
│ Key Metrics:                                                 │
│ • Services migrated: 2/5                                     │
│ • API endpoints migrated: 47/120 (39%)                       │
│ • Test coverage (new services): 85%                          │
│ • AI-generated code: 68% (team-reviewed)                     │
│ • Zero downtime incidents: 0                                 │
│                                                              │
│ Top Risks:                                                   │
│ 1. [Risk] — [Mitigation] — [Status]                         │
│                                                              │
│ Decisions Needed:                                            │
│ 1. [Decision] — needed by [date]                             │
│                                                              │
│ Next 2 Weeks:                                                │
│ • [Key deliverable 1]                                        │
│ • [Key deliverable 2]                                        │
└──────────────────────────────────────────────────────────────┘

Weekly Demo Format:

30 minutes max:
  5 min:  Context (what we aimed to do this week)
  15 min: Live demo (working software, not slides)
  5 min:  Metrics (AI productivity, service health)
  5 min:  Q&A + next week preview

Rule: If nothing demoable → show monitoring dashboard,
      test results, or architecture diagram update.
      NEVER skip demo — it builds stakeholder confidence.

3.4 Escalation Path

Issue Severity → Response:

P4 (Low):     Engineer fixes → PR → merge
              No escalation needed

P3 (Medium):  Engineer + Tech Lead discuss
              Fix within cycle
              Mention in weekly update

P2 (High):    Tech Lead decides → immediate fix
              Notify Product Owner same day
              Include in exec summary

P1 (Critical): Zero downtime violated / Payment affected / Data loss
              Tech Lead → Sponsor within 1 hour
              War room (all hands)
              Hourly updates until resolved
              Post-mortem within 48 hours

3.5 Managing Expectations — The "No" Framework

With 5 engineers and 9 months, saying "No" (or "Not now") is the most important skill:

| Request Type                     | Response Framework                                                                                            |
|----------------------------------|---------------------------------------------------------------------------------------------------------------|
| "Can we add feature X?"          | "Yes, if we defer [Y]. Here's the trade-off."                                                                 |
| "Can we speed up?"               | "We're at 2x AI capacity. Adding people adds coordination cost. We can re-scope instead."                     |
| "Why isn't Payment modernized?"  | "By design. Constraint: Payment frozen in Phase 1. Plan exists for Phase 2. Here's the ACL keeping it safe."  |
| "Can we skip testing?"           | "No. With 75% AI-generated code, testing IS the quality gate. This is non-negotiable."                        |
| "Competitor launched feature Z"  | "Noted. Added to backlog. Current priority: foundation first. Features after migration."                      |

4. Engineering Process

4.1 Development Lifecycle

┌──────────────────────────────────────────────────────────────────────┐
│                    DEVELOPMENT LIFECYCLE (per task)                    │
│                                                                      │
│  ┌───────────┐    ┌───────────┐    ┌───────────┐    ┌────────────┐  │
│  │ 1. SHAPE  │───►│ 2. BUILD  │───►│ 3. REVIEW │───►│ 4. DEPLOY  │  │
│  └───────────┘    └───────────┘    └───────────┘    └────────────┘  │
│       │                │                │                │           │
│       ▼                ▼                ▼                ▼           │
│  • Define scope   • AI-first dev   • CodeRabbit     • CI pipeline   │
│  • Set appetite   • Cursor Agent     auto-review    • Deploy to     │
│  • Identify risks • Write tests    • Human review     staging       │
│  • Write spec       (TDD with AI)  • Contract test  • Smoke test    │
│  • Assign pair    • Implement        pass           • Manual        │
│                   • AI generates   • Security scan    approval      │
│                     70% of code                     • Prod deploy   │
│                                                       (rolling)     │
└──────────────────────────────────────────────────────────────────────┘

4.2 Definition of Done (DoD)

✅ DEFINITION OF DONE — Every task must meet ALL criteria:

Code:
  □ Feature implemented and builds successfully
  □ AI-generated code reviewed by human (mandatory)
  □ Follows Clean Architecture structure
  □ No TODO/HACK comments left untracked

Testing:
  □ Unit tests pass (≥80% coverage for new code)
  □ Contract tests pass (Pact — for API changes)
  □ Integration tests pass (DB, event bus)
  □ No regression in existing tests

Security:
  □ SAST scan clean (CodeQL)
  □ No secrets in code
  □ Payment-related code: 2 human reviewers approved

Observability:
  □ Structured logging added for key operations
  □ OpenTelemetry trace spans for cross-service calls
  □ Health check endpoint working

Documentation:
  □ API changes reflected in OpenAPI spec
  □ ADR created for architecture decisions
  □ README updated if setup/run instructions changed

Deployment:
  □ Docker image builds successfully
  □ Deployed to staging and tested
  □ Monitoring/alerting configured for new endpoints

4.3 Git Workflow

┌──────────────────────────────────────────────────────────────┐
│                     GIT WORKFLOW                              │
│                                                              │
│  main ─────────────────────────────────────────────►         │
│    │         │              │              │                  │
│    │    merge (squash)  merge (squash)  merge (squash)        │
│    │         ▲              ▲              ▲                  │
│    │         │              │              │                  │
│    ├── feature/travel-booking-extraction ──┘                  │
│    │         │                                               │
│    │    ┌────┴────────────────────────┐                      │
│    │    │ Commits:                    │                      │
│    │    │ feat: scaffold travel svc   │                      │
│    │    │ feat: migrate booking logic │                      │
│    │    │ test: contract tests        │                      │
│    │    │ fix: edge case in pricing   │                      │
│    │    └─────────────────────────────┘                      │
│    │                                                         │
│    ├── feature/event-management-extraction ──────────────┘    │
│    │                                                         │
│    └── feature/infra-cicd-setup ─────────────────────┘       │
│                                                              │
│  Branch Naming:                                              │
│    feature/{module}-{description}                            │
│    fix/{module}-{description}                                │
│    infra/{description}                                       │
│                                                              │
│  Rules:                                                      │
│    • PR required for main (no direct push)                   │
│    • CodeRabbit auto-review on PR create                     │
│    • ≥1 human approval required                              │
│    • Payment-related: ≥2 human approvals                     │
│    • CI must pass (build + test + security scan)             │
│    • Squash merge to main (clean history)                    │
└──────────────────────────────────────────────────────────────┘
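The branch-naming rules above are easy to enforce mechanically in CI. A minimal sketch of such a check — the helper name and exact patterns are illustrative, not an existing script in this repo:

```python
import re

# Allowed branch prefixes per the workflow rules above (illustrative patterns).
BRANCH_PATTERNS = [
    r"^feature/[a-z0-9]+-[a-z0-9-]+$",   # feature/{module}-{description}
    r"^fix/[a-z0-9]+-[a-z0-9-]+$",       # fix/{module}-{description}
    r"^infra/[a-z0-9-]+$",               # infra/{description}
]

def branch_name_ok(name: str) -> bool:
    """Return True if the branch name matches one of the agreed patterns."""
    return any(re.match(p, name) for p in BRANCH_PATTERNS)
```

A CI job would call this on the PR's head branch and fail fast on a mismatch, so naming never needs to come up in human review.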

4.4 Code Review Process

PR Created
    │
    ▼
┌───────────────────────┐
│ Gate 1: Automated      │
│ • CI build + tests     │
│ • CodeRabbit review    │
│ • SAST scan (CodeQL)   │
│ • Contract test check  │
└───────────┬───────────┘
            │ All pass?
            │
    ┌───────┴───────┐
    │ Yes           │ No → Fix and re-push
    ▼               │
┌───────────────────────┐
│ Gate 2: Human Review   │
│                       │
│ Reviewer focuses on:  │
│ • Business logic      │
│   correctness         │
│ • Edge cases          │
│ • Architecture fit    │
│ • AI hallucination    │
│   detection           │
│ • Performance         │
│   implications        │
│                       │
│ NOT focused on:       │
│ • Formatting (linter) │
│ • Simple bugs         │
│   (CodeRabbit caught) │
│ • Test coverage       │
│   (CI enforced)       │
└───────────┬───────────┘
            │ Approved?
            │
    ┌───────┴───────┐
    │ Standard code  │ Payment/Security code
    ▼               ▼
  1 approval     2 approvals
    │               │
    ▼               ▼
  Merge           Merge

4.5 Incident Management

┌──────────────────────────────────────────────────────────────┐
│                 INCIDENT RESPONSE FLOW                        │
│                                                              │
│  Alert Triggered (Azure Monitor / AI Anomaly Detection)      │
│      │                                                       │
│      ▼                                                       │
│  ┌──────────────────┐                                       │
│  │ Triage (5 min)    │                                       │
│  │ Who: On-call eng  │                                       │
│  │ What: Severity?   │                                       │
│  └────────┬─────────┘                                       │
│           │                                                  │
│  ┌────────┼──────────────────┐                              │
│  │        │                  │                              │
│  ▼        ▼                  ▼                              │
│ P3/P4    P2                 P1                              │
│ Low      High               Critical                        │
│          │                  │                              │
│ Fix in   Immediate fix      War room                        │
│ next     Notify Tech Lead   All hands                       │
│ cycle    Same-day update    Hourly updates                   │
│          to PO              Sponsor notified                 │
│                             │                              │
│                             ▼                              │
│                          ┌──────────────────┐              │
│                          │ Post-Mortem       │              │
│                          │ Within 48 hours   │              │
│                          │ • Timeline        │              │
│                          │ • Root cause      │              │
│                          │ • Impact          │              │
│                          │ • Prevention      │              │
│                          │ • Action items    │              │
│                          └──────────────────┘              │
│                                                              │
│  On-Call Rotation:                                           │
│  Week 1: Tech Lead + Senior BE                               │
│  Week 2: BE Engineer + Full-stack                            │
│  Week 3: DevOps + Tech Lead                                  │
│  (Rotate every week. Always 2 people for coverage.)          │
└──────────────────────────────────────────────────────────────┘
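The three-week pairing above can be generated rather than hand-maintained. An illustrative sketch — the roster names and the walk-two-at-a-time scheme are assumptions, not tooling the team has:

```python
# Generate the weekly on-call pair for a 5-person team: walk the roster
# two people at a time, wrapping around, so everyone rotates through and
# coverage is always exactly 2 people.
ENGINEERS = ["TechLead", "SeniorBE", "BackendEng", "Fullstack", "DevOps"]

def on_call_pair(week: int) -> tuple[str, str]:
    """Return the on-call pair for a given week (0-based)."""
    n = len(ENGINEERS)
    return ENGINEERS[(2 * week) % n], ENGINEERS[(2 * week + 1) % n]
```

With 5 people and pairs of 2, the schedule repeats every 5 weeks and no fixed pair recurs back-to-back.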

4.6 Architecture Decision Records (ADR) Process

Trigger: Any decision that affects:
  • Service boundaries
  • Database choice/structure
  • Communication patterns
  • Technology selection
  • Security model
  • AI governance rules

Process:
  1. Engineer drafts ADR (AI-assisted — Claude drafts from context)
  2. Tech Lead reviews within 24 hours
  3. Team review in next Architecture Review meeting (bi-weekly)
  4. Accepted / Rejected / Amended
  5. Stored in repo: /docs/adrs/ADR-NNN-title.md

ADR Template:
  ┌──────────────────────────────────────┐
  │ # ADR-NNN: [Title]                   │
  │ Status: Proposed/Accepted/Deprecated │

  │ Date: [date]                         │
  │ Deciders: [names]                    │
  │                                      │
  │ ## Context                           │
  │ What is the problem?                 │
  │                                      │
  │ ## Decision                          │
  │ What did we decide?                  │
  │                                      │
  │ ## Options Considered                │
  │ | Option | Pros | Cons |             │
  │                                      │
  │ ## Consequences                      │
  │ What are the trade-offs?             │
  │                                      │
  │ ## AI Involvement                    │
  │ Was AI used to draft? What was       │
  │ human-validated?                     │
  └──────────────────────────────────────┘

5. Release Management

5.1 Release Strategy

┌──────────────────────────────────────────────────────────────┐
│                  RELEASE STRATEGY                             │
│                                                              │
│  Type 1: Service Deployment (frequent)                       │
│  ────────────────────────────────                            │
│  • Every merged PR → auto-deploy to staging                  │
│  • Staging → Production: manual approval (Tech Lead/Senior)  │
│  • Rolling update (zero downtime)                            │
│  • Feature flags for incomplete features                     │
│  • Frequency: 2-3 times per week                             │
│                                                              │
│  Type 2: Module Go-Live (per phase)                          │
│  ────────────────────────────────                            │
│  • Full module cutover: traffic routes from legacy → new     │
│  • Requires: all contract tests pass + load test pass        │
│  • Canary deployment: 5% → 25% → 50% → 100% traffic         │
│  • Rollback plan: YARP route back to legacy < 5 minutes      │
│  • Stakeholder notified 1 week before                        │
│  • Frequency: once per phase (roughly monthly)               │
│                                                              │
│  Type 3: Database Migration (rare, high-risk)                │
│  ────────────────────────────────                            │
│  • Per-service DB cutover                                    │
│  • CDC running for weeks before cutover (verify data sync)   │
│  • Blue-green: new DB live + old DB as fallback              │
│  • Data verification scripts mandatory                       │
│  • Frequency: 1-2 total across 9 months                     │
└──────────────────────────────────────────────────────────────┘
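The per-module routing that makes Type 2 cutovers reversible lives in the YARP gateway config. A sketch of roughly what a route and its clusters look like — route/cluster names and addresses here are illustrative, not this project's actual config:

```json
{
  "ReverseProxy": {
    "Routes": {
      "travel-route": {
        "ClusterId": "travel-new",
        "Match": { "Path": "/api/travel/{**catch-all}" }
      }
    },
    "Clusters": {
      "travel-new": {
        "Destinations": { "d1": { "Address": "https://travel-svc.internal/" } }
      },
      "travel-legacy": {
        "Destinations": { "d1": { "Address": "https://legacy-monolith.internal/" } }
      }
    }
  }
}
```

Rollback is then a one-line change — point `ClusterId` back at `travel-legacy` and reload config — which is what keeps the rollback window under 5 minutes.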

5.2 Canary Release Process (Module Go-Live)

Canary Deployment Flow:

Day 1-2:  5% traffic → new service
          Monitor: error rates, latency, business metrics
          ┌─────────────────────────────────────┐
          │ 95% ──► Legacy                      │
          │  5% ──► New Service                 │
          └─────────────────────────────────────┘

Day 3-4:  25% traffic (if Day 1-2 clean)
          ┌─────────────────────────────────────┐
          │ 75% ──► Legacy                      │
          │ 25% ──► New Service                 │
          └─────────────────────────────────────┘

Day 5:    50% traffic
          ┌─────────────────────────────────────┐
          │ 50% ──► Legacy                      │
          │ 50% ──► New Service                 │
          └─────────────────────────────────────┘

Day 6-7:  100% traffic (full cutover)
          ┌─────────────────────────────────────┐
          │  0% ──► Legacy (standby)            │
          │100% ──► New Service                 │
          └─────────────────────────────────────┘

Day 14:   Legacy module decommissioned
          (after 1 week soak at 100%)

At ANY point: error rate > threshold → automatic rollback to legacy
YARP config change = rollback in < 5 minutes
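The "error rate > threshold → automatic rollback" rule can be encoded as a small gate evaluated by the monitoring pipeline at each canary stage. An illustrative sketch — the thresholds and decision tiers are assumptions, not agreed values:

```python
# Decide whether a canary stage may proceed, should hold, or must roll back,
# comparing the new service's error rate against a hard ceiling and against
# the legacy baseline.
def canary_decision(new_error_rate: float,
                    baseline_error_rate: float,
                    max_absolute: float = 0.02,
                    max_relative: float = 2.0) -> str:
    """Return 'rollback', 'hold', or 'proceed' for the current canary stage."""
    if new_error_rate > max_absolute:
        return "rollback"   # hard ceiling breached: route back to legacy
    if baseline_error_rate > 0 and new_error_rate > max_relative * baseline_error_rate:
        return "hold"       # noticeably worse than legacy: pause the ramp
    return "proceed"        # clean: move to the next traffic step
```

The same check runs at 5%, 25%, 50%, and 100%; "hold" pauses the ramp for human judgment while "rollback" triggers the YARP route change automatically.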

5.3 Feature Flags

Tool: Azure App Configuration (feature management) 
      or LaunchDarkly (if budget allows)

Usage:
┌──────────────────────────────────────────────────────────────┐
│  Flag Name                 │ Purpose                         │
│  ────────────────────────  │ ──────────────────────────────  │
│  travel.new-service        │ Route traffic to new Travel svc │
│  event.new-service         │ Route traffic to new Event svc  │
│  react.travel-ui           │ Show new React UI for Travel    │
│  react.event-ui            │ Show new React UI for Events    │
│  ai.smart-routing          │ Enable AI-based API routing     │
│  ai.anomaly-detection      │ Enable AI monitoring alerts     │
│  reporting.cqrs-mode       │ Use CQRS read models vs legacy  │
└──────────────────────────────────────────────────────────────┘

Rules:
  • All new services behind feature flags
  • Flags per-tenant and per-region capable
  • Kill switch: disable new service → fallback to legacy instantly
  • Flags reviewed and cleaned up every cycle
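In Azure App Configuration, each feature flag is stored as a small JSON document of roughly this shape. The percentage filter shown is how a flag can double as a canary dial — exact filter names and parameters should be verified against the Microsoft.FeatureManagement documentation:

```json
{
  "id": "travel.new-service",
  "description": "Route traffic to new Travel svc",
  "enabled": true,
  "conditions": {
    "client_filters": [
      { "name": "Microsoft.Percentage", "parameters": { "Value": 25 } }
    ]
  }
}
```

Setting `enabled` to `false` is the kill switch: traffic falls back to legacy without a deployment.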

6. Quality Assurance Process

6.1 Testing Pyramid

                    ┌───────────┐
                    │  Manual   │  ← Exploratory testing only
                    │  Testing  │     (by team on Friday demo)
                    │  (rare)   │
                    ├───────────┤
                  ┌─┤  E2E      ├─┐  ← Critical paths only
                  │ │  Tests    │ │     (login → book → pay)
                  │ │  (few)    │ │     Cypress/Playwright
                  │ ├───────────┤ │
                ┌─┤ │ Contract  │ ├─┐  ← Every service boundary
                │ │ │ Tests     │ │ │     Pact consumer/provider
                │ │ │ (Pact)    │ │ │     Run in CI every PR
                │ │ ├───────────┤ │ │
              ┌─┤ │ │Integration│ │ ├─┐  ← DB, event bus, external APIs
              │ │ │ │ Tests     │ │ │ │     Docker Compose in CI
              │ │ │ │           │ │ │ │
              │ │ │ ├───────────┤ │ │ │
            ┌─┤ │ │ │  Unit     │ │ │ ├─┐  ← Business logic, domain
            │ │ │ │ │  Tests    │ │ │ │ │     Fast, no external deps
            │ │ │ │ │  (many)   │ │ │ │ │     80%+ coverage target
            └─┴─┴─┴─┴───────────┴─┴─┴─┴─┘

AI Role in Testing:
  • Unit tests: 80% AI-generated (Claude Sonnet 4)
  • Contract tests: 70% AI-generated, human validates contracts
  • Integration tests: 50% AI-generated, human sets up fixtures
  • E2E tests: 30% AI-assisted (Playwright codegen + AI refinement)
  • Manual testing: 0% AI — human exploratory only

6.2 Quality Gates

PR Level:
  □ Build passes
  □ Unit tests pass (≥80% coverage on changed files)
  □ Contract tests pass
  □ SAST clean
  □ CodeRabbit review: no critical findings
  □ Human review: approved

Staging Level:
  □ Integration tests pass
  □ E2E critical paths pass
  □ Performance baseline not degraded (p95 latency)
  □ No new security vulnerabilities

Production Level:
  □ All staging gates pass
  □ Feature flag ready (kill switch)
  □ Monitoring/alerting configured
  □ Rollback plan documented
  □ Tech Lead approval
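The PR-level gate is mechanical enough to express as code. A sketch of the merge decision; the field names are assumptions, not a real CI schema:

```typescript
// One record per PR, populated by the CI pipeline.
interface PrChecks {
  buildPassed: boolean;
  unitTestsPassed: boolean;
  changedFileCoverage: number; // 0..1, measured on changed files only
  contractTestsPassed: boolean;
  sastClean: boolean;
  criticalReviewFindings: number; // from CodeRabbit
  humanApproved: boolean;
}

// A PR may merge only when every gate holds, including the 80% coverage floor.
function prGatePasses(c: PrChecks): boolean {
  return (
    c.buildPassed &&
    c.unitTestsPassed &&
    c.changedFileCoverage >= 0.8 &&
    c.contractTestsPassed &&
    c.sastClean &&
    c.criticalReviewFindings === 0 &&
    c.humanApproved
  );
}
```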

7. Knowledge Management

7.1 Documentation Strategy

┌──────────────────────────────────────────────────────────────┐
│              DOCUMENTATION HIERARCHY                          │
│                                                              │
│  /docs/                                                      │
│  ├── adrs/                    ← Architecture Decision Records │
│  │   ├── ADR-001-strangler-fig.md                            │
│  │   ├── ADR-002-yarp-gateway.md                             │
│  │   └── ...                                                 │
│  ├── runbooks/                ← Operational runbooks          │
│  │   ├── deploy-service.md                                   │
│  │   ├── rollback-procedure.md                               │
│  │   ├── incident-response.md                                │
│  │   └── database-migration.md                               │
│  ├── api/                     ← OpenAPI specs (auto-generated)│
│  │   ├── travel-api.yaml                                     │
│  │   ├── event-api.yaml                                      │
│  │   └── ...                                                 │
│  ├── onboarding/              ← New team member guides        │
│  │   ├── setup-dev-environment.md                            │
│  │   ├── ai-workflow-guide.md                                │
│  │   └── architecture-overview.md                            │
│  └── migration/               ← Migration-specific docs       │
│      ├── legacy-module-inventory.md  (AI-generated)          │
│      ├── data-migration-plan.md                              │
│      └── cutover-checklist.md                                │
│                                                              │
│  Rule: Docs live with code (in repo).                        │
│        No separate wiki — prevents doc drift.                │
│        AI generates first draft, human reviews.              │
└──────────────────────────────────────────────────────────────┘
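The ADR files can follow the common lightweight template (Nygard style). A sketch of what ADR-001-strangler-fig.md might contain; the wording is illustrative:

```markdown
# ADR-001: Use the Strangler Fig pattern for migration

## Status
Accepted

## Context
The legacy .NET monolith must move to .NET 8 microservices with zero downtime
and a 9-month timeline for a team of 5.

## Decision
Extract modules incrementally behind a YARP gateway, routing traffic
module-by-module to new services while the legacy app keeps serving the rest.

## Consequences
+ Zero-downtime, reversible cutovers per module
- Gateway and dual-running infrastructure must be maintained during migration
```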

7.2 Onboarding (New Engineer Joins Mid-project)

Day 1:
  □ Dev environment setup (AI-assisted — Cursor configured)
  □ Read: architecture-overview.md
  □ Read: ai-workflow-guide.md
  □ Access: all repos, CI/CD, Azure, monitoring dashboards

Day 2-3:
  □ Pair with Senior on current task
  □ Run full test suite locally
  □ Deploy to staging (understand pipeline)
  □ Review 3 recent PRs (understand review culture)

Day 4-5:
  □ First task: small bug fix or test improvement
  □ Full PR flow: code → CodeRabbit → human review → merge
  □ First AI-assisted development task (Cursor Agent mode)

Week 2:
  □ Own a small feature end-to-end
  □ Attend architecture review
  □ AI code walkthrough session

Target: Productive contributor by Day 10.
        AI tools reduce onboarding time by ~40%
        (AI explains codebase, generates boilerplate, catches mistakes early)

8. Metrics & Reporting

8.1 Key Metrics Dashboard

┌──────────────────────────────────────────────────────────────┐
│                    PROJECT HEALTH DASHBOARD                    │
│                                                              │
│  DELIVERY METRICS                                            │
│  ┌──────────────────────────────────────────────────┐        │
│  │ Services migrated:       ██████░░░░  3/5 (60%)  │        │
│  │ API endpoints migrated:  ██████░░░░  72/120 (60%)│        │
│  │ React modules live:      ███░░░░░░░  2/5 (40%)  │        │
│  │ Cycle progress:          ████████░░  Cycle 4/6   │        │
│  └──────────────────────────────────────────────────┘        │
│                                                              │
│  QUALITY METRICS                                             │
│  ┌──────────────────────────────────────────────────┐        │
│  │ Test coverage (new code):          85%           │        │
│  │ Contract test pass rate:           100%          │        │
│  │ Production incidents:              0 (P1/P2)     │        │
│  │ Zero downtime maintained:          ✅ Yes         │        │
│  │ CodeRabbit acceptance rate:        92%           │        │
│  └──────────────────────────────────────────────────┘        │
│                                                              │
│  AI METRICS                                                  │
│  ┌──────────────────────────────────────────────────┐        │
│  │ AI-generated code ratio:           68%           │        │
│  │ AI code bug rate vs human:         0.8x (lower!) │        │
│  │ AI PR rejection rate:              12%           │        │
│  │ Avg time per service migration:    3.2 weeks     │        │
│  │ AI tool cost (monthly):            $1,050        │        │
│  │ Effective multiplier (measured):   1.9x          │        │
│  └──────────────────────────────────────────────────┘        │
│                                                              │
│  TEAM HEALTH                                                 │
│  ┌──────────────────────────────────────────────────┐        │
│  │ Sprint velocity trend:             ↗ ↗ → →       │        │
│  │ Team satisfaction (retro):         4.2/5          │        │
│  │ Overtime hours this cycle:         2 (acceptable) │        │
│  │ Bus factor (min 2 per service):    ✅ Met          │        │
│  └──────────────────────────────────────────────────┘        │
└──────────────────────────────────────────────────────────────┘

8.2 Reporting Cadence

Report           | Audience                     | Frequency          | Content
-----------------|------------------------------|--------------------|---------------------------------------------------
Health Dashboard | Team                         | Real-time (board)  | All metrics above
Weekly Demo      | Product Owner + Business     | Weekly             | Working features + metrics
Exec Summary     | C-Level / Sponsor            | Bi-weekly          | 1-page: milestones, risks, decisions needed
AI ROI Report    | Sponsor                      | Monthly            | AI cost vs productivity gain, quality comparison
Cycle Report     | All stakeholders             | Every 6 weeks      | Full cycle review: delivered, deferred, learnings
Post-Mortem      | Team + relevant stakeholders | Per P1/P2 incident | RCA, prevention, action items
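The monthly AI ROI report compares tool spend against the extra capacity implied by the measured multiplier. A sketch of one plausible model; the formula and the loaded-cost figure are assumptions, while the $1,050/month tool cost and 1.9x multiplier come from the dashboard above:

```typescript
// Assumed model: extra capacity value = team cost x (multiplier - 1);
// ROI multiple = extra capacity value / tool cost.
function aiRoiMultiple(
  toolCostPerMonth: number,
  engineers: number,
  loadedCostPerEngineerMonth: number,
  measuredMultiplier: number,
): number {
  const extraCapacityValue =
    engineers * loadedCostPerEngineerMonth * (measuredMultiplier - 1);
  return extraCapacityValue / toolCostPerMonth;
}

// e.g. 5 engineers at an assumed $12,000/month loaded cost, 1.9x multiplier:
// 5 * 12000 * 0.9 / 1050, roughly a 51x return on tool spend
```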

9. Continuous Improvement

9.1 Retrospective Framework (Cycle-end)

Format: Start / Stop / Continue (30 min)
        + AI-specific section (15 min)
        + Action items (15 min) = 1 hour total

Questions:
  START:
    • What should we start doing?
    • What new AI tools/prompts should we try?
    • What process is missing?

  STOP:
    • What's wasting our time?
    • Which AI patterns aren't working?
    • What ceremonies are useless?

  CONTINUE:
    • What's working well?
    • Which AI workflows are highest ROI?
    • What should we NOT change?

  AI-SPECIFIC:
    • Where did AI help most this cycle?
    • Where did AI cause rework? (hallucination tracking)
    • Is our 2x multiplier holding up? What does the actual measurement show?
    • Any prompt library updates needed?
    • AI governance: any near-misses?

Action Items:
  • Max 3 per retro
  • Each has owner + deadline
  • Tracked on board
  • Reviewed next retro

9.2 Learning Budget

Per engineer, per cycle (6 weeks):
  • 4 hours: intentional learning (new tool, new pattern, conference talk)
  • 2 hours: AI experimentation (try new model, new workflow)
  • Cooldown week (Week 6): focus on tech debt, learning, experimentation

Investment: ~6 hours per cycle per person ≈ 2.5% of working time
            (6 of ~240 working hours in a 6-week cycle)
Return: Keeps team sharp, prevents burnout, discovers better approaches

Related Documents