Project Management, Stakeholder Management & Process
AI-first engineering strategy for a team of 5 engineers migrating a legacy .NET monolith to .NET 8 microservices over 9 months
1. Project Management Framework
1.1 Why Not Pure Scrum — Hybrid Approach
| Framework | Fit? | Reasoning |
|---|---|---|
| Pure Scrum | ❌ | 5 people don’t need heavy ceremonies. 4-hour sprint planning for 5 people = waste |
| Pure Kanban | ⚠️ | Good for flow, but lacks checkpoints for migration milestones |
| Shape Up (modified) | ✅ | 6-week cycles + cooldown. Fits migration phases well. Appetite-based (fixed time, variable scope) |
| SAFe | ❌ | Overkill for 5 people. SAFe is for 50+ engineers |
Decision: Shape Up modified + Kanban within cycles
┌──────────────────────────────────────────────────────────────┐
│ PROJECT RHYTHM │
│ │
│ 9 months = 6 cycles × 6 weeks │
│ (aligned with Planning.md phasing) │
│ │
│ Cycle 0 (W1-6): Phase 0 (AI Foundation + Infra + IaC) │
│ Cycle 1 (W7-12): Phase 1a (Travel Booking extraction) │
│ Cycle 2 (W13-18): Phase 1b (Travel go-live + Event start) │
│ Cycle 3 (W19-24): Phase 2a (Event go-live + Workforce) │
│ Cycle 4 (W25-30): Phase 2b (Comms + Reporting go-live) │
│ Cycle 5 (W31-36): Phase 3 (Stabilize + Harden) │
│ │
│ Each cycle: │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Week 1: Shaping (Tech Lead + team define work)│ │
│ │ Week 2-5: Building (Kanban flow, daily standups)│ │
│ │ Week 6: Cooldown (retro, tech debt, learning) │ │
│ └──────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────┘
1.2 Ceremonies (Lean, Sized for a 5-Person Team)
| Ceremony | Frequency | Duration | Purpose | Who |
|---|---|---|---|---|
| Daily Standup | Daily | 10 min | Blockers only. No status reporting — use the board | All 5 |
| Cycle Shaping | Every 6 weeks | 2 hours | Define appetite, scope bets, assign pitches | Tech Lead + team |
| Weekly Demo | Weekly | 30 min | Show working software to stakeholders | Rotating presenter |
| Cycle Retro | Every 6 weeks | 1 hour | What worked, what didn't, AI effectiveness review | All 5 |
| Architecture Review | Bi-weekly | 1 hour | Review ADRs, service boundaries, tech decisions | Tech Lead + senior eng |
| AI Workflow Check | Weekly | 15 min | AI metrics review, prompt calibration, governance check | Tech Lead |
Total ceremony time: ~3.5 hours/week — under 10% of working time. The rest = build.
1.3 Task Management
Tool: Linear / GitHub Projects (kanban board)
Board Columns:
┌──────────┬───────────┬────────────┬────────────┬───────────┬──────┐
│ Backlog  │ Shaped    │ Building   │ Review     │ Staging   │ Done │
│          │ (ready)   │ (WIP ≤ 5)  │ (PR)       │ (testing) │      │
│          │           │            │            │           │      │
│ Unshaped │ Scoped,   │ In active  │ CodeRabbit │ QA on     │ In   │
│ ideas    │ estimated,│ development│ + human    │ staging   │ prod │
│          │ assigned  │            │ review     │ env       │      │
└──────────┴───────────┴────────────┴────────────┴───────────┴──────┘
WIP Limit: 5 (1 per engineer max). No multitasking.
If blocked → swarm (help each other unblock).
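The WIP rule is simple enough to enforce in code, e.g. in a board automation. A minimal sketch, assuming hypothetical `Card`/`Column` types (this is not the Linear or GitHub Projects API):

```typescript
type Column = "Backlog" | "Shaped" | "Building" | "Review" | "Staging" | "Done";

interface Card {
  id: string;
  column: Column;
  assignee: string;
}

const WIP_LIMIT = 5; // one in-flight card per engineer, max

// Move a card into Building only if the WIP limit allows it.
function moveToBuilding(board: Card[], cardId: string): Card[] {
  const wip = board.filter((c) => c.column === "Building").length;
  if (wip >= WIP_LIMIT) {
    // Don't start new work; swarm on a blocked card instead.
    throw new Error(`WIP limit of ${WIP_LIMIT} reached: swarm to unblock first`);
  }
  return board.map((c) => (c.id === cardId ? { ...c, column: "Building" } : c));
}
```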
Labels:
🏗️ migration — module extraction work
🤖 ai-generated — majority AI-generated code
🔧 infra — CI/CD, IaC, DevOps
🧪 testing — test creation, contract tests
📊 reporting — reporting/analytics specific
💳 payment-acl — anything touching payment
🔴 blocker — blocking other work
1.4 Risk-Based Milestone Tracking
Instead of tracking by features delivered, track by risks eliminated:
Milestones (Risk Reduction):
M1 (Week 4): "Can we AI-migrate a service?"
✅ Communications service deployed to staging via AI pipeline
Risk eliminated: AI approach proven or disproven
M2 (Week 8): "Can we run dual traffic?"
✅ API Gateway routing live: Travel via new service
✅ Legacy still handling Payment
Risk eliminated: Strangler Fig pattern validated
M3 (Week 14): "Can services talk to each other?"
✅ Travel → Payment ACL working
✅ Event Bus messages flowing
Risk eliminated: Inter-service communication proven
M4 (Week 20): "Can we handle the data?"
✅ CDC from legacy → Reporting working
✅ Per-service DBs for Travel + Event
Risk eliminated: Data migration proven
M5 (Week 28): "Can we scale?"
✅ 5 services running, auto-scaling
✅ Load test passing (simulated 40K users)
Risk eliminated: Production readiness
M6 (Week 36): "Can we operate?"
✅ Monitoring, alerting, runbooks in place
✅ Payment migration plan ready (next phase)
Risk eliminated: Operational readiness
1.5 Estimation — Appetite-Based (Shape Up Style)
No story point estimation. Use appetite — "how much time are we willing to spend on this?"
| Appetite | Duration | Example |
|---|---|---|
| Small Batch | ≤ 1 week | Communications service extraction (AI handles most) |
| Big Batch | 2-4 weeks | Travel Booking full extraction + tests + React pages |
| Epic | 1 cycle (6 weeks) | Event Management + Reporting + related React modules |
Rule: If a task exceeds 6 weeks → must be broken down further. No "ongoing" tasks.
2. Team Structure & Roles
2.1 Team Topology
┌──────────────────────────────────────────────────────────────┐
│ TEAM STRUCTURE (5 engineers) │
│ │
│ ┌──────────────────────────────────────────────┐ │
│ │ Tech Lead (1) │ │
│ │ • Architecture decisions (ADR owner) │ │
│ │ • AI workflow design + governance │ │
│ │ • Stakeholder communication │ │
│ │ • Code review (final gate for critical code) │ │
│ │ • Shaping sessions lead │ │
│ │ • 50% hands-on coding / 50% leadership │ │
│ └──────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────┐ │
│ │ Senior Backend Engineer (1) │ │
│ │ • Service extraction lead │ │
│ │ • .NET 8 migration specialist │ │
│ │ • AI agent power user (Claude Code batch) │ │
│ │ • Database migration + CDC setup │ │
│ │ • Code review (business logic) │ │
│ └──────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────┐ │
│ │ Backend Engineer (1) │ │
│ │ • Service implementation │ │
│ │ • Event handlers + Service Bus integration │ │
│ │ • Contract test authoring (Pact) │ │
│ │ • ACL development + maintenance │ │
│ └──────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────┐ │
│ │ Full-stack Engineer (1) │ │
│ │ • React 18 frontend development │ │
│ │ • Shared design system (Storybook) │ │
│ │ • API integration (frontend ↔ services) │ │
│ │ • AI-assisted UI component generation │ │
│ └──────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────┐ │
│ │ DevOps/Platform Engineer (1) │ │
│ │ • CI/CD pipelines (GitHub Actions) │ │
│ │ • IaC (Bicep) │ │
│ │ • Azure Container Apps management │ │
│ │ • Observability stack (monitoring, alerting) │ │
│ │ • Security scanning integration │ │
│ └──────────────────────────────────────────────┘ │
│ │
│ ───────────────────────────────────────────────── │
│ AI Force Multiplier: Each engineer uses AI (Cursor Pro + │
│ Claude Code). Specialized tasks don’t require specialized │
│ hires. Fullstack eng uses AI for backend when needed. DevOps │
│ eng uses AI for IaC generation. AI fills the "6th engineer". │
└──────────────────────────────────────────────────────────────┘
2.2 Role Rotation Strategy
With 5 people, bus factor is a critical concern. 1 person leaving = 20% capacity lost.
| Strategy | How | Why |
|---|---|---|
| Pair on critical modules | 2 people know each service. No one solely owns a single service | Bus factor ≥ 2 for every module |
| Rotate reviewer | Code review rotates — everyone reviews everyone else’s code | Cross-knowledge |
| AI code walkthrough | Week 1: Senior explains Travel code. Week 2: Backend eng explains Event code... | Everyone understands every service |
| DevOps cross-train | Every backend eng knows how to deploy their own service. No single DevOps dependency | DevOps doesn’t become a bottleneck |
2.3 Collaboration Model
Daily:
09:00 Standup (10 min) — blockers only
09:10 Start of deep work — NO meetings zone until 12:00
14:00 Open for ad-hoc pairing, reviews, discussions
16:00 Async review — PRs, CodeRabbit comments
17:00 Claude Code overnight runs scheduled (batch migration tasks)
Weekly:
Monday AM: Weekly planning (30 min)
Wednesday AM: Architecture review (1 hour, bi-weekly)
Friday PM: Demo to stakeholders (30 min)
Friday PM: AI metrics check (15 min)
Cycle (6 weeks):
Week 1: Shaping — define next cycle's bets
Week 6: Retro + cooldown + tech debt + learning
3. Stakeholder Management
3.1 Stakeholder Map
┌──────────────────────────────────────────────────────────────────────┐
│ STAKEHOLDER MAP │
│ │
│ POWER │
│ ▲ │
│ │ ┌─────────────────────┐ ┌──────────────────────────┐ │
│ HIGH│ │ C-Level / Sponsor │ │ Product Owner / │ │
│ │ │ │ │ Business Stakeholders │ │
│ │ │ Cares about: │ │ │ │
│ │ │ • Timeline │ │ Cares about: │ │
│ │ │ • Budget │ │ • Features working │ │
│ │ │ • Risk │ │ • Zero downtime │ │
│ │ │ • AI ROI proof │ │ • User experience │ │
│ │ │ │ │ • Migration transparency │ │
│ │ │ Strategy: MANAGE │ │ │ │
│ │ │ CLOSELY │ │ Strategy: KEEP │ │
│ │ └─────────────────────┘ │ SATISFIED │ │
│ │ └──────────────────────────┘ │
│ │ ┌─────────────────────┐ ┌──────────────────────────┐ │
│ LOW│ │ External API │ │ End Users (40K) │ │
│ │ │ Consumers │ │ │ │
│ │ │ │ │ Cares about: │ │
│ │ │ Cares about: │ │ • System works │ │
│ │ │ • API compatibility │ │ • No disruption │ │
│ │ │ • Breaking changes │ │ • Performance │ │
│ │ │ • Documentation │ │ │ │
│ │ │ │ │ Strategy: MONITOR │ │
│ │ │ Strategy: KEEP │ │ (communicate through │ │
│ │ │ INFORMED │ │ product channels) │ │
│ │ └─────────────────────┘ └──────────────────────────┘ │
│ │ │
│ └──────────────────────────────────────────────────► │
│ INTEREST │
│ LOW HIGH │
└──────────────────────────────────────────────────────────────────────┘
3.2 Communication Plan
| Stakeholder | Channel | Frequency | Content | Owner |
|---|---|---|---|---|
| C-Level / Sponsor | Executive summary (1-page) | Bi-weekly | Risk status, milestone progress, AI ROI metrics, budget burn | Tech Lead |
| Product Owner | Demo + written update | Weekly | Working features, migration progress, upcoming changes | Tech Lead + rotating eng |
| Business Users (key) | Change notification | Per migration phase | What's changing, what's not, who to contact for issues | Product Owner (with Tech Lead input) |
| External API Consumers | API deprecation notice | 30 days ahead | Breaking changes, migration guides, new endpoints | Tech Lead |
| Engineering Team | Standup + board | Daily | In-progress work, blockers, decisions needed | All |
| Security / Compliance | Audit report | Monthly | SAST results, AI governance compliance, payment module status | DevOps + Tech Lead |
3.3 Stakeholder Communication Templates
Bi-weekly Executive Summary (1-page):
┌──────────────────────────────────────────────────────────────┐
│ EXECUTIVE SUMMARY — Week [X] of 36 │
│ │
│ Overall Status: 🟢 On Track / 🟡 At Risk / 🔴 Blocked │
│ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Milestones │ │
│ │ ✅ M1: AI pipeline validated (Week 4) │ │
│ │ ✅ M2: Dual traffic running (Week 8) │ │
│ │ 🔄 M3: Inter-service communication (in progress) │ │
│ │ ⬜ M4: Data migration validated │ │
│ │ ⬜ M5: Scale test passed │ │
│ │ ⬜ M6: Operational readiness │ │
│ └────────────────────────────────────────────────────────┘ │
│ │
│ Key Metrics: │
│ • Services migrated: 2/5 │
│ • API endpoints migrated: 47/120 (39%) │
│ • Test coverage (new services): 85% │
│ • AI-generated code: 68% (team-reviewed) │
│ • Zero downtime incidents: 0 │
│ │
│ Top Risks: │
│ 1. [Risk] — [Mitigation] — [Status] │
│ │
│ Decisions Needed: │
│ 1. [Decision] — needed by [date] │
│ │
│ Next 2 Weeks: │
│ • [Key deliverable 1] │
│ • [Key deliverable 2] │
└──────────────────────────────────────────────────────────────┘
Weekly Demo Format:
30 minutes max:
5 min: Context (what we aimed to do this week)
15 min: Live demo (working software, not slides)
5 min: Metrics (AI productivity, service health)
5 min: Q&A + next week preview
Rule: If nothing demoable → show monitoring dashboard,
test results, or architecture diagram update.
NEVER skip demo — it builds stakeholder confidence.
3.4 Escalation Path
Issue Severity → Response:
P4 (Low): Engineer fixes → PR → merge
No escalation needed
P3 (Medium): Engineer + Tech Lead discuss
Fix within cycle
Mention in weekly update
P2 (High): Tech Lead decides → immediate fix
Notify Product Owner same day
Include in exec summary
P1 (Critical): Zero downtime violated / Payment affected / Data loss
Tech Lead → Sponsor within 1 hour
War room (all hands)
Hourly updates until resolved
Post-mortem within 48 hours
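The path above can be encoded as a lookup, e.g. for an alerting bot that tags the right people. Severities, recipients, and deadlines are taken from this section; the function shape itself is illustrative:

```typescript
type Severity = "P1" | "P2" | "P3" | "P4";

// Who must hear about an incident, and how fast (derived from the path above).
function escalationFor(sev: Severity): { notify: string[]; deadline: string } {
  switch (sev) {
    case "P4":
      return { notify: [], deadline: "none" };
    case "P3":
      return { notify: ["Tech Lead"], deadline: "weekly update" };
    case "P2":
      return { notify: ["Tech Lead", "Product Owner"], deadline: "same day" };
    case "P1":
      return { notify: ["Tech Lead", "Sponsor", "all hands"], deadline: "1 hour" };
  }
}
```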
3.5 Managing Expectations — The "No" Framework
With 5 engineers and 9 months, saying "No" (or "Not now") is the most important skill:
| Request Type | Response Framework |
|---|---|
| "Can we add feature X?" | "Yes, if we defer [Y]. Here's the trade-off." |
| "Can we speed up?" | "We're at 2x AI capacity. Adding people adds coordination cost. We can re-scope instead." |
| "Why isn't Payment modernized?" | "By design. Constraint: Payment frozen Phase 1. Plan exists for Phase 2. Here's the ACL keeping it safe." |
| "Can we skip testing?" | "No. With 75% AI-generated code, testing IS the quality gate. This is non-negotiable." |
| "Competitor launched feature Z" | "Noted. Added to backlog. Current priority: foundation first. Features after migration." |
4. Engineering Process
4.1 Development Lifecycle
┌──────────────────────────────────────────────────────────────────────┐
│ DEVELOPMENT LIFECYCLE (per task) │
│ │
│ ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌────────────┐ │
│ │ 1. SHAPE │───►│ 2. BUILD │───►│ 3. REVIEW │───►│ 4. DEPLOY │ │
│ └───────────┘ └───────────┘ └───────────┘ └────────────┘ │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ • Define scope • AI-first dev • CodeRabbit • CI pipeline │
│ • Set appetite • Cursor Agent auto-review • Deploy to │
│ • Identify risks • Write tests • Human review staging │
│ • Write spec (TDD with AI) • Contract test • Smoke test │
│ • Assign pair • Implement pass • Manual │
│ • AI generates • Security scan approval │
│ 70% of code • Prod deploy │
│ (rolling) │
└──────────────────────────────────────────────────────────────────────┘
4.2 Definition of Done (DoD)
✅ DEFINITION OF DONE — Every task must meet ALL criteria:
Code:
□ Feature implemented and builds successfully
□ AI-generated code reviewed by human (mandatory)
□ Follows Clean Architecture structure
□ No TODO/HACK comments left untracked
Testing:
□ Unit tests pass (≥80% coverage for new code)
□ Contract tests pass (Pact — for API changes)
□ Integration tests pass (DB, event bus)
□ No regression in existing tests
Security:
□ SAST scan clean (CodeQL)
□ No secrets in code
□ Payment-related code: 2 human reviewers approved
Observability:
□ Structured logging added for key operations
□ OpenTelemetry trace spans for cross-service calls
□ Health check endpoint working
Documentation:
□ API changes reflected in OpenAPI spec
□ ADR created for architecture decisions
□ README updated if setup/run instructions changed
Deployment:
□ Docker image builds successfully
□ Deployed to staging and tested
□ Monitoring/alerting configured for new endpoints
4.3 Git Workflow
┌──────────────────────────────────────────────────────────────┐
│ GIT WORKFLOW │
│ │
│ main ─────────────────────────────────────────────► │
│ │ │ │ │ │
│ │ merge (squash) merge (squash) merge (squash) │
│ │ ▲ ▲ ▲ │
│ │ │ │ │ │
│ ├── feature/travel-booking-extraction ──┘ │
│ │ │ │
│ │ ┌────┴────────────────────────┐ │
│ │ │ Commits: │ │
│ │ │ feat: scaffold travel svc │ │
│ │ │ feat: migrate booking logic │ │
│ │ │ test: contract tests │ │
│ │ │ fix: edge case in pricing │ │
│ │ └─────────────────────────────┘ │
│ │ │
│ ├── feature/event-management-extraction ──────────────┘ │
│ │ │
│ └── feature/infra-cicd-setup ─────────────────────┘ │
│ │
│ Branch Naming: │
│ feature/{module}-{description} │
│ fix/{module}-{description} │
│ infra/{description} │
│ │
│ Rules: │
│ • PR required for main (no direct push) │
│ • CodeRabbit auto-review on PR create │
│ • ≥1 human approval required │
│ • Payment-related: ≥2 human approvals │
│ • CI must pass (build + test + security scan) │
│ • Squash merge to main (clean history) │
└──────────────────────────────────────────────────────────────┘
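The branch naming rules can be checked automatically, e.g. in a CI step or a git hook. The regex below is an assumption about allowed characters (lowercase alphanumerics and hyphens); adjust as needed:

```typescript
// feature/{module}-{description}, fix/{module}-{description}, infra/{description}
const BRANCH_PATTERN =
  /^(?:(?:feature|fix)\/[a-z0-9]+(?:-[a-z0-9]+)+|infra\/[a-z0-9]+(?:-[a-z0-9]+)*)$/;

function isValidBranchName(name: string): boolean {
  return BRANCH_PATTERN.test(name);
}
```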
4.4 Code Review Process
PR Created
│
▼
┌───────────────────────┐
│ Gate 1: Automated │
│ • CI build + tests │
│ • CodeRabbit review │
│ • SAST scan (CodeQL) │
│ • Contract test check │
└───────────┬───────────┘
│ All pass?
│
┌───────┴───────┐
│ Yes │ No → Fix and re-push
▼ │
┌───────────────────────┐
│ Gate 2: Human Review │
│ │
│ Reviewer focuses on: │
│ • Business logic │
│ correctness │
│ • Edge cases │
│ • Architecture fit │
│ • AI hallucination │
│ detection │
│ • Performance │
│ implications │
│ │
│ NOT focused on: │
│ • Formatting (linter) │
│ • Simple bugs │
│ (CodeRabbit caught) │
│ • Test coverage │
│ (CI enforced) │
└───────────┬───────────┘
│ Approved?
│
┌───────┴───────┐
│ Standard code │ Payment/Security code
▼ ▼
1 approval 2 approvals
│ │
▼ ▼
Merge Merge
4.5 Incident Management
┌──────────────────────────────────────────────────────────────┐
│ INCIDENT RESPONSE FLOW │
│ │
│ Alert Triggered (Azure Monitor / AI Anomaly Detection) │
│ │ │
│ ▼ │
│ ┌──────────────────┐ │
│ │ Triage (5 min) │ │
│ │ Who: On-call eng │ │
│ │ What: Severity? │ │
│ └────────┬─────────┘ │
│ │ │
│ ┌────────┼──────────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ P3/P4 P2 P1 │
│ Low High Critical │
│ │ │ │
│ Fix in Immediate fix War room │
│ next Notify Tech Lead All hands │
│ cycle Same-day update Hourly updates │
│ to PO Sponsor notified │
│ │ │
│ ▼ │
│ ┌──────────────────┐ │
│ │ Post-Mortem │ │
│ │ Within 48 hours │ │
│ │ • Timeline │ │
│ │ • Root cause │ │
│ │ • Impact │ │
│ │ • Prevention │ │
│ │ • Action items │ │
│ └──────────────────┘ │
│ │
│ On-Call Rotation: │
│ Week 1: Tech Lead + Senior BE │
│ Week 2: BE Engineer + Full-stack │
│ Week 3: DevOps + Tech Lead │
│ (Rotate every week. Always 2 people for coverage.) │
└──────────────────────────────────────────────────────────────┘
4.6 Architecture Decision Records (ADR) Process
Trigger: Any decision that affects:
• Service boundaries
• Database choice/structure
• Communication patterns
• Technology selection
• Security model
• AI governance rules
Process:
1. Engineer drafts ADR (AI-assisted — Claude drafts from context)
2. Tech Lead reviews within 24 hours
3. Team review in next Architecture Review meeting (bi-weekly)
4. Accepted / Rejected / Amended
5. Stored in repo: /docs/adrs/ADR-NNN-title.md
ADR Template:
┌──────────────────────────────────────┐
│ # ADR-NNN: [Title] │
│ Status: Proposed/Accepted/Deprecated │
│ Date: [date] │
│ Deciders: [names] │
│ │
│ ## Context │
│ What is the problem? │
│ │
│ ## Decision │
│ What did we decide? │
│ │
│ ## Options Considered │
│ | Option | Pros | Cons | │
│ │
│ ## Consequences │
│ What are the trade-offs? │
│ │
│ ## AI Involvement │
│ Was AI used to draft? What was │
│ human-validated? │
└──────────────────────────────────────┘
5. Release Management
5.1 Release Strategy
┌──────────────────────────────────────────────────────────────┐
│ RELEASE STRATEGY │
│ │
│ Type 1: Service Deployment (frequent) │
│ ──────────────────────────────── │
│ • Every merged PR → auto-deploy to staging │
│ • Staging → Production: manual approval (Tech Lead/Senior) │
│ • Rolling update (zero downtime) │
│ • Feature flags for incomplete features │
│ • Frequency: 2-3 times per week │
│ │
│ Type 2: Module Go-Live (per phase) │
│ ──────────────────────────────── │
│ • Full module cutover: traffic routes from legacy → new │
│ • Requires: all contract tests pass + load test pass │
│ • Canary deployment: 5% → 25% → 50% → 100% traffic │
│ • Rollback plan: YARP route back to legacy < 5 minutes │
│ • Stakeholder notified 1 week before │
│ • Frequency: once per phase (roughly monthly) │
│ │
│ Type 3: Database Migration (rare, high-risk) │
│ ──────────────────────────────── │
│ • Per-service DB cutover │
│ • CDC running for weeks before cutover (verify data sync) │
│ • Blue-green: new DB live + old DB as fallback │
│ • Data verification scripts mandatory │
│ • Frequency: 1-2 total across 9 months │
└──────────────────────────────────────────────────────────────┘
5.2 Canary Release Process (Module Go-Live)
Canary Deployment Flow:
Day 1-2: 5% traffic → new service
Monitor: error rates, latency, business metrics
┌─────────────────────────────────────┐
│ 95% ──► Legacy │
│ 5% ──► New Service │
└─────────────────────────────────────┘
Day 3-4: 25% traffic (if Day 1-2 clean)
┌─────────────────────────────────────┐
│ 75% ──► Legacy │
│ 25% ──► New Service │
└─────────────────────────────────────┘
Day 5: 50% traffic
┌─────────────────────────────────────┐
│ 50% ──► Legacy │
│ 50% ──► New Service │
└─────────────────────────────────────┘
Day 6-7: 100% traffic (full cutover)
┌─────────────────────────────────────┐
│ 0% ──► Legacy (standby) │
│100% ──► New Service │
└─────────────────────────────────────┘
Day 14: Legacy module decommissioned
(after 1 week soak at 100%)
At ANY point: error rate > threshold → automatic rollback to legacy
YARP config change = rollback in < 5 minutes
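The ramp and auto-rollback above can be sketched as a small state machine. The 5→25→50→100 steps come from this plan; the 1% error-rate threshold and the function names are illustrative assumptions (the real rollback is a YARP route change, not this code):

```typescript
const CANARY_STEPS = [5, 25, 50, 100]; // % of traffic to the new service

interface CanaryState {
  step: number; // index into CANARY_STEPS
  rolledBack: boolean;
}

// Called after each monitoring window: advance the ramp, or roll back.
function advanceCanary(
  state: CanaryState,
  errorRate: number,
  threshold = 0.01 // assumed 1% error-rate threshold
): CanaryState {
  if (state.rolledBack) return state;
  if (errorRate > threshold) {
    // In production this is the YARP config flip back to legacy.
    return { step: 0, rolledBack: true };
  }
  return {
    step: Math.min(state.step + 1, CANARY_STEPS.length - 1),
    rolledBack: false,
  };
}

function trafficToNewService(state: CanaryState): number {
  return state.rolledBack ? 0 : CANARY_STEPS[state.step];
}
```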
5.3 Feature Flags
Tool: Azure App Configuration (feature management)
or LaunchDarkly (if budget allows)
Usage:
┌──────────────────────────────────────────────────────────────┐
│ Flag Name │ Purpose │
│ ──────────────────────── │ ────────────────────────────── │
│ travel.new-service │ Route traffic to new Travel svc │
│ event.new-service │ Route traffic to new Event svc │
│ react.travel-ui │ Show new React UI for Travel │
│ react.event-ui │ Show new React UI for Events │
│ ai.smart-routing │ Enable AI-based API routing │
│ ai.anomaly-detection │ Enable AI monitoring alerts │
│ reporting.cqrs-mode │ Use CQRS read models vs legacy │
└──────────────────────────────────────────────────────────────┘
Rules:
• All new services behind feature flags
• Flags per-tenant and per-region capable
• Kill switch: disable new service → fallback to legacy instantly
• Flags reviewed and cleaned up every cycle
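The kill-switch rule can be made explicit in routing code: a flag that is off, missing, or unreadable sends traffic to legacy. The `FlagStore` interface here is an illustrative stand-in, not the Azure App Configuration SDK:

```typescript
interface FlagStore {
  isEnabled(flag: string): boolean; // may throw if the store is unreachable
}

function routeTarget(flags: FlagStore, flag: string): "new-service" | "legacy" {
  try {
    return flags.isEnabled(flag) ? "new-service" : "legacy";
  } catch {
    // Fail safe: if flag evaluation fails, fall back to legacy instantly.
    return "legacy";
  }
}
```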
6. Quality Assurance Process
6.1 Testing Pyramid
┌───────────┐
│ Manual │ ← Exploratory testing only
│ Testing │ (by team on Friday demo)
│ (rare) │
├───────────┤
┌─┤ E2E ├─┐ ← Critical paths only
│ │ Tests │ │ (login → book → pay)
│ │ (few) │ │ Cypress/Playwright
│ ├───────────┤ │
┌─┤ │ Contract │ ├─┐ ← Every service boundary
│ │ │ Tests │ │ │ Pact consumer/provider
│ │ │ (Pact) │ │ │ Run in CI every PR
│ │ ├───────────┤ │ │
┌─┤ │ │Integration│ │ ├─┐ ← DB, event bus, external APIs
│ │ │ │ Tests │ │ │ │ Docker Compose in CI
│ │ │ │ │ │ │ │
│ │ │ ├───────────┤ │ │ │
┌─┤ │ │ │ Unit │ │ │ ├─┐ ← Business logic, domain
│ │ │ │ │ Tests │ │ │ │ │ Fast, no external deps
│ │ │ │ │ (many) │ │ │ │ │ 80%+ coverage target
└─┴─┴─┴─┴───────────┴─┴─┴─┴─┘
AI Role in Testing:
• Unit tests: 80% AI-generated (Claude Sonnet 4)
• Contract tests: 70% AI-generated, human validates contracts
• Integration tests: 50% AI-generated, human sets up fixtures
• E2E tests: 30% AI-assisted (Playwright codegen + AI refinement)
• Manual testing: 0% AI — human exploratory only
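For flavor, the base of the pyramid targets pure business logic with no external dependencies, which is why those tests stay fast. The pricing rule and names below are invented for the example, not taken from the real Travel service:

```typescript
// Hypothetical domain rule: 10% early-bird discount when booking 30+ days out.
function applyEarlyBirdDiscount(price: number, daysBeforeEvent: number): number {
  if (daysBeforeEvent >= 30) {
    // Round to 2 decimal places to keep currency values clean.
    return Math.round(price * 0.9 * 100) / 100;
  }
  return price;
}
```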
6.2 Quality Gates
PR Level:
□ Build passes
□ Unit tests pass (≥80% coverage on changed files)
□ Contract tests pass
□ SAST clean
□ CodeRabbit review: no critical findings
□ Human review: approved
Staging Level:
□ Integration tests pass
□ E2E critical paths pass
□ Performance baseline not degraded (p95 latency)
□ No new security vulnerabilities
Production Level:
□ All staging gates pass
□ Feature flag ready (kill switch)
□ Monitoring/alerting configured
□ Rollback plan documented
□ Tech Lead approval
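The three gate levels can be collapsed into one promotion check, e.g. in a release script. In practice each boolean would come from a CI job result; the shape here is illustrative:

```typescript
type GateResults = Record<string, boolean>;

const gatePassed = (results: GateResults): boolean =>
  Object.values(results).every(Boolean);

// Production promotion requires every lower gate to pass as well.
function canPromoteToProduction(
  pr: GateResults,
  staging: GateResults,
  production: GateResults
): boolean {
  return gatePassed(pr) && gatePassed(staging) && gatePassed(production);
}
```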
7. Knowledge Management
7.1 Documentation Strategy
┌──────────────────────────────────────────────────────────────┐
│ DOCUMENTATION HIERARCHY │
│ │
│ /docs/ │
│ ├── adrs/ ← Architecture Decision Records │
│ │ ├── ADR-001-strangler-fig.md │
│ │ ├── ADR-002-yarp-gateway.md │
│ │ └── ... │
│ ├── runbooks/ ← Operational runbooks │
│ │ ├── deploy-service.md │
│ │ ├── rollback-procedure.md │
│ │ ├── incident-response.md │
│ │ └── database-migration.md │
│ ├── api/ ← OpenAPI specs (auto-generated)│
│ │ ├── travel-api.yaml │
│ │ ├── event-api.yaml │
│ │ └── ... │
│ ├── onboarding/ ← New team member guides │
│ │ ├── setup-dev-environment.md │
│ │ ├── ai-workflow-guide.md │
│ │ └── architecture-overview.md │
│ └── migration/ ← Migration-specific docs │
│ ├── legacy-module-inventory.md (AI-generated) │
│ ├── data-migration-plan.md │
│ └── cutover-checklist.md │
│ │
│ Rule: Docs live with code (in repo). │
│ No separate wiki — prevents doc drift. │
│ AI generates first draft, human reviews. │
└──────────────────────────────────────────────────────────────┘
7.2 Onboarding (New Engineer Joins Mid-project)
Day 1:
□ Dev environment setup (AI-assisted — Cursor configured)
□ Read: architecture-overview.md
□ Read: ai-workflow-guide.md
□ Access: all repos, CI/CD, Azure, monitoring dashboards
Day 2-3:
□ Pair with Senior on current task
□ Run full test suite locally
□ Deploy to staging (understand pipeline)
□ Review 3 recent PRs (understand review culture)
Day 4-5:
□ First task: small bug fix or test improvement
□ Full PR flow: code → CodeRabbit → human review → merge
□ First AI-assisted development task (Cursor Agent mode)
Week 2:
□ Own a small feature end-to-end
□ Attend architecture review
□ AI code walkthrough session
Target: productive contributor by Day 10.
AI tools are expected to cut onboarding time by roughly 40%
(AI explains the codebase, generates boilerplate, catches mistakes early).
8. Metrics & Reporting
8.1 Key Metrics Dashboard
┌──────────────────────────────────────────────────────────────┐
│ PROJECT HEALTH DASHBOARD │
│ │
│ DELIVERY METRICS │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Services migrated: ████░░░░░░ 3/5 (60%) │ │
│ │ API endpoints migrated: ██████░░░░ 72/120 (60%)│ │
│ │ React modules live: ███░░░░░░░ 2/5 (40%) │ │
│ │ Cycle progress: ████████░░ Cycle 4/6 │ │
│ └──────────────────────────────────────────────────┘ │
│ │
│ QUALITY METRICS │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Test coverage (new code): 85% │ │
│ │ Contract test pass rate: 100% │ │
│ │ Production incidents: 0 (P1/P2) │ │
│ │ Zero downtime maintained: ✅ Yes │ │
│ │ CodeRabbit acceptance rate: 92% │ │
│ └──────────────────────────────────────────────────┘ │
│ │
│ AI METRICS │
│ ┌──────────────────────────────────────────────────┐ │
│ │ AI-generated code ratio: 68% │ │
│ │ AI code bug rate vs human: 0.8x (lower!) │ │
│ │ AI PR rejection rate: 12% │ │
│ │ Avg time per service migration: 3.2 weeks │ │
│ │ AI tool cost (monthly): $1,050 │ │
│ │ Effective multiplier (measured): 1.9x │ │
│ └──────────────────────────────────────────────────┘ │
│ │
│ TEAM HEALTH │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Sprint velocity trend: ↗ ↗ → → │ │
│ │ Team satisfaction (retro): 4.2/5 │ │
│ │ Overtime hours this cycle: 2 (acceptable) │ │
│ │ Bus factor (min 2 per service): ✅ Met │ │
│ └──────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────┘
8.2 Reporting Cadence
| Report | Audience | Frequency | Content |
|---|---|---|---|
| Health Dashboard | Team | Real-time (board) | All metrics above |
| Weekly Demo | Product Owner + Business | Weekly | Working features + metrics |
| Exec Summary | C-Level / Sponsor | Bi-weekly | 1-page: milestones, risks, decisions needed |
| AI ROI Report | Sponsor | Monthly | AI cost vs productivity gain, quality comparison |
| Cycle Report | All stakeholders | Every 6 weeks | Full cycle review: delivered, deferred, learnings |
| Post-Mortem | Team + relevant stakeholders | Per P1/P2 incident | RCA, prevention, action items |
9. Continuous Improvement
9.1 Retrospective Framework (Cycle-end)
Format: Start / Stop / Continue (30 min)
+ AI-specific section (15 min)
+ Action items (15 min) = 1 hour total
Questions:
START:
• What should we start doing?
• What new AI tools/prompts should we try?
• What process is missing?
STOP:
• What's wasting our time?
• Which AI patterns aren't working?
• What ceremonies are useless?
CONTINUE:
• What's working well?
• Which AI workflows are highest ROI?
• What should we NOT change?
AI-SPECIFIC:
• Where did AI help most this cycle?
• Where did AI cause rework? (hallucination tracking)
• Is our 2x multiplier holding? Actual measurement?
• Any prompt library updates needed?
• AI governance: any near-misses?
Action Items:
• Max 3 per retro
• Each has owner + deadline
• Tracked on board
• Reviewed next retro
9.2 Learning Budget
Per engineer, per cycle (6 weeks):
• 4 hours: intentional learning (new tool, new pattern, conference talk)
• 2 hours: AI experimentation (try new model, new workflow)
• Cooldown week (Week 6): focus on tech debt, learning, experimentation
Investment: ~6 structured hours per cycle per person ≈ 2.5% of working time (6 of ~240 hours)
Return: Keeps team sharp, prevents burnout, discovers better approaches