Target Tech Stack Analysis

A deep dive into Section 2 of the assignment: why each technology was chosen, the trade-offs involved, and which alternatives were rejected


1. Original Assignment Requirements

Layer          | Technology
Backend        | .NET 8 Microservices
Frontend       | React 18 + Shared Design System
Messaging      | Azure Service Bus (Event-driven)
Database       | Per-service databases (decomposed from monolith)
Infrastructure | CI/CD + IaC
Observability  | Logging, monitoring, tracing
Data           | AI-ready data foundation

2. Layer-by-Layer Analysis — Why We Chose It, What We Didn't

2.1 Backend — .NET 8 Microservices

Why .NET 8?

Factor            | Analysis
LTS               | .NET 8 is a Long-Term Support release (supported through November 2026). Production-ready, enterprise-grade
Performance       | Consistently near the top of the TechEmpower benchmarks, native async/await, Minimal APIs for lightweight services
Legacy continuity | Legacy is .NET Framework → .NET 8 is the evolutionary path. We estimate team skill transfer at 5-10x higher than for Java/Go
Ecosystem         | Entity Framework Core, MediatR (CQRS), MassTransit (messaging), YARP (reverse proxy): all mature
Container-native  | .NET 8 supports dotnet publish --os linux → small container images (~80MB). Ubuntu Chiseled and Alpine base images are available
AI tooling        | Copilot/Cursor have excellent .NET support. Semantic Kernel for native AI integration

Rejected alternatives and why:

Alternative        | Why not?
Java/Spring Boot   | The current team is .NET. 9 months is not enough to re-skill and migrate
Go                 | Better raw performance but a weaker ecosystem for enterprise domain modeling. Lacks a strong ORM
Node.js/TypeScript | Full-stack JS is appealing, but the single-threaded event loop is a poor fit for CPU-bound enterprise workloads
.NET 6/7           | Both are out of support: .NET 6 (LTS) ended in November 2024 and .NET 7 (STS) in May 2024. .NET 8 is the only sensible choice
.NET 9             | Too new, and an STS (Standard Term Support) release with a shorter support window. Not yet battle-tested for enterprise

Hidden signal: The assignment specifies ".NET 8" — not ".NET" generically → the assessor wants to see that you know why 8 specifically, not just the latest version.

Internal patterns we adopt:

┌──────────────────────────────────────────────────┐
│              Per-Service Architecture             │
│                                                   │
│  API Layer (Minimal API or Controllers)           │
│       │                                           │
│  Application Layer (MediatR CQRS handlers)        │
│       │                                           │
│  Domain Layer (Entities, Value Objects, Events)    │
│       │                                           │
│  Infrastructure Layer (EF Core, Service Bus)      │
│                                                   │
│  Pattern: Clean Architecture per service          │
│  Why: Testable, domain-centric, portable          │
└──────────────────────────────────────────────────┘
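
A minimal sketch of the dependency direction in the diagram above, written in TypeScript for brevity rather than C#. It assumes that, in the real services, the handler would be a MediatR request handler and the repository an EF Core implementation. Every name here (Booking, BookingRepository, CreateBookingHandler) is hypothetical.

```typescript
// Domain layer: an entity with an invariant and no framework dependencies.
class Booking {
  readonly id: string;
  readonly customerId: string;
  readonly seats: number;

  constructor(id: string, customerId: string, seats: number) {
    if (seats <= 0) {
      throw new Error("A booking needs at least one seat");
    }
    this.id = id;
    this.customerId = customerId;
    this.seats = seats;
  }
}

// Application layer: depends only on an abstraction it owns, not on a database.
interface BookingRepository {
  save(booking: Booking): void;
  findById(id: string): Booking | undefined;
}

interface CreateBookingCommand {
  id: string;
  customerId: string;
  seats: number;
}

class CreateBookingHandler {
  private readonly repository: BookingRepository;

  constructor(repository: BookingRepository) {
    this.repository = repository;
  }

  handle(command: CreateBookingCommand): string {
    const booking = new Booking(command.id, command.customerId, command.seats);
    this.repository.save(booking);
    return booking.id;
  }
}

// Infrastructure layer: in-memory stand-in for a database-backed repository.
class InMemoryBookingRepository implements BookingRepository {
  private readonly rows = new Map<string, Booking>();

  save(booking: Booking): void {
    this.rows.set(booking.id, booking);
  }

  findById(id: string): Booking | undefined {
    return this.rows.get(id);
  }
}

// API layer (composition root): wires concrete infrastructure into the handler.
const bookingRepository = new InMemoryBookingRepository();
const createBookingHandler = new CreateBookingHandler(bookingRepository);
const createdBookingId = createBookingHandler.handle({ id: "b-1", customerId: "c-42", seats: 2 });
```

Because the handler only sees the BookingRepository interface, a unit test can swap in the in-memory implementation, which is the "testable, portable" property the diagram claims.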

2.2 Frontend — React 18 + Shared Design System

Why React 18?

Factor               | Analysis
Concurrent rendering | React 18 Suspense + transitions = smooth UX during data loading. Critical for travel search (many async calls)
Ecosystem            | Largest ecosystem: Redux/Zustand, React Query, testing (RTL), component libraries
Hiring               | React talent pool in Vietnam is very large. Easy to scale team if needed
Legacy transition    | If legacy uses Razor/MVC → React can be embedded page-by-page (micro-frontend approach)
AI tooling           | v0.dev, Cursor → excellent React component generation. Highest AI multiplier on the frontend

"Shared Design System" — Signal:

"Shared Design System" = NOT each team designing their own UI

Implications:
  • Storybook or equivalent for component library
  • Centralized design tokens (colors, spacing, typography)
  • Consistency across modules (Travel, Event, Workforce share the same look & feel)
  • Phase 0 investment: set up component library before building features

Trade-off: Upfront setup time (2-3 days) vs. long-term consistent UX
→ Worth it when 6+ modules share UI
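
As an illustration of "centralized design tokens", here is a small sketch of a shared token module that every module's components would import. The token names and values are hypothetical placeholders, not values from the assignment.

```typescript
// Hypothetical token values; a real design system would take these from the design team.
const designTokens = {
  color: { primary: "#0B5FFF", danger: "#D92D20", surface: "#FFFFFF" },
  spacingUnitPx: 4,
  fontSizePx: { body: 16, heading: 24 },
} as const;

type ColorToken = keyof typeof designTokens.color;

// Spacing is a multiple of one base unit, so modules cannot drift to ad-hoc pixel values.
function spacing(multiple: number): string {
  return `${multiple * designTokens.spacingUnitPx}px`;
}

interface ButtonStyle {
  background: string;
  padding: string;
  fontSize: string;
}

// Shared components accept token names, never raw hex codes.
function buttonStyle(variant: ColorToken): ButtonStyle {
  return {
    background: designTokens.color[variant],
    padding: `${spacing(2)} ${spacing(4)}`,
    fontSize: `${designTokens.fontSizePx.body}px`,
  };
}

// Travel and Workforce render the same button because both read the same tokens.
const travelButtonStyle = buttonStyle("primary");
const workforceButtonStyle = buttonStyle("primary");
```

Changing a value in designTokens then restyles every module at once, which is the consistency benefit that justifies the upfront setup cost.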

Rejected alternatives:

Alternative      | Why not?
Angular          | Enterprise-grade but steep learning curve, verbose. Less AI generation support than React
Vue 3            | Lighter but smaller ecosystem. Vietnam talent pool: React > Vue
Blazor (.NET)    | Same stack but immature ecosystem, small community. SSR performance concerns
HTMX/Server-side | Simple but cannot deliver the rich interactivity required for travel booking (search, filters, real-time updates)

Micro-frontend strategy:

Phase 1-2: React app per module, shared via Module Federation
Phase 3: Consolidate shared layout, route-level splitting

Or simpler (recommended for 5 engineers):
  • Single React SPA + route-based code splitting
  • Shared component library package (npm workspace)
  • DO NOT build complex micro-frontends — 5 engineers is too few

2.3 Messaging — Azure Service Bus (Event-driven)

Why Azure Service Bus?

Factor              | Analysis
Managed             | Fully managed → zero ops burden for 5 engineers. No Kafka cluster management
Enterprise features | Dead-letter queue, sessions, transactions, scheduled delivery, duplicate detection
Integration         | First-class .NET SDK (MassTransit wraps ASB natively). Tight Azure ecosystem integration
Cost                | Standard tier is sufficient for 40K users. Premium is only needed for extreme throughput
Compliance          | Azure = enterprise compliance (SOC 2, ISO 27001). Critical for the Payment domain

"Event-driven" — Important signal:

Event-driven ≠ Simply using a message queue

Event-driven architecture means:
  ┌────────────────────────────────────────────────────────────┐
  │ 1. Domain Events                                          │
  │    BookingCreated, PaymentProcessed, StaffAllocated        │
  │    → Service publishes events when state changes           │
  │                                                            │
  │ 2. Integration Events (cross-service)                      │
  │    BookingCreated (Travel) → trigger StaffAllocation (WFM) │
  │    → Loose coupling, eventual consistency                  │
  │                                                            │
  │ 3. Event Sourcing (optional, per aggregate)                │
  │    Store events as source of truth, rebuild state from log │
  │    → Audit trail for free, temporal queries                │
  │                                                            │
  │ 4. CQRS                                                    │
  │    Commands → write model (normalized)                     │
  │    Queries → read model (denormalized, fast)               │
  │    → Separate scaling, optimized per use case              │
  └────────────────────────────────────────────────────────────┘
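
The first and fourth points can be sketched together: a command publishes a domain event, and a subscriber projects it into a denormalized read model. This is a minimal in-memory sketch, assuming an in-process bus as a stand-in for Azure Service Bus. The event fields and all class and function names are hypothetical.

```typescript
// Hypothetical event shapes; the event names mirror the ones listed above.
interface BookingCreated {
  type: "BookingCreated";
  bookingId: string;
  customerId: string;
  amount: number;
}
interface PaymentProcessed {
  type: "PaymentProcessed";
  bookingId: string;
}
type DomainEvent = BookingCreated | PaymentProcessed;
type DomainEventHandler = (event: DomainEvent) => void;

// In-memory stand-in for the broker; Service Bus topics would play this role.
class InMemoryEventBus {
  private readonly handlers: DomainEventHandler[] = [];

  subscribe(handler: DomainEventHandler): void {
    this.handlers.push(handler);
  }

  publish(event: DomainEvent): void {
    for (const handler of this.handlers) {
      handler(event);
    }
  }
}

// Query side: one denormalized row per booking, updated only by events.
interface BookingSummary {
  bookingId: string;
  customerId: string;
  amount: number;
  paid: boolean;
}
const bookingSummaries = new Map<string, BookingSummary>();

function projectToReadModel(event: DomainEvent): void {
  if (event.type === "BookingCreated") {
    bookingSummaries.set(event.bookingId, {
      bookingId: event.bookingId,
      customerId: event.customerId,
      amount: event.amount,
      paid: false,
    });
  } else {
    const row = bookingSummaries.get(event.bookingId);
    if (row) {
      row.paid = true;
    }
  }
}

const eventBus = new InMemoryEventBus();
eventBus.subscribe(projectToReadModel);

// Command side: state changes are announced as events, never written to the read model directly.
eventBus.publish({ type: "BookingCreated", bookingId: "b-1", customerId: "c-42", amount: 250 });
eventBus.publish({ type: "PaymentProcessed", bookingId: "b-1" });
```

The publisher never references the read model, so a Reporting or Workforce subscriber can be added later without touching the Travel code. That is the loose coupling the list describes.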

Rejected alternatives:

Alternative    | Why not?
Apache Kafka   | Far more powerful for streaming but massive ops overhead. 5 engineers cannot manage a Kafka cluster. Use Event Hubs if Kafka-like semantics are needed
RabbitMQ       | Mature but self-hosted = ops burden. CloudAMQP is more expensive than ASB for equivalent features
AWS SQS/SNS    | Locks into AWS. The assignment context is Azure
gRPC streaming | Point-to-point, not pub/sub. Use it for sync calls; it does not replace an event bus
Dapr           | Good abstraction layer but adds complexity. Not needed at this scale

Most important trade-off:

Eventual Consistency vs. Strong Consistency

Event-driven = data will eventually be consistent, but there is lag.
  
Example: User books travel → Travel Service publishes BookingCreated 
         → Payment Service processes → 100ms-5s delay

Mitigation:
  • UI: Optimistic update + loading states
  • API: Return 202 Accepted, poll for status
  • Saga pattern: for multi-step workflows (booking → payment → confirmation)
  • Idempotency: each event has a unique ID, consumers check for duplicates
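
The idempotency mitigation can be sketched as a consumer that records the IDs of events it has already handled. This is a minimal in-memory illustration that assumes at-least-once delivery from the broker. The PaymentRequested shape and the class name are hypothetical.

```typescript
// Hypothetical event envelope: every event carries a unique ID assigned by the publisher.
interface PaymentRequested {
  eventId: string;
  bookingId: string;
  amount: number;
}

class IdempotentPaymentConsumer {
  // In production this would be a table written in the same transaction as the side effect.
  private readonly processedEventIds = new Set<string>();
  private chargedTotal = 0;

  // Returns true when the event was processed, false when it was a duplicate.
  handle(event: PaymentRequested): boolean {
    if (this.processedEventIds.has(event.eventId)) {
      return false;
    }
    this.chargedTotal += event.amount; // the side effect that must not run twice
    this.processedEventIds.add(event.eventId);
    return true;
  }

  total(): number {
    return this.chargedTotal;
  }
}

const paymentConsumer = new IdempotentPaymentConsumer();
const paymentEvent: PaymentRequested = { eventId: "evt-1", bookingId: "b-1", amount: 250 };
const firstDelivery = paymentConsumer.handle(paymentEvent);
const redelivery = paymentConsumer.handle(paymentEvent); // the broker retries the same message
```

The customer is charged once even though the message arrived twice. The in-memory Set is only for illustration: a real consumer needs the deduplication record to survive restarts.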

2.4 Database — Per-service databases (decomposed from monolith)

Why per-service databases?

Factor            | Analysis
Autonomy          | Each service owns its own data. Deploy/scale/migrate independently
Schema freedom    | Travel uses complex relational, Comms could use a document store, Reporting uses columnar
Failure isolation | A DB outage only affects 1 service, no cascade
Scaling           | Scale hot service DBs independently (Travel search needs read replicas, Payment does NOT)

This is the BIGGEST and HARDEST change:

BEFORE (Monolith):
┌──────────────────────────────────────────┐
│           Shared SQL Server              │
│  ┌────────┐ ┌────────┐ ┌──────────┐     │
│  │Travel  │ │Event   │ │Payment   │     │
│  │Tables  │ │Tables  │ │Tables    │     │
│  └───┬────┘ └───┬────┘ └─────┬────┘     │
│      │   JOINs across tables  │          │
│      └──────────┴─────────────┘          │
└──────────────────────────────────────────┘

AFTER (Microservices):
┌─────────┐  ┌─────────┐  ┌─────────┐
│Travel DB│  │Event DB │  │Payment  │    (Payment = monolith DB Phase 1)
│ (Azure  │  │ (Azure  │  │ DB      │
│  SQL)   │  │  SQL)   │  │ (legacy)│
└────┬────┘  └────┬────┘  └────┬────┘
     │            │            │
  No JOINs across databases — only via APIs or events

Data decomposition challenges:

Challenge             | Impact                                              | Solution
Cross-domain JOINs    | Current queries JOIN Travel + Payment → will break  | CQRS read model: denormalize where needed
Referential integrity | FK across services → cannot exist                   | Eventual consistency via events, saga for multi-service transactions
Data migration        | Splitting 1 DB → N DBs is a complex operation       | CDC (Change Data Capture) sync during the transition period
Reporting             | Reports SELECT from multiple tables → will break    | Dedicated Reporting DB (materialized views from events)
Legacy coexistence    | Monolith still uses shared DB → dual write?         | Strangler Fig: new service owns the table, ACL proxies for monolith
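
Once foreign keys and cross-database transactions are gone, a saga is what replaces them. Below is a simplified, synchronous sketch of a saga with compensating actions for the booking → payment → confirmation flow. It assumes in-process steps for readability, whereas a real saga would be message-driven (MassTransit offers saga state machines for this). All names are hypothetical.

```typescript
interface SagaStep {
  name: string;
  execute: () => void;
  compensate: () => void; // undoes the effect of execute
}

interface SagaResult {
  completed: boolean;
  log: string[];
}

function runSaga(steps: SagaStep[]): SagaResult {
  const log: string[] = [];
  const completedSteps: SagaStep[] = [];
  for (const step of steps) {
    try {
      step.execute();
      completedSteps.push(step);
      log.push("done:" + step.name);
    } catch (error) {
      log.push("failed:" + step.name);
      // Undo the steps that already succeeded, most recent first.
      for (const finished of completedSteps.slice().reverse()) {
        finished.compensate();
        log.push("compensated:" + finished.name);
      }
      return { completed: false, log: log };
    }
  }
  return { completed: true, log: log };
}

// Example run: the payment step fails, so the reservation is cancelled.
let bookingState = "none";
const bookingSaga: SagaStep[] = [
  {
    name: "reserve-booking",
    execute: () => { bookingState = "reserved"; },
    compensate: () => { bookingState = "cancelled"; },
  },
  {
    name: "charge-payment",
    execute: () => { throw new Error("card declined"); },
    compensate: () => {},
  },
  {
    name: "confirm-booking",
    execute: () => { bookingState = "confirmed"; },
    compensate: () => {},
  },
];
const sagaResult = runSaga(bookingSaga);
```

No distributed transaction is involved: each service commits locally, and consistency is restored by explicit compensation rather than by rollback.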

Recommended database type per service:

Service          | DB Type               | Rationale
Travel Booking   | Azure SQL             | Complex relations: bookings, itineraries, suppliers, pricing
Event Management | Azure SQL             | Relational: events, venues, attendees, schedules
Workforce        | Azure SQL             | Staff, skills, allocations: relational fits
Communications   | Azure SQL / Cosmos DB | Messages are more document-like, but SQL is sufficient for simplicity
Reporting        | Azure SQL (read-only) | Materialized views, aggregated from events. Read-optimized
Payment          | Legacy DB (frozen)    | ACL wraps access. Phase 1 = DO NOT TOUCH

Decision: Use Azure SQL for everything (Phase 1-2). Polyglot persistence only when there is a clear need (Phase 3+). 5 engineers should not manage multiple DB technologies simultaneously.

2.5 Infrastructure — CI/CD + IaC

The assignment is intentionally vague → the assessor wants you to choose and justify:

Component         | Choice                   | Rationale
IaC               | Bicep                    | Azure-native, type-safe, low learning curve for an Azure team. Terraform is overkill when 100% Azure
CI/CD             | GitHub Actions           | Tightly integrated with source code, marketplace actions, matrix builds. Azure DevOps is the alternative if the company already uses it
Container runtime | Azure Container Apps     | Serverless containers, auto-scale, built-in Dapr/KEDA. Kubernetes (AKS) is overkill for 5 engineers
Registry          | Azure Container Registry | Native, cheap, geo-replication, vulnerability scanning
Secrets           | Azure Key Vault          | Mandatory for production. Managed identities, zero secrets in code

Why NOT Kubernetes (AKS)?

5 engineers + AKS = TEAM SUFFERING

AKS requires:
  • YAML hell (deployments, services, ingress, configmaps, secrets)
  • Cluster upgrades, node pool management
  • Networking (CNI, pod CIDR, service mesh?)
  • RBAC, Pod Security Standards (Pod Security Policies were removed in Kubernetes 1.25)
  • Monitoring (Prometheus, Grafana stack)
  
Azure Container Apps provides:
  • Deploy with `az containerapp up` or Bicep
  • Built-in scaling rules (HTTP, queue-based)
  • Built-in ingress, TLS, custom domains
  • Optional Dapr integration
  • When AKS is needed → migration is manageable: the services are already containerized, and Container Apps itself is built on Kubernetes

Trade-off: Less control vs. massive ops reduction
→ Right choice for this team size. Scale to AKS when team exceeds 15 engineers.
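
To make the queue-based scaling rule concrete, here is a simplified sketch of the replica calculation such a rule performs. It assumes the usual Kubernetes-style autoscaler formula: queue length divided by a per-replica target, rounded up, then clamped to the configured bounds. The function and parameter names are hypothetical, and the real platform also applies polling intervals and cool-down periods that are not modeled here.

```typescript
// Simplified model of a queue-length scaling rule (an assumption, not the platform's exact algorithm).
function desiredReplicas(
  queueLength: number,
  targetMessagesPerReplica: number,
  minReplicas: number,
  maxReplicas: number
): number {
  if (targetMessagesPerReplica <= 0) {
    throw new Error("targetMessagesPerReplica must be positive");
  }
  const unclamped = Math.ceil(queueLength / targetMessagesPerReplica);
  return Math.min(maxReplicas, Math.max(minReplicas, unclamped));
}

// An empty queue with minReplicas = 0 scales to zero; a backlog scales out up to the cap.
const idleReplicas = desiredReplicas(0, 20, 0, 10);
const busyReplicas = desiredReplicas(45, 20, 1, 10);
const cappedReplicas = desiredReplicas(1000, 20, 1, 10);
```

The only tuning decisions left to the team are the per-replica target and the min/max bounds, which is the "less control, far less ops" trade-off in practice.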

2.6 Observability — Logging, monitoring, tracing

The assignment says 3 words → you must expand it into a full stack:

The Three Pillars of Observability:

┌───────────────────────────────────────────────────┐
│                                                    │
│  LOGS (Structured)    METRICS           TRACES     │
│  ────────────────     ──────────        ────────   │
│  Serilog             OpenTelemetry     OpenTelemetry│
│  → Seq/Azure Log     Auto-instrument   Distributed │
│    Analytics          HTTP, DB, Bus     tracing     │
│                                                    │
│  Correlation ID: each request carries a trace ID   │
│  across all services → debug distributed systems   │
│                                                    │
│  Alerting: Azure Monitor alerts → PagerDuty/Teams  │
│  Dashboards: Grafana or Azure Dashboards           │
│  SLO monitoring: Error budget tracking              │
│                                                    │
└───────────────────────────────────────────────────┘
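
What "each request carries a trace ID across all services" means on the wire can be sketched with the W3C Trace Context traceparent header, which OpenTelemetry propagates by default. The OpenTelemetry SDK does this automatically, so the helper functions below are hypothetical and exist only to show the mechanism.

```typescript
// traceparent format: version "00", 32-hex trace ID, 16-hex span ID, 2-hex flags.
interface TraceContext {
  traceId: string;
  spanId: string;
  sampled: boolean;
}

// Illustration only: Math.random is not suitable for production ID generation.
function randomHex(length: number): string {
  let out = "";
  for (let i = 0; i < length; i++) {
    out += Math.floor(Math.random() * 16).toString(16);
  }
  return out;
}

function parseTraceparent(header: string): TraceContext | undefined {
  const match = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!match) {
    return undefined;
  }
  return {
    traceId: match[1],
    spanId: match[2],
    sampled: (parseInt(match[3], 16) & 1) === 1,
  };
}

function formatTraceparent(context: TraceContext): string {
  return `00-${context.traceId}-${context.spanId}-${context.sampled ? "01" : "00"}`;
}

// Each service continues the incoming trace, or starts a new one at the edge.
function startSpan(incomingHeader: string | undefined): TraceContext {
  const upstream = incomingHeader ? parseTraceparent(incomingHeader) : undefined;
  return {
    traceId: upstream ? upstream.traceId : randomHex(32), // shared by every hop
    spanId: randomHex(16), // unique per hop
    sampled: upstream ? upstream.sampled : true,
  };
}

// Gateway starts the trace; the Travel Service continues it from the forwarded header.
const gatewaySpan = startSpan(undefined);
const travelServiceSpan = startSpan(formatTraceparent(gatewaySpan));
```

Because both spans share one trace ID, a log query or trace view filtered on that ID shows the whole request path across services.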

Why OpenTelemetry?

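Factor               | Analysis
Vendor-neutral       | Switch backends (Jaeger → Zipkin → Azure Monitor) by changing exporter configuration, not application code
Auto-instrumentation | The OpenTelemetry .NET SDK builds on the runtime's own diagnostics APIs (Activity, Meter). AddOpenTelemetry() plus the instrumentation packages for ASP.NET Core, HttpClient, and SQL covers most cases
Standard             | CNCF project (incubating maturity level) and the de facto industry standard for telemetry
Future-proof         | Every major cloud vendor supports OTel. No lock-in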

2.7 Data — AI-ready data foundation

This is a NEW layer — it does not exist in legacy:

"AI-ready data foundation" ≠ "having a database"

It means:
  ┌──────────────────────────────────────────────────┐
  │ 1. Event Store                                    │
  │    All domain events are persisted → training data│
  │    BookingCreated, PaymentProcessed, StaffShifted  │
  │                                                    │
  │ 2. Data Catalog                                    │
  │    Schema registry for events (Avro/JSON Schema)   │
  │    Team knows what data is available, in what format│
  │                                                    │
  │ 3. Analytics Pipeline                              │
  │    Events → Azure Event Hubs → Stream Analytics    │
  │    → Data Lake / Synapse → Power BI / ML models    │
  │                                                    │
  │ 4. Feature Store (future)                          │
  │    Pre-computed features for ML models              │
  │    Example: user_booking_frequency, avg_spend       │
  │                                                    │
  │ Priority: (1) and (2) in Phase 0-1, (3) in Phase 2│
  │           (4) in Phase 3+                          │
  └──────────────────────────────────────────────────┘
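
To show how points (1) and (4) connect, here is a minimal sketch that derives the two example features from persisted BookingCreated events. The event fields are hypothetical, as are the feature definitions: bookings inside a time window for user_booking_frequency, and the mean amount inside that window for avg_spend.

```typescript
// Hypothetical shape of a persisted domain event.
interface StoredBookingEvent {
  type: "BookingCreated";
  userId: string;
  amount: number;
  occurredAt: string; // ISO 8601 timestamp
}

interface UserFeatures {
  user_booking_frequency: number; // assumption: number of bookings inside the window
  avg_spend: number; // assumption: mean booking amount inside the window
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

function computeUserFeatures(
  events: StoredBookingEvent[],
  userId: string,
  windowDays: number,
  now: Date
): UserFeatures {
  const windowEnd = now.getTime();
  const windowStart = windowEnd - windowDays * MS_PER_DAY;
  const inWindow = events.filter((e) => {
    const at = Date.parse(e.occurredAt);
    return e.userId === userId && at >= windowStart && at <= windowEnd;
  });
  const totalSpend = inWindow.reduce((sum, e) => sum + e.amount, 0);
  return {
    user_booking_frequency: inWindow.length,
    avg_spend: inWindow.length === 0 ? 0 : totalSpend / inWindow.length,
  };
}

// Sample data standing in for the event store.
const storedEvents: StoredBookingEvent[] = [
  { type: "BookingCreated", userId: "u1", amount: 100, occurredAt: "2025-01-10T09:00:00Z" },
  { type: "BookingCreated", userId: "u1", amount: 300, occurredAt: "2025-01-20T09:00:00Z" },
  { type: "BookingCreated", userId: "u1", amount: 999, occurredAt: "2024-10-01T09:00:00Z" },
  { type: "BookingCreated", userId: "u2", amount: 50, occurredAt: "2025-01-15T09:00:00Z" },
];
const featuresAsOf = new Date("2025-01-31T00:00:00Z");
const userOneFeatures = computeUserFeatures(storedEvents, "u1", 30, featuresAsOf);
```

The features are computed purely from the event log, so they can be recomputed for any past date. This is why persisting events in Phase 0-1 matters even though the feature store itself only arrives in Phase 3+.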

Why does this matter for the assessment? Because the assignment says "AI-first". If the architecture has no path for data to flow into AI models → you miss the "AI-ready" requirement.


3. Tech Stack as a System — How It All Connects

┌─────────────────────────────────────────────────────────────────┐
│                         USERS (40K globally)                     │
│                              │                                   │
│                      Azure Front Door (CDN + WAF)                │
│                              │                                   │
│                    ┌─────────┴──────────┐                        │
│                    │    React 18 SPA     │ ← Shared Design System │
│                    │   (Azure Static     │                        │
│                    │    Web Apps)        │                        │
│                    └─────────┬──────────┘                        │
│                              │ HTTPS                             │
│                    ┌─────────┴──────────┐                        │
│                    │  YARP API Gateway   │ ← .NET 8 reverse proxy│
│                    │  (Azure Container   │   Rate limit, auth,   │
│                    │   Apps)             │   routing              │
│                    └──┬────┬────┬───────┘                        │
│                       │    │    │                                 │
│            ┌──────────┘    │    └──────────┐                     │
│            ▼               ▼               ▼                     │
│     ┌──────────┐   ┌──────────┐   ┌──────────┐                  │
│     │ Travel   │   │ Event    │   │ Workforce│  .NET 8           │
│     │ Service  │   │ Service  │   │ Service  │  Microservices    │
│     │          │   │          │   │          │  (Container Apps) │
│     └────┬─────┘   └────┬─────┘   └────┬─────┘                  │
│          │              │              │                          │
│          ▼              ▼              ▼                          │
│     ┌────────┐   ┌────────┐   ┌────────┐                        │
│     │Azure   │   │Azure   │   │Azure   │  Per-service DBs       │
│     │SQL     │   │SQL     │   │SQL     │                         │
│     └────────┘   └────────┘   └────────┘                        │
│                                                                  │
│     ─── Azure Service Bus (Events) ───────────────────           │
│          BookingCreated → EventNotified → StaffAllocated         │
│                                                                  │
│     ─── OpenTelemetry → Azure Monitor ────────────────           │
│          Logs + Metrics + Traces (correlated)                    │
│                                                                  │
│     ─── Bicep (IaC) → GitHub Actions (CI/CD) ────────           │
│          Infrastructure as Code, automated deployments           │
│                                                                  │
│     ╔══════════════════════════════════════════════╗              │
│     ║  ACL (Anti-Corruption Layer)                 ║              │
│     ║  Legacy Monolith ← wrapped, NOT migrated     ║              │
│     ║  Payment stays here Phase 1                  ║              │
│     ╚══════════════════════════════════════════════╝              │
└─────────────────────────────────────────────────────────────────┘
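
The gateway's role in the diagram is to send migrated paths to the new services and everything else to the wrapped monolith. This is a minimal sketch of that Strangler Fig routing decision. YARP expresses it declaratively through route and cluster configuration, so the function below only illustrates the logic. The path prefixes and destination URLs are hypothetical placeholders.

```typescript
interface GatewayRoute {
  pathPrefix: string;
  destination: string;
}

// Hypothetical route table: one entry is added each time a module is migrated.
const migratedRoutes: GatewayRoute[] = [
  { pathPrefix: "/api/travel", destination: "http://travel-service" },
  { pathPrefix: "/api/events", destination: "http://event-service" },
  { pathPrefix: "/api/workforce", destination: "http://workforce-service" },
];

// Anything not yet migrated (Payment in Phase 1) falls through to the legacy monolith.
const legacyDestination = "http://legacy-monolith";

function resolveDestination(path: string): string {
  let best: GatewayRoute | undefined = undefined;
  for (const route of migratedRoutes) {
    // Match whole path segments only, so "/api/travelers" does not hit "/api/travel".
    const matches = path === route.pathPrefix || path.indexOf(route.pathPrefix + "/") === 0;
    if (matches && (!best || route.pathPrefix.length > best.pathPrefix.length)) {
      best = route;
    }
  }
  return best ? best.destination : legacyDestination;
}

const travelTarget = resolveDestination("/api/travel/bookings/1");
const paymentTarget = resolveDestination("/api/payments/charge");
```

Migrating another module means adding one route entry. Clients keep calling the same gateway URL, which is what lets the monolith be replaced incrementally.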

4. Our Additional Recommendations (Not in the Assignment)

Addition             | Rationale                                          | Layer
YARP                 | .NET-native API Gateway, Strangler Fig routing     | Infrastructure
MassTransit          | Abstraction over Azure Service Bus, saga support   | Messaging
MediatR              | In-process CQRS, clean handler pattern             | Backend
Serilog + Seq        | Structured logging, queryable logs in local dev    | Observability
Pact                 | Contract testing for service boundaries            | Testing
Azure Container Apps | Serverless containers instead of AKS               | Infrastructure
Azure Front Door     | Global CDN + WAF for 40K global users              | Infrastructure

5. Assessor Perspective

✅ WHAT THEY WANT TO SEE:
  • Justify EVERY technology choice (not just list them)
  • Compare rejected alternatives (show you know why you did NOT choose them)
  • Understand trade-offs (eventual consistency, per-service DB complexity)
  • Add tools the assignment doesn't mention but are necessary (YARP, MassTransit, Pact)
  • AI-ready data foundation = architecture designed for future AI, not an afterthought

❌ WHAT THEY DO NOT WANT TO SEE:
  • Copy the tech stack table and say "these are good choices"
  • No justification — just a list
  • Choosing Kubernetes with a 5-person team
  • Ignoring the data decomposition challenge
  • No mention of AI-ready data