Chapter 22 — Week 4: The 2026 low-code stack
Welcome to Week 4. The MVP scoping document from Week 3 is your contract; this week you turn it into running software. By Friday you will have a deployed prototype with end-to-end auth, database, foundation-model integration, and at least one vertical slice of the wedge feature working from user input through model inference to persisted output. The single most-common Week-4 mistake is to build layers (auth done; database done; model integration done) without ever connecting them. By Friday a real user — a team member counts as a real user — should be able to log in, do the wedge action, and see a result. This chapter exists primarily to make sure that happens.
Chapter overview
This chapter follows the same six-part structure. §22.1 (Concept) sets out the four-layer 2026 low-code AI stack, the build-vs-buy boundary applied to actual tools, where R and Python fit (and where they do not), the hidden costs of low-code, and the team workflow for a 4–5 person cross-campus build. §22.2 (Method) is the day-by-day Week 4 sprint: environment setup, app-shell-and-auth, data layer, foundation-model integration, the first vertical slice, and the Friday demo. §22.3 (Lessons from the cases) pulls eight specific lessons on stack and tool selection from Parts I–III. §22.4 (Tools and templates) is the most operationally dense section in this book — the current 2026 reference for app builders, hosting, databases, auth, foundation-model APIs, vector stores, agent frameworks, observability, and payments, with KL- and Melbourne-specific notes. §22.5 (Worked example) continues Team Aroma through their actual Week 4 build, including the rate-limit incident that delays a teammate by a day and the cross-campus integration meeting they almost botched. §22.6 (Course exercises and deliverables) specifies the Week 4 submission with grading rubric.
How to read this chapter. Read §22.1 in full at Monday morning’s standup. Read §22.4 (the stack reference) before you make any tool-procurement decisions; the wrong tool selected on Monday compounds into days of lost work by Wednesday. Use §22.2 as the day-by-day plan; assign owners per step. Treat §22.3 as Wednesday-evening reading after the foundations are in place. Submit against §22.6 by Friday 23:59.
22.1 Concept
22.1.1 The four-layer 2026 low-code AI stack
A modern AI MVP is best understood as four layers, each with characteristic tooling. The architectural separation matters because each layer has its own commoditisation curve, its own cost structure, and its own switching cost — and because the Iansiti-Lakhani thesis (Chapter 3) tells us the locus of competitive advantage is across layers, not within any one of them.
+----------------------------------------------------------+
| Layer 4 — App shell and UX |
| (Lovable, v0, Bolt, Replit Agent, Cursor, Windsurf) |
+----------------------------------------------------------+
| Layer 3 — Intelligence |
| (Anthropic, OpenAI, Google, DeepSeek; Vercel AI SDK, |
| LangChain/LangGraph, Mastra, Pydantic AI, MCP servers) |
+----------------------------------------------------------+
| Layer 2 — Data |
| (Supabase, Neon, Convex, Firebase; pgvector, Pinecone, |
| Turbopuffer, Weaviate) |
+----------------------------------------------------------+
| Layer 1 — Integration and infrastructure |
| (Vercel, Railway, Render, Cloudflare; Clerk auth; |
| PostHog analytics; Stripe / iPay88 payments) |
+----------------------------------------------------------+
Layer 1 — Integration and infrastructure is the substrate: hosting, auth, observability, billing. Almost entirely commoditised in 2026; the practical differences across vendors are developer experience and pricing curves, not capability. Default choice for student teams: Vercel + Clerk + PostHog free tier + Stripe for international or Billplz/iPay88 for Malaysia-only payments. Setup effort ≤1 day; ongoing operational cost AUD/MYR ~0 in pilot phase.
Layer 2 — Data is the persistence and retrieval layer: relational data, document storage, vector embeddings for RAG, real-time sync if needed. Commoditised at the storage level; the moats here are at the schema and retrieval pipeline level, not at the database level. Default choice: Supabase (Postgres + pgvector + auth + storage + realtime + edge functions in one platform). Setup effort ≤1 day for a typical student MVP schema.
Layer 3 — Intelligence is the foundation-model and orchestration layer. The DeepSeek shock (Chapter 5) confirmed that frontier capability is commoditising; the moats here are at the prompt-engineering, evaluation, and workflow-integration levels rather than at the model level. Default choice for a B2B/education MVP in the KL or Melbourne context: Anthropic Claude Sonnet via the Anthropic API, called through the Vercel AI SDK if you are on the JavaScript/TypeScript stack, or the official Anthropic Python SDK if you have R/Python components. Set up the model client behind a thin abstraction so that swapping to GPT, Gemini, or DeepSeek is a one-line change.
Layer 4 — App shell and UX is the customer-facing surface: pages, forms, dashboards, mobile-responsive layouts, brand. The 2024–2026 wave of AI app builders (Lovable, v0, Bolt, Replit Agent) collapsed this layer’s build cost by an order of magnitude; a working app shell that would have required 3–5 days of front-end engineering in 2022 takes 2–4 hours in 2026. Default choice for student teams: Lovable for the initial scaffold (best at producing complete, deployable Next.js apps from prompts), with Cursor as the day-to-day IDE for refinement. v0 and Bolt are reasonable alternatives; Replit Agent is the strongest choice if your stack has Python components.
The principle that organises the stack is the same principle from Chapter 21: build the wedge, buy or borrow everything else. The four-layer view makes the principle operational. Layers 1, 2, and 4 are almost entirely buy/borrow for a student MVP. Layer 3 is mixed — the model is buy, but the prompts, the RAG pipeline, the evaluation harness, and the workflow integration are build. The build effort in Week 4–6 should concentrate inside Layer 3’s application-specific parts, with everything else assembled from existing tools.
22.1.2 The build-vs-buy boundary, revisited
Chapter 21’s build/buy/borrow framework gave you the principle. This week applies it to actual tool choices, with the caveat that the framework’s outputs are heavily team-dependent. A team with a strong front-end engineer might build the app shell from scratch using Next.js + Tailwind, treating Lovable as too constraining. A team with no front-end engineer should treat Lovable as essential. The framework is not “always buy”; it is “build only the wedge.”
A useful diagnostic: walk through your MoSCoW list from Chapter 21 and ask, for each Must feature, if this feature were a commodity tool we could buy, would we still have a startup? The answers cluster into three buckets:
- Yes, definitely — the feature is necessary infrastructure (auth, hosting, payments, basic UI components) but not what makes your product distinct. Buy or borrow.
- No — the feature is your product (the wedge AI capability, the workflow integration with the customer’s existing systems, the proprietary data pipeline). Build.
- Maybe — the feature is differentiating but the differentiation is small (custom UI components, specific analytics dashboards, branded email templates). Buy or borrow now; revisit at Week 8 if it has become a customer-cited differentiator.
The pattern that emerges, for almost every MVP, is that 2–4 features are firmly Build, 8–12 features are firmly Buy/Borrow, and 2–4 features sit in the Maybe bucket. The 2–4 Build features must absorb roughly 70–80% of the team’s engineering effort over Weeks 4–6. If your time allocation does not match this, you are over-investing in commoditised infrastructure at the expense of the wedge.
22.1.3 Where R and Python fit — and where they do not
The course is configured for a low-code preference, with R and Python as exceptions rather than the default. Four classes of work justify the R/Python overhead:
Custom evaluation pipelines. Foundation-model output evaluation at scale — running 1,000 test inputs through the model, scoring against golden answers, computing aggregate metrics — is much easier in Python (with the OpenAI Evals library, the Anthropic evaluation framework, or LangSmith) than in any low-code platform. Chapter 23 develops this in detail. For the Week-4 build, evaluation is typically deferred; for Weeks 5–6, a Python evaluation script is often the right choice.
Custom embedding and RAG pipelines. Where your RAG corpus is large (10,000+ documents), domain-specific (Malaysian commercial law, SPM past-year papers, healthcare procedures), or sensitive (data must not leave a sovereign perimeter), you may need a custom embedding pipeline. Python with sentence-transformers, langchain, or llama-index is the standard stack. R is rarely the right tool for embedding pipelines.
Statistical or econometric analysis. For business-analytics, fintech, and economics-adjacent MVPs (which include several plausible Monash student-team domains), R and Python are first-class tools. The seminr package in R for PLS-SEM, the tidyverse for data manipulation, statsmodels and scipy.stats in Python — none of these have low-code equivalents. If your MVP’s wedge involves statistical modelling beyond what foundation models can reliably do, build the modelling layer in R or Python and expose it to your low-code app shell as an HTTP API (see §22.2.5 for the integration pattern).
Fine-tuning. If your evaluation in Week 5–6 reveals that no foundation model is good enough for your task and you need a fine-tuned model, the fine-tuning workflow lives in Python (typically through Hugging Face’s transformers and peft libraries, or through hosted fine-tuning services like Together AI’s fine-tuning API). For 95% of student MVPs, fine-tuning is not needed; prompt engineering and RAG cover the gap.
The integration pattern when R/Python is needed. Do not try to build the entire app in Python if only one component needs it. Instead: build the app shell and main backend in JavaScript/TypeScript on Vercel, build the R/Python component as a separate microservice (deployed on Modal Labs, Replicate, Hugging Face Inference Endpoints, or Railway), and call the microservice over HTTP from the main backend. The pattern keeps the app build fast while letting you escape into R/Python where it genuinely helps.
A specific note on R for student teams: R is excellent for analysis and reports but underpowered as an application platform. For econometric MVPs, the typical pattern is to build the modelling layer in R (using seminr, BEKKs, vars, quantmod, or similar packages familiar from finance and statistics courses), expose the model behind a plumber API, and integrate from a TypeScript front-end. Resist the temptation to build the front-end in Shiny; for production-grade MVPs targeting external users, Shiny's UX ceiling will frustrate your team within 2–3 weeks.
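A minimal sketch of the calling side of this pattern, assuming a hypothetical /score endpoint exposed by the R (plumber) or Python (FastAPI) service and a MODEL_SERVICE_URL environment variable pointing at it:

```ts
// app/api/score/route.ts: Next.js route handler that proxies to the R/Python microservice.
// The endpoint path, payload shape, and env var are illustrative assumptions.
import { NextResponse } from 'next/server';

export async function POST(req: Request) {
  const payload = await req.json();

  // The modelling service deployed separately on Railway, Modal, or similar.
  const res = await fetch(`${process.env.MODEL_SERVICE_URL}/score`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  });

  if (!res.ok) {
    return NextResponse.json({ error: 'modelling service unavailable' }, { status: 502 });
  }

  return NextResponse.json(await res.json());
}
```

The same shape works whether the service behind the URL is plumber, FastAPI, or a hosted inference endpoint; the app shell never needs to know which.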
22.1.5 Working in a 4–5 person cross-campus team
The team-workflow question is non-trivial for a 4–5 person team building software for the first time, especially with members across two timezones. The default operating model:
Source control: GitHub, single repo, monorepo or polyrepo by team preference. Single-repo (monorepo) is the default for a 5-person 10-week project — coordination overhead of polyrepo is rarely worth it at this scale.
Branch protection on main. Require pull requests with at least one review before merge. Direct pushes to main are blocked except by the team lead in genuine emergencies. The discipline prevents the most common student-team failure mode: a member force-pushes a broken commit on Friday evening and breaks the team’s environment for the weekend.
Branching convention. main is always deployable. Each feature branch is feat/<short-name> (e.g., feat/teacher-review-ui); each fix is fix/<short-name>; each refactor is refactor/<short-name>. Feature branches live for 1–3 days at most; longer-lived branches accumulate merge-conflict pain.
Pull request discipline. Every PR has: a linked task ticket (from Linear, GitHub Issues, or Notion), a description of what changed, screenshots or screen-recording for UI changes, and at least one peer reviewer. Review turnaround target: <24 hours, with the cross-campus rotation respecting both timezones.
Code review at student-team scale. Reviews are not full audits — they are sanity checks. The reviewer reads the changes, runs the code locally if possible, and asks: does this break anything obvious? Does this match our conventions? Is the test coverage proportional to the risk? A 5-minute review is normal; a 30-minute review is a signal that the PR is too large and should have been split.
Deployment cadence. Vercel auto-deploys preview environments per PR (free); production deploys on merge to main. The team should ship to production several times per day after Week 4 has stabilised — large infrequent deploys produce more bugs than small frequent ones.
Documentation discipline. Every component, API endpoint, and database table is documented in the repo’s README or docs folder as it is built, not retroactively. Cross-campus teams without strong documentation discipline diverge quickly because synchronous communication windows are limited.
Async-first communication. With KL and Melbourne working hours overlapping ~5 hours per day (roughly 9am–2pm KL / 12pm–5pm Melbourne in the standard semester), most communication must be async. Slack, Discord, or Notion is the substrate; team-defined response-time expectations (~24 hours for normal requests, ~4 hours for blocking requests) keep async working.
Two synchronous meetings per week. From Chapter 19’s founder agreement: typically Tuesday and Friday, with the meetings alternating which campus’s working hours are inconvenient. The Tuesday meeting reviews the prior week and plans the current; the Friday meeting reviews progress and prepares the deliverable submission.
The pattern is well-tested: distributed engineering teams have worked this way for 15+ years, and student teams that adopt the pattern outperform student teams that try to be fully synchronous (which typically fails by Week 4 due to timezone fatigue) or fully async (which typically fails by Week 5 due to coordination failure).
22.2 Method — the Week 4 sprint
22.2.1 Day 1 (Monday): environment setup and provisioning
By Monday end-of-day every team member should have working access to:
- The shared GitHub organisation and repo (with branch protection enabled).
- The shared Vercel team (free tier; Hobby plan is sufficient).
- The shared Supabase project (free tier).
- The shared Clerk application (free tier).
- The shared Anthropic API key, with the paid tier provisioned (~USD 20 minimum to avoid free-tier rate limits).
- The shared PostHog account (free tier).
- The shared Notion / Google Drive / Linear workspace for documentation and task tracking.
- A shared development environment file (.env.example) committed to the repo, with all required environment variables listed (no actual secrets — those live in the Vercel/Supabase/Clerk dashboards and in each developer's local .env.local file).
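A minimal .env.example sketch for the default stack above; the variable names follow each vendor's usual conventions, but check them against whatever your code actually reads:

```
# .env.example — committed to the repo; real values live in the vendor dashboards and .env.local
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=
CLERK_SECRET_KEY=
NEXT_PUBLIC_SUPABASE_URL=
NEXT_PUBLIC_SUPABASE_ANON_KEY=
SUPABASE_SERVICE_ROLE_KEY=
ANTHROPIC_API_KEY=
NEXT_PUBLIC_POSTHOG_KEY=
NEXT_PUBLIC_POSTHOG_HOST=
```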
The provisioning is mechanical but error-prone. Allocate one team member as the infrastructure lead for the day; they own getting everyone provisioned, and have authority to escalate when other team members are blocked on access.
A common Day-1 mistake: each team member uses their personal Vercel/Supabase/Clerk account, with the project owned by one member. When that member’s free tier is hit, the whole team is blocked. Always provision team-level accounts on each platform from the start; the per-platform setup overhead is 15–30 minutes per platform.
22.2.2 Day 1–2: app shell and authentication
By Tuesday end-of-day the team should have a deployed app at a Vercel preview URL with working authentication. Method:
- Scaffold via Lovable. Open Lovable; describe the app in 3–5 sentences (the role of each user type, the primary screens, the brand). Lovable produces a Next.js 14+ app with Tailwind, shadcn/ui components, and TypeScript. Iterate the prompt 2–3 times until the scaffold is roughly right.
- Export to GitHub. Lovable connects to GitHub directly; commit the scaffold to your main branch.
- Deploy to Vercel. Vercel auto-detects the Next.js app; the first deploy completes in 2–5 minutes.
- Wire Clerk. Add @clerk/nextjs to the dependencies; follow the Clerk Next.js quickstart (15-minute walkthrough). Configure the Clerk dashboard with appropriate user types if you have multiple roles (e.g., teacher, student, centre owner); a minimal middleware sketch follows this list.
- Verify the round-trip. Sign up as a test user, log in, see the protected dashboard. If this works, the app shell is complete.
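A minimal sketch of the Clerk middleware step, assuming a recent @clerk/nextjs release; the exact API moves between versions, so treat the quickstart as authoritative:

```ts
// middleware.ts: protect the dashboard routes with Clerk.
// Signatures differ slightly across @clerk/nextjs versions; this follows the recent pattern.
import { clerkMiddleware, createRouteMatcher } from '@clerk/nextjs/server';

const isProtected = createRouteMatcher(['/dashboard(.*)']);

export default clerkMiddleware(async (auth, req) => {
  if (isProtected(req)) await auth.protect(); // redirect unauthenticated users to sign-in
});

export const config = {
  // Simplified matcher; the quickstart's recommended matcher also covers API routes.
  matcher: ['/((?!_next|.*\\..*).*)'],
};
```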
The 5-step sequence is achievable in 4–6 hours by a team without prior Next.js experience. Teams with prior experience complete it in 2–3 hours. Watch for the common failures: Clerk webhook secrets not set in Vercel (auth events fail silently); environment variables missing on the Vercel side after local setup (preview deploys fail); Lovable’s default styling overriding shadcn (cosmetic but disorienting).
22.2.3 Day 2–3: data layer
By Wednesday end-of-day the team should have a Supabase schema deployed, basic CRUD operations working from the app, and seed data loaded for testing. Method:
- Schema design. Sketch the schema on a whiteboard or Miro before writing any SQL. Identify the entities (users, teachers, students, centres, questions, attempts, ai_explanations, teacher_reviews — for Team Aroma’s Pulse). Identify the relationships (foreign keys). Identify what is row-level-security gated (almost everything in a B2B app where centres should not see each other’s data).
- Create Supabase tables. Use the Supabase dashboard’s SQL editor or the Supabase CLI for version-controlled migrations. Migrations are checked into the repo so the team can recreate the database.
- Set up row-level security (RLS) policies. Critical for any multi-tenant app. RLS policies enforce, at the database level, that a user can only read/write their own organisation’s data. Without RLS, a security failure in the application layer becomes a data leak.
- Wire pgvector if needed. For RAG-using apps (most education and knowledge-work MVPs), enable pgvector and add the embedding columns to the relevant tables.
- Seed test data. A 10–20-row seed dataset that lets the team test workflows end-to-end. Seed data lives in a script committed to the repo so any team member can rebuild their local Supabase to the standard state.
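A minimal sketch of such a seed script using supabase-js; table names, columns, and values here are illustrative and should be adapted to your actual schema:

```ts
// scripts/seed.ts: dev/test seed data; run against a freshly reset local database.
import { createClient } from '@supabase/supabase-js';

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY! // service role bypasses RLS; server/CLI use only
);

async function main() {
  const { data: centre, error } = await supabase
    .from('organisations')
    .insert({ name: 'Demo Tuition Centre' })
    .select()
    .single();
  if (error) throw error;

  const { error: qError } = await supabase.from('questions').insert([
    { organisation_id: centre.id, topic_code: 'F4-01', body: 'Solve 2x + 3 = 11.' },
    { organisation_id: centre.id, topic_code: 'F5-02', body: 'Differentiate y = 3x^2 - 4x.' },
  ]);
  if (qError) throw qError;

  console.log('Seed complete');
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```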
The data layer is the substrate everything else depends on; underinvestment here produces compounding pain through Weeks 5–6. Spend the time to get RLS right.
22.2.4 Day 3–4: foundation-model integration
By Thursday end-of-day the foundation model is integrated, the first prompt template is in production, and the team can generate AI output through the deployed app. Method:
- Set up the API client. Use the Vercel AI SDK if you are on the JavaScript stack (it abstracts over Anthropic, OpenAI, Google, and others uniformly), or the official Anthropic SDK directly if you want tighter control. Both are 5-line setups.
- Define the first prompt template. A typed wrapper around the model call: input parameters, output schema, system prompt, user prompt template. Keep the prompt in version control (a .md file in the repo) so changes are reviewed.
- Wire to the database. When the model produces output, persist it to the appropriate table along with the input, the prompt version, the model used, and timestamps. This is your Week-5 evaluation infrastructure under construction; the discipline of persisting every input-output pair is what makes systematic improvement possible.
- Test the round-trip. A user submits an input; the model produces output; the output is persisted; the user sees the output. End-to-end through every layer.
- Set up cost monitoring. Anthropic’s dashboard shows per-day spending; set up a budget alert for ~10× your expected daily spend. The alert catches runaway loops or accidentally infinite recursion early. PostHog or Sentry can also be configured to log every model call for debugging.
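Alongside the dashboard alert, it is worth logging an estimated per-call cost next to the token usage you are already persisting. A minimal sketch with placeholder prices (check the provider's current pricing page before relying on the numbers):

```ts
// lib/ai/cost.ts: rough per-call cost estimate; the prices below are placeholders, not current rates.
const PRICE_PER_MTOK_USD: Record<string, { input: number; output: number }> = {
  'claude-sonnet': { input: 3, output: 15 },
  'gpt-mini': { input: 0.3, output: 1.2 },
};

export function estimateCostUsd(
  modelKey: string,
  inputTokens: number,
  outputTokens: number
): number | null {
  const price = PRICE_PER_MTOK_USD[modelKey];
  if (!price) return null;
  return (inputTokens * price.input + outputTokens * price.output) / 1_000_000;
}
```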
A specific prompt-engineering note: keep the system prompt long-lived and the user prompt task-specific. The system prompt encodes the model’s role, constraints, and output format; the user prompt is the per-task content. This separation makes prompt iteration much faster — you can tune the user prompt for one task without disturbing other tasks.
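A minimal illustration of the separation; the helper names echo the wrapper in §22.4.4, but the prompt content here is illustrative only:

```ts
// The system prompt is long-lived and versioned as a file in prompts/;
// the user prompt is assembled per task from the request's inputs.
import { readFile } from 'node:fs/promises';

export async function loadSystemPrompt(name: string): Promise<string> {
  return readFile(`prompts/${name}.md`, 'utf-8'); // e.g. prompts/system-tutor.md
}

export function buildUserPrompt(input: { question: string; level: string }): string {
  return [
    `Question (${input.level}):`,
    input.question,
    'Explain the solution step by step in the required output format.',
  ].join('\n');
}
```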
22.2.5 Day 4–5: the first vertical slice
By Friday morning the team should have a complete vertical slice of the wedge feature working from user input through the full stack to persisted output. The vertical slice is the single most important Week-4 outcome.
A vertical slice is a thin end-to-end implementation of one feature that touches every architectural layer — UI, auth, business logic, model inference, persistence, retrieval, output rendering. It is not a horizontal layer (e.g., “all the database tables built but no UI”). It is a single feature done end-to-end at the lowest possible quality bar, with everything else stubbed.
For Team Aroma’s Pulse: the vertical slice is one teacher logs in, sees one student, generates one practice question with AI explanation, reviews the explanation, sends it to the student, and the student sees it. The slice deliberately uses one teacher, one student, one question, with most surrounding features stubbed. Once the slice works end-to-end, Week 5 turns the slice into the production feature by widening: many teachers, many students, many questions, with the surrounding features filled in.
The vertical-slice discipline matters because it forces you to confront integration problems early. A team that builds horizontal layers (database first; then API; then UI; then model integration) typically finds that the layers do not connect cleanly when assembled in Week 6 — and by then, with two weeks until alpha, the integration debugging consumes the build budget. The vertical-slice approach assembles the layers from Day 1 of the build week and irons out the integration issues incrementally.
The R/Python integration pattern, if your wedge requires it: build the R/Python service first as a standalone HTTP endpoint (deployed on Modal, Replicate, or Railway), with one input/output round-trip working. Then call it from your Next.js backend exactly as you would call the Anthropic API. The vertical slice includes the R/Python service in the loop, even if its logic is stubbed for the first slice.
22.2.6 The “Frankendemo” antipattern
A common Week-4 failure mode is the Frankendemo: a collection of disconnected demos that, individually, look impressive, but do not assemble into an end-to-end working product.
Symptoms of the Frankendemo:
- The auth works (you can sign up and log in) but does not connect to the rest of the app — once logged in, you see a static landing page.
- The database has tables and seed data but no UI reads or writes from it.
- The model integration produces output via a debug endpoint but no user flow ever calls the endpoint.
- The team’s Friday demo is a series of separate browser tabs and terminal windows showing each layer independently.
The Frankendemo is the predictable result of horizontal-layer building (§22.2.5). It is also a comforting failure mode: each layer “works” in isolation, so each team member can claim their piece is done. The team only discovers that the pieces do not connect when they try to give a unified demo, which often does not happen until Week 6’s alpha-launch panic.
The vertical-slice discipline is the antidote. By Friday of Week 4 the team should be able to give a single demo, in a single browser tab, of one user doing one workflow end-to-end. If the team cannot, the build is ahead of the integration; pull back and integrate before adding features.
22.2.7 Friday evening: prototype demo and Week 5 plan
The Friday demo is the team’s internal milestone, distinct from the deliverable submission. Its purpose is honest assessment: where are the integration gaps, what was harder than expected, what was easier, and how does Week 5’s plan adjust?
The format:
- One member (rotating) drives the demo — clicks through the vertical slice from start to finish, while the rest of the team watches.
- Two minutes of celebrating what works. Genuinely. The first vertical slice is a real accomplishment.
- Ten minutes of identifying gaps. What broke during the demo, what feels rough, what is faked vs working, what production-grade hardening is missing?
- Twenty minutes of Week 5 planning. Convert the gaps to specific tickets, assign owners, set checkpoint dates.
The Week 5 plan should produce, by next Friday, a full-quality version of the vertical slice (the wedge feature end-to-end at production quality) and the secondary features from the MoSCoW Must list. Week 6 then becomes the alpha launch with the friendly cohort booked from Week 2.
22.3 Lessons from the cases
Eight specific lessons on stack and tool selection from Parts I–III shape Week 4 decisions.
22.3.1 Cursor — the IDE as competitive advantage (Chapter 5)
Anysphere’s bet that a better IDE-around-the-model could outcompete GitHub Copilot has been substantially vindicated; Cursor’s ARR exceeded USD 1B by late 2025. The bet’s mechanism: the IDE is the workflow integration that Copilot did not invest in. Multi-file context, repo-level awareness, the diff approval flow, the cmd-K inline edit pattern — these are not the model; they are the surface that determines how the model is used.
Operational implication. Your Layer-4 surface (the app shell, the user interface, the workflow design) is where most of your competitive advantage compounds, even when the underlying foundation model is commoditised. Spend disproportionate Week-4 effort on the workflow design, not on the model integration. The model is the API call; the workflow is the product.
22.3.2 Anysphere uses Cursor — “eat your own dogfood” (Chapter 5)
Anysphere’s own engineers build Cursor using Cursor. The dogfooding is not just marketing — it is the primary mechanism by which the team identifies friction in their own product. Every issue that frustrates an Anysphere engineer is a Cursor design defect, identified in real time, with the engineer’s knowledge of the codebase as a debugging asset.
Operational implication. Use your own product. The pre-validation calls in Week 1 captured customer voices; the alpha in Week 6 captures friendly-user voices; in Week 4–5, the team itself is the most accessible user. If a team member would not use the product as it stands, no external user will. The discipline of “would I use this?” should be applied at every design decision through the build.
22.3.3 JPMorgan COiN — internal vs external tooling distinction (Chapter 6)
COiN is internal-facing; the bank’s legal team uses it. The architectural decisions (deep workflow integration with case-management systems; audit trails; role-based access controls) reflect that fact. An external-facing tool (e.g., a contract-review SaaS for law firms) would have different architecture: multi-tenancy, customer-managed data isolation, customer-controlled audit configuration, public-facing security certifications.
Operational implication. Decide explicitly whether your MVP is internal-facing (used inside one customer’s organisation), external-facing-single-tenant (each customer has their own deployment), or external-facing-multi-tenant (one shared deployment serving many customers). The architecture differs substantially across the three. For most student MVPs the answer is external-facing-multi-tenant with strong row-level security, which is what Supabase’s RLS is designed for.
22.3.4 Anthropic and MCP — building agents requires standard tooling (Chapter 13, forthcoming)
Anthropic’s Model Context Protocol (MCP), open-sourced in November 2024, has become the de-facto standard for connecting AI models to data sources and tools. The protocol exists because every agentic system needs the same three things: a way to expose data sources, a way to expose actions, and a way to compose them. Without a standard, every team rebuilds the same plumbing.
Operational implication. If your MVP involves agentic behaviour (the AI takes multi-step action on behalf of the user), use MCP from the start. Anthropic, OpenAI, and increasingly Google support MCP-compatible servers. The Vercel AI SDK has MCP integration built in. Building your own agent-orchestration layer is a Week-7+ optimisation; for Week 4–5 use MCP-native frameworks (LangGraph, Pydantic AI, Mastra) and let the framework handle the agent loop.
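A sketch of what the MCP-native path looks like with the Vercel AI SDK; experimental_createMCPClient is marked experimental and the surface may have shifted, so verify against the current SDK docs before copying:

```ts
// Connect to an MCP server and hand its tools to the model for a short agent loop.
// The server URL and prompt are placeholders.
import { generateText, experimental_createMCPClient } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';

const mcpClient = await experimental_createMCPClient({
  transport: { type: 'sse', url: 'https://example.com/mcp' },
});

const { text } = await generateText({
  model: anthropic('claude-sonnet-4-6'),
  tools: await mcpClient.tools(), // tools exposed by the MCP server
  maxSteps: 5,                    // allow a multi-step tool-use loop
  prompt: 'Summarise the open tickets assigned to me.',
});

await mcpClient.close();
console.log(text);
```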
22.3.6 Glean — integration depth as the moat (Chapter 5)
Glean’s product is enterprise search; what makes it valuable is the integration depth across Slack, Jira, GitHub, Confluence, Google Drive, Salesforce, Notion, and ~50 other enterprise tools. A new entrant with a similar AI capability but only 5 integrations cannot match Glean’s results. The integrations are the moat, not the model.
Operational implication. If your MVP plays in a multi-system workflow (typical for B2B), the integrations are your moat. Plan integrations as first-class features, not as adjunct work. For Week 4, get one integration working end-to-end; for Weeks 5–6, expand to 3–5 integrations. The integration strategy is also the customer-conversation strategy: each integration is a reason for a customer to commit (and a reason for a competitor to find your moat hard to bypass).
22.3.7 DBS GANDALF — choosing standardised tooling and the in-source shift (Chapter 4)
DBS’s transformation included a deliberate shift from outsourced systems-integrator work to in-house engineering with standardised tooling (cloud-native, microservices, automated CI/CD). The standardisation was not a constraint; it was an enabler. By committing to a small number of tools done well, the bank’s engineers could move between projects fluidly and the operational overhead of running production fell.
Operational implication. Resist the temptation to use 15 different tools because each is “best in class” for its niche. The integration tax exceeds the per-tool benefit at student-team scale. The default 2026 stack proposed in §22.4 covers 90%+ of student MVPs with 8–10 tools; deviate only when a specific tool is materially better for your specific use case.
22.3.8 The DeepSeek shock — foundation-model commoditisation (Chapter 5)
The DeepSeek-R1 release on 20 January 2025, MIT-licensed at frontier reasoning capability, demonstrated that foundation-model performance is commoditising rapidly. The implication for AI MVPs: do not architect around a single foundation-model provider. By the time your alpha launches in Week 6, the model landscape will have shifted; by the time you raise a seed round (Year 2 of post-graduation), the lock-in to any specific model will look quaint.
Operational implication. Build the foundation-model client behind a thin abstraction. Use the Vercel AI SDK or a similar abstraction that makes model-swap a configuration change. Test 2–3 models on your task in Week 3 and choose the best one for now; revisit the choice in Weeks 6–7 against actual alpha data. Avoid prompts that depend on specific model idiosyncrasies (e.g., GPT-specific output formatting tokens) that would not transfer.
22.4 Tools and templates
This is the most operationally dense section in the book. The recommendations are current to mid-2026; the tool landscape moves quickly, so verify current pricing and feature sets at procurement.
22.4.1 The 2026 low-code stack reference
| Layer | Category | Default | Strong alternative | Notes |
|---|---|---|---|---|
| 4 | App scaffolding | Lovable | v0 (Vercel), Bolt (StackBlitz), Replit Agent | Lovable best for full-stack Next.js; v0 best for individual components; Bolt fast for prototyping; Replit best for Python stacks |
| 4 | IDE | Cursor | Windsurf (Codeium), VS Code + Copilot | Cursor for general; Windsurf strong on agent-driven; VS Code default if team prefers |
| 1 | Hosting | Vercel | Railway, Render, Cloudflare Pages | Vercel best for Next.js + AI SDK; Railway for full-stack including DB; Cloudflare for edge-first |
| 1 | Auth | Clerk | Supabase Auth, Auth.js | Clerk best DX; Supabase Auth bundled with DB; Auth.js for full control |
| 1 | Analytics | PostHog | Plausible, Mixpanel | PostHog for product analytics + session replay; Plausible for simple traffic |
| 1 | Payments | Stripe | Paddle, Lemon Squeezy | Stripe universal; Paddle/Lemon for merchant-of-record (simpler global tax); for KL: Billplz, iPay88, eGHL; for AU: Stripe direct or Square |
| 1 | Error monitoring | Sentry | LogTail, Highlight | Sentry standard; free tier sufficient for student scale |
| 2 | Database | Supabase | Neon, Convex, Firebase | Supabase = Postgres + Auth + Storage + pgvector + Realtime + Edge Functions; comprehensive default |
| 2 | Vector store | pgvector (in Supabase) | Turbopuffer, Pinecone, Weaviate | pgvector simplest; Turbopuffer cheapest at scale; Pinecone if you need managed-without-Postgres |
| 2 | Object storage | Supabase Storage | Cloudflare R2, AWS S3 | Bundled with Supabase; R2 cheaper at scale (no egress fees) |
| 3 | Foundation model | Anthropic Claude | OpenAI GPT, Google Gemini, DeepSeek | See §21.4.5 for model-by-model guidance |
| 3 | Inference (open-weight) | Together AI | Fireworks, Groq, Modal, Replicate | Together broadest model selection; Groq fastest inference; Modal best Python integration |
| 3 | AI SDK / orchestration | Vercel AI SDK | LangGraph, Mastra, Pydantic AI | Vercel AI SDK for TypeScript/JS; LangGraph for complex agents; Pydantic AI for type-safe Python |
| 3 | LLM observability | LangSmith | Langfuse (self-hostable), Helicone | LangSmith mature; Langfuse open-source alternative; Helicone simple proxy approach |
| 3 | RAG framework | LangChain (JS or Python) | LlamaIndex, custom | LangChain dominant; LlamaIndex strong on document-heavy use cases |
| 3 | Evaluation | LangSmith / OpenAI Evals | Promptfoo, Inspect AI | LangSmith for production loops; Promptfoo for CI/CD; Inspect AI for safety-focused evals |
| 3 | Agent protocol | MCP (Model Context Protocol) | Custom HTTP, OpenAI tools | MCP de-facto standard since Nov 2024; supported by Anthropic, OpenAI, Vercel AI SDK |
| — | Source control | GitHub | GitLab, Bitbucket | GitHub default for student teams |
| — | Issue tracking | Linear | GitHub Issues, Notion | Linear best DX; GitHub Issues sufficient for free; Notion for non-technical mixed teams |
| — | Docs / wiki | Notion | GitBook, Mintlify | Notion default; Mintlify for public-facing developer docs |
| — | Sync comms | Slack | Discord, Teams | Slack standard for B2B context; Discord for community-facing |
For KL/Melbourne teams, three additional considerations:
- Vercel has US, EU, and Asia-Pacific edge regions; latency from Klang Valley to Vercel’s Singapore region is ~10–30ms. Latency from Melbourne to Vercel’s Sydney region is ~5–15ms.
- Anthropic has US-based inference by default; Asia-Pacific regional endpoints are available on enterprise plans (not relevant for student scale, but plan for it post-graduation if you continue).
- Supabase allows region selection; choose ap-southeast-1 (Singapore) for KL or ap-southeast-2 (Sydney) for Melbourne. Cross-region writes add 100–250ms; for cross-campus teams, picking one region and accepting the cross-campus latency is the standard pattern.
22.4.2 The build / buy / borrow decision tree (refined from Chapter 21)
For each Must-have feature from your MoSCoW, walk through:
Question 1: Is this feature your competitive wedge?
YES → Build (and skip remaining questions).
NO → continue.
Question 2: Does a vendor product cover this feature at <USD 50/month
for student-MVP scale?
YES → continue to Question 3.
NO → continue to Question 4 (open-source).
Question 3: Is the vendor's lock-in cost (data extraction, integration
rebuilding) acceptable if we want to migrate in 12 months?
YES → Buy.
NO → continue to Question 4.
Question 4: Is there a mature open-source library or self-hostable
alternative?
YES → Borrow (use the open-source).
NO → Build (with explicit acknowledgement that you are building
infrastructure, not the wedge).
A sanity check on your decision: tally up the Build features. If you have 6+ Build features, return to Chapter 21's MoSCoW and demote some to Should/Could. Six Build features is already at the upper limit for a 5-person 4-week sprint with the Week 6 alpha as the deadline.
22.4.3 Repository structure template
For a typical Next.js-based AI MVP, the repo structure:
project-name/
├── README.md # The first document any new team member reads
├── .env.example # All required env vars listed (no secrets)
├── .gitignore # Standard Next.js + node_modules + .env.local
├── package.json
├── tsconfig.json
├── next.config.js
├── tailwind.config.ts
│
├── app/ # Next.js App Router
│ ├── layout.tsx
│ ├── page.tsx
│ ├── (auth)/ # Public auth routes
│ ├── (dashboard)/ # Authenticated routes
│ └── api/
│ ├── chat/route.ts # AI inference endpoints
│ └── webhooks/route.ts # Clerk, Stripe, etc. webhooks
│
├── components/ # React components (UI primitives)
│ └── ui/ # shadcn/ui components
│
├── lib/ # Application logic
│ ├── ai/ # Foundation-model client wrapper
│ ├── db/ # Supabase client + schema types
│ ├── auth/ # Clerk helpers
│ └── utils.ts # Shared utilities
│
├── prompts/ # Prompt templates as .md files (version-controlled)
│ ├── system-tutor.md
│ └── teacher-review.md
│
├── supabase/ # Database migrations
│ ├── migrations/
│ └── seed.sql
│
├── docs/ # Internal team documentation
│ ├── architecture.md # Architecture diagram + reasoning
│ ├── conventions.md # Coding conventions
│ ├── deployment.md # Deployment runbook
│ └── decisions/ # Architecture decision records (ADRs)
│
└── tests/ # Test files (added Week 5+)
The structure is a Next.js convention with one addition: prompts/ as a first-class directory. Putting prompts in version control (rather than as string literals in code) makes them reviewable, diffable, and stable across team members.
22.4.4 Foundation-model client wrapper pattern
A thin abstraction over the foundation-model API that lets the rest of the app swap models without rewriting:
// lib/ai/client.ts
import { generateText, generateObject } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
import { openai } from '@ai-sdk/openai';
const MODEL_REGISTRY = {
  'claude-sonnet': anthropic('claude-sonnet-4-6'),
  'claude-opus': anthropic('claude-opus-4-7'),
  'gpt-5': openai('gpt-5'),
  'gpt-mini': openai('gpt-5-mini'),
};

type ModelKey = keyof typeof MODEL_REGISTRY;

export async function generateExplanation(
  promptInput: { question: string; level: string; language: 'BM' | 'EN' },
  modelKey: ModelKey = 'claude-sonnet'
) {
  const systemPrompt = await loadSystemPrompt('tutor-spm');

  const { text, usage } = await generateText({
    model: MODEL_REGISTRY[modelKey],
    system: systemPrompt,
    prompt: buildUserPrompt(promptInput),
    maxTokens: 1500,
  });

  await persistInference({
    modelKey, promptInput, output: text, usage,
  });

  return { text, usage };
}

Three properties of the pattern matter:
- Model selection is a function argument, not hardcoded. Switch from Claude Sonnet to GPT-5 for an A/B test by passing a different modelKey.
- System prompt is loaded from a file, not inline. Prompts are reviewed via PR like any other code.
- Every inference is persisted to the database, with input, output, model, and usage. This is your evaluation infrastructure under construction.
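A usage sketch from a route handler, showing the one-argument model swap; the route path and import alias are illustrative:

```ts
// app/api/explanations/route.ts: calling the wrapper from an API route.
import { NextResponse } from 'next/server';
import { generateExplanation } from '@/lib/ai/client';

export async function POST(req: Request) {
  const { question, level, language } = await req.json();

  // Swap 'claude-sonnet' for 'gpt-5' here to A/B test models; nothing else changes.
  const { text, usage } = await generateExplanation({ question, level, language }, 'claude-sonnet');

  return NextResponse.json({ explanation: text, usage });
}
```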
22.4.5 Database schema template for typical AI MVP
A starter schema for a multi-tenant AI MVP with users, organisations, and AI inferences. Adapt to the specific domain:
-- Organisations (tenants — for Team Aroma, "centres")
create table organisations (
  id uuid primary key default gen_random_uuid(),
  name text not null,
  created_at timestamptz default now(),
  metadata jsonb default '{}'::jsonb
);

-- Users (clerk-managed; we mirror identity into our DB)
create table users (
  id uuid primary key, -- matches Clerk user_id
  email text unique not null,
  organisation_id uuid references organisations(id),
  role text check (role in ('owner', 'teacher', 'student')),
  created_at timestamptz default now(),
  metadata jsonb default '{}'::jsonb
);

-- AI inferences (the moat-building log)
create table ai_inferences (
  id uuid primary key default gen_random_uuid(),
  user_id uuid references users(id),
  organisation_id uuid references organisations(id),
  feature_key text not null,     -- e.g. 'tutor-explanation'
  model_key text not null,       -- e.g. 'claude-sonnet'
  prompt_version text not null,  -- e.g. 'tutor-spm-v3'
  input_data jsonb not null,
  output_text text,
  output_data jsonb,
  input_tokens int,
  output_tokens int,
  cost_usd numeric,
  latency_ms int,
  created_at timestamptz default now()
);

-- Domain tables follow (questions, attempts, reviews, etc.)

-- Row-level security: every table has policies like:
alter table ai_inferences enable row level security;

create policy "Users can read inferences from their organisation"
  on ai_inferences for select
  using (organisation_id = (
    select organisation_id from users where id = auth.uid()
  ));

create policy "Service role can write inferences"
  on ai_inferences for insert
  to service_role
  with check (true); -- Server-side writes only; client roles get no insert policy

The schema is opinionated:
- Organisations are first-class. Even if your MVP looks single-user, the multi-tenancy primitive is built in from the start; retrofitting later is painful.
- Users mirror Clerk. Clerk owns identity; your DB owns the application data, with user.id being the Clerk-assigned UUID.
- Every AI inference is logged. This single table will be the input to your Week-5 evaluation work and your Week-7 data flywheel.
- RLS is enabled on every table by default. Any forgetting of RLS becomes a security review item.
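A sketch of the persistInference helper that §22.4.4's wrapper calls, writing into ai_inferences via supabase-js; the feature key, prompt version, and usage field names below are assumptions to adapt:

```ts
// lib/db/inference.ts: write every model call to ai_inferences (see §22.4.5 for the columns).
// Assumes @supabase/supabase-js and a server-only service-role key.
import { createClient } from '@supabase/supabase-js';

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY! // server-side only; never ship this key to the browser
);

export async function persistInference(args: {
  modelKey: string;
  promptInput: unknown;
  output: string;
  usage?: { promptTokens?: number; completionTokens?: number }; // field names vary by SDK version
}) {
  const { error } = await supabase.from('ai_inferences').insert({
    // user_id and organisation_id omitted here; pass them through from the authenticated request.
    feature_key: 'tutor-explanation',  // illustrative; pass this in per feature
    model_key: args.modelKey,
    prompt_version: 'tutor-spm-v1',    // illustrative; read from the prompt file's metadata
    input_data: args.promptInput,
    output_text: args.output,
    input_tokens: args.usage?.promptTokens ?? null,
    output_tokens: args.usage?.completionTokens ?? null,
  });
  if (error) throw error;
}
```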
22.4.6 The vertical-slice task breakdown template
For the Week-4 vertical slice, decompose into 6–10 tasks, ordered for parallel execution:
VERTICAL SLICE: [feature name]
Slice scope: one [user] does one [action] end-to-end.
Task 1: [Auth path] — user signs in with role X
Owner: [name]
Estimated time: [hours]
Dependencies: env setup complete
Task 2: [Database] — create the necessary tables and RLS policies
Owner: [name]
Estimated time: [hours]
Dependencies: Supabase project provisioned
Task 3: [API endpoint] — handle the user action
Owner: [name]
Estimated time: [hours]
Dependencies: tasks 1, 2
Task 4: [Model integration] — call foundation model with appropriate prompt
Owner: [name]
Estimated time: [hours]
Dependencies: task 3
Task 5: [Persistence] — save inference and result
Owner: [name]
Estimated time: [hours]
Dependencies: tasks 2, 4
Task 6: [UI] — the screens for this user flow
Owner: [name]
Estimated time: [hours]
Dependencies: task 1
Task 7: [Integration] — wire UI to API to model to DB
Owner: [name]
Estimated time: [hours]
Dependencies: tasks 3, 4, 5, 6
Task 8: [Acceptance test] — manual end-to-end test
Owner: all team members
Estimated time: 30 min
Dependencies: task 7
The pattern: parallelise where possible (UI on the front end, DB schema on the back), serialise where dependencies require (integration must wait for components). For a 5-person team the slice should complete in 3–4 working days with parallel effort.
22.4.7 Cross-campus async workflow guide
Specific patterns that make KL-Melbourne async work well:
Daily standup as written async update (not synchronous video call). Each team member posts 3 lines to the team Slack/Discord channel by 11am their local time: what I did yesterday; what I’m doing today; what I’m blocked on. The format is rigid; the discipline is what makes it useful. Reading the standups takes 5 minutes; the visibility prevents drift.
Twice-weekly synchronous meetings. Tuesday at 9am KL / 12pm Melbourne (planning meeting); Friday at 4pm KL / 7pm Melbourne (demo + retrospective). Both within the 5-hour overlap window. Alternate which campus has the more inconvenient hour for non-meeting work.
Async-first decision making. Decisions are made in writing, in the team docs (Notion / Linear). A team member proposes; others have 24 hours to comment; if no objection, the decision is final. Synchronous meetings are for clarification, not decision-making.
Code review with timezone-aware turnaround. A PR opened in KL morning (and not blocking) gets reviewed in Melbourne morning (KL afternoon). A PR opened in Melbourne afternoon (and not blocking) gets reviewed in KL morning the next day. The pattern lets PRs flow continuously without anyone waiting.
Blocking-request escalation channel. A separate Slack/Discord channel for “I am blocked and need response within 4 hours.” The channel has aggressive notifications turned on for all members; non-blocking traffic stays in the main channel.
Production pager (or its absence). No-one is on call after their working hours. If production breaks at 11pm KL, it stays broken until 9am Melbourne. The discipline is healthy and prevents burnout; for a student MVP, the on-call cost is not justifiable.
22.5 Worked example — Team Aroma’s Week 4
Team Aroma starts the Pulse build on Monday with the MVP scoping document from Week 3 as their contract. The team’s working agreement: Aliyah (CEO; full-time) is the Week-4 infrastructure lead and main backend engineer; Wei Hao (CTO; full-time) is the model-integration and database lead; Sara (Head of Design; ~25 hours/week) is the UI/UX lead; Daniel (Melbourne; Head of Curriculum; ~20 hours/week) owns the prompt content and SPM-rubric verification; Priya (Melbourne; Head of Content; ~20 hours/week) handles documentation and curriculum content sourcing.
Day 1 (Monday): provisioning
Aliyah spends the morning provisioning team accounts. By 1pm KL she has:
- GitHub organisation team-aroma-pulse with a private repo, branch protection on main, and all 5 members added.
- Vercel team account with the Hobby plan.
- Supabase project in ap-southeast-1 (Singapore) — lower latency for both KL and Melbourne than ap-southeast-2.
- Clerk application in development mode.
- Anthropic API key with USD 50 credit (paid tier) — they discover later this barely covers Week 4 due to the prompt-iteration calls.
- PostHog free tier.
- Notion workspace, Linear project, Slack workspace.
By 2pm KL (5pm Melbourne) Daniel and Priya have access to all of the above and have run git clone successfully. Wei Hao has installed the Vercel CLI and the Supabase CLI on his local machine and confirmed he can deploy. Sara has Cursor installed and configured.
The Day-1 stumble: Daniel tries to use his personal Anthropic account credentials and is rate-limited within 30 minutes; the team had forgotten to give him the shared, credit-funded team key. The fix takes 5 minutes once identified, but the lost time is 2 hours.
Day 1 evening: Lovable scaffolding
Wei Hao opens Lovable and prompts:
“Build a Next.js app for a Malaysian SPM tutoring platform. Three user roles: centre owner, teacher, and student. Centre owners manage their teachers and students. Teachers see assigned students, generate AI explanations for SPM Add Maths questions, review and edit explanations before sending to students. Students receive practice questions with explanations and submit attempts. Use Tailwind, shadcn/ui, and Clerk for auth.”
Lovable produces a deployable Next.js 14 scaffold in 5 minutes. Wei Hao iterates the prompt 4 times to refine the dashboard layouts, then exports to GitHub. The auto-deploy to Vercel completes; the preview URL works. Wei Hao posts the URL to Slack at 9pm KL; Daniel and Priya can both load it from Melbourne.
Day 2 (Tuesday): auth and DB foundation
Wei Hao wires Clerk: @clerk/nextjs installed, sign-in and sign-up routes configured, the protected dashboard wrapper applied. By Tuesday lunchtime, sign-up works end-to-end. The Tuesday morning sync meeting (9am KL / 12pm Melbourne) is 25 minutes: confirm provisioning is complete, confirm Lovable scaffold accepted, plan Wednesday’s database work.
Wei Hao spends Tuesday afternoon on the Supabase schema. He drafts the schema on a Miro board with Daniel (over screen-share). Tables: centres, users, students_at_centre, questions, student_attempts, ai_explanations, teacher_reviews. RLS policies designed: a centre’s data is visible only to that centre’s users; cross-centre queries are blocked at the database layer.
The schema goes through a 30-minute review with the whole team (Tuesday 4pm KL / 7pm Melbourne, the second sync meeting of the week, scheduled here as an exception). Two changes from the review: Daniel suggests adding topic_code to questions for SPM-curriculum alignment; Sara suggests a display_state field for soft-deletes in the UI. Schema commit at 8pm KL; migration applied to Supabase.
Day 3 (Wednesday): foundation-model integration
Wei Hao integrates the Anthropic SDK via the Vercel AI SDK. The first prompt template (prompts/tutor-explanation-v1.md) is drafted by Daniel and Priya — Daniel writes the SPM-rubric requirements, Priya writes the bilingual BM/English instructions. The prompt is reviewed by Aliyah for tutoring-quality realism.
The first round-trip test: Wei Hao runs a sample SPM Add Maths question through the model. The output is plausible but the BM section uses formal language unsuited for Form-5 students. Daniel iterates the prompt to add casual-but-accurate Malaysian vocabulary; the second round produces output Aliyah judges acceptable.
By Wednesday evening, the model client is wired into the API route, every inference is being logged to Supabase, and the cost of a typical explanation is ~USD 0.04. PostHog is also wired; every page view and API call is tracked.
The Wednesday stumble: at ~10pm KL, Wei Hao hits the Anthropic rate limit (3 requests per minute on the introductory paid tier). He spends an hour debugging before realising the issue and upgrading the tier (USD 100 monthly cap). Loss: ~1 hour of debugging time. The lesson is documented in the team’s docs/decisions/001-anthropic-tier.md.
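The code-level complement to the tier upgrade is a small retry-with-backoff wrapper around model calls; a generic sketch (illustrative, not from the Aroma repo):

```ts
// lib/ai/retry.ts: retry transient failures (HTTP 429 / overloaded) with exponential backoff.
export async function withRetry<T>(
  fn: () => Promise<T>,
  { retries = 3, baseDelayMs = 1_000 } = {}
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === retries) break;
      const delay = baseDelayMs * 2 ** attempt; // 1s, 2s, 4s, ...
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}

// Usage: const result = await withRetry(() => generateExplanation(input));
```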
Day 4 (Thursday): the first vertical slice
Sara has worked through Wednesday on the UI screens (teacher dashboard, student question view, teacher review interface). By Thursday morning the screens are visually correct but not wired to real data. The team’s Thursday goal: complete the vertical slice — one teacher logs in, sees one assigned student, generates one practice question with AI explanation, reviews it, sends to the student, and the student sees it.
The team works in parallel:
- Aliyah implements the API endpoint that, given a teacher's request, generates an AI explanation and persists it to ai_explanations.
- Wei Hao adapts the data-layer queries to feed the dashboard with the right student-and-question lists.
- Sara wires the UI components to the API via React Query.
- Daniel seeds the database with 5 SPM Add Maths questions (taken from public 2024 SPM trial papers).
- Priya seeds 1 centre, 1 teacher, and 2 students with realistic-looking data.
By Thursday 6pm KL, Aliyah opens a fresh Chrome window. She signs up as a teacher, gets assigned to the seeded centre via a manual database update, sees the seeded students, clicks “Generate explanation” on a question, sees the AI output appear after ~3 seconds, edits one phrase, clicks “Send to student,” opens an incognito window, signs up as a student, sees the explanation. The vertical slice works.
The Thursday-evening team chat captures the moment: “Slice ✅” from Aliyah; celebratory GIFs from Sara and Daniel.
Day 5 (Friday): demo, gap analysis, Week-5 planning
The Friday demo (4pm KL / 7pm Melbourne) is 45 minutes. Aliyah drives:
- Demonstrates teacher sign-up, login, dashboard.
- Generates explanation for one question; reviews and edits; sends to student.
- Switches to student view; sees the explanation; submits an attempt.
- Switches back to teacher view; sees the attempt and the (very basic) auto-generated feedback.
The team identifies gaps:
- No invite flow. Currently students and teachers sign up separately and Aliyah manually links them via the DB. Centre-owner-mediated invites are needed by Week-5 end.
- No SPM-format checking. The AI generates explanations but the rubric-format validation layer (a Must-have from MoSCoW) is not implemented.
- No batch question generation. The AI generates one explanation at a time; teachers will want to assign 5–10 questions at once.
- No mobile-responsive testing. The team has tested only on desktop; many students will use mobile.
- No production hardening. Error handling, retries, edge cases. Acceptable for the slice; not acceptable for alpha.
- No analytics dashboard for the team. PostHog is collecting data but no view aggregates it. A team-internal dashboard is needed for Week-5 evaluation work.
The Week-5 plan converts the gaps to specific tasks; Linear tickets are created for each. The team also notes that roughly USD 30 of its USD 50 Anthropic credit has been spent in Week 4 alone, implying ~USD 60 will be needed for Weeks 5–7. The team commits to a USD 100 paid tier funded from Aliyah's personal credit card, with the team agreeing to reimburse her from any pilot revenue.
The Friday submission goes in at 11pm KL: the repo URL, the Vercel preview URL, the architecture diagram (Sara drew it in Excalidraw), the stack decision document, the build velocity tracker (showing 30 dev-days of effort across 5 members in the week, slightly under the planned 35), the cost log, and the updated risk register.
What Team Aroma got right and what they almost got wrong
Three things they did well: (1) provisioning was completed on Day 1 despite the personal-Anthropic-key stumble, which prevented Week-long blocking; (2) the schema review on Tuesday afternoon caught two missing fields (topic_code, display_state) before they became expensive to add; (3) the vertical-slice discipline meant Thursday’s celebration was on a working end-to-end demo, not on disconnected component completion.
Three things they almost got wrong: Wei Hao initially scoped Wednesday for “build the full prompt library” (which is a Week-5 task); the team almost agreed to build the SPM-format-checking layer in Week 4 (which would have made the slice impossible to complete on time); and Sara almost spent Thursday on responsive-mobile work (which she demoted after the team agreed alpha would be desktop-first). Each of these would have produced a Frankendemo by Friday — components in various states of partial completion, none integrated. The team’s MoSCoW discipline from Week 3 — keeping the Must list to 8 features — saved the integration work.
22.6 Course exercises and Week 4 deliverable
Submit the Week 4 deliverable bundle as a shared folder by Friday 23:59. Required artefacts:
22.6.1 Required artefacts
- Repository URL with a complete README documenting setup, environment variables, and deployment.
- Stack decision document — one to two pages listing the chosen tools at each layer with rationale (per §22.4.1 and §22.4.2).
- Architecture diagram — one page (Excalidraw, draw.io, or whiteboard photo is fine) showing the major components and their connections.
- Live working prototype URL — Vercel preview URL with the vertical slice deployed.
- Vertical slice demo — a 2–4 minute screen recording showing the slice end-to-end (uploaded as a Loom or YouTube unlisted link).
- Build velocity tracker — a simple sheet showing dev-days planned vs spent per task, with explanations for variances >25%.
- Cost log — actual spending across all tools (Vercel, Supabase, Clerk, Anthropic, PostHog) for the week.
- Updated risk register from Chapter 21 — risks closed, risks newly identified, mitigations updated.
22.6.2 Grading rubric (50 points)
| Component | Points | Distinction-level criteria |
|---|---|---|
| Stack decision rationality | 10 | Tool choices are defensible against §22.4.2 decision tree; build vs buy/borrow ratio supports the wedge |
| Vertical slice depth | 15 | One feature works end-to-end through every layer; demo shows real persistence and real model output |
| Documentation discipline | 5 | README is sufficient for a new team member to set up; ADRs documented; conventions clear |
| Build velocity | 5 | Dev-days spent within ±25% of planned; variances explained |
| Cross-campus coordination | 5 | Twice-weekly sync meetings held; async standup discipline maintained; PRs reviewed across campuses within 24 hours |
| Cost discipline | 5 | Spending tracked; tier upgrades anticipated; paid-tier provisioned before alpha |
| Risk management | 5 | Risks updated weekly; new risks (production hardening, rate limits, cost cliffs) identified |
Pass: 30. Credit: 36. Distinction: 42. High Distinction: 47.
The team-comprehension penalty from §19.6.2 applies; additionally, every team member must be able to demonstrate their own commits and explain their own code in any random spot-check.
22.6.3 Things to do before Monday of Week 5
By Sunday evening of Week 4, in addition to the deliverable submission:
- Identify the 100 representative inputs that will form your Week-5 evaluation set (the “golden set” in Chapter 23). For Team Aroma, this is 100 SPM Add Maths questions across the major topic areas.
- Confirm the alpha-cohort booking from Week 2: the friendly users committed to Week-6 alpha testing should be re-confirmed via personal message before Sunday. Five-week-old commitments evaporate without re-confirmation.
- Read Chapter 4 (Building the AI-native enterprise) and §23.1–§23.3 of Chapter 23 (Evaluation, golden sets, and build-measure-learn) before Monday of Week 5. The evaluation framework you build in Week 5 is the discipline that distinguishes a working MVP from a defensible MVP.
References for this chapter
Lean and agile development practice
- Beck, K. and others (2001). Manifesto for Agile Software Development. agilemanifesto.org.
- Humble, J. and Farley, D. (2010). Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation. Addison-Wesley.
- Forsgren, N., Humble, J., and Kim, G. (2018). Accelerate: The Science of Lean Software and DevOps. IT Revolution Press.
Modern web and AI infrastructure
- Vercel (2024–2026). Vercel Platform Documentation and Vercel AI SDK Documentation. vercel.com.
- Anthropic (2024–2026). Claude API Documentation and Model Context Protocol Specification. docs.anthropic.com, modelcontextprotocol.io.
- Supabase (2024–2026). Supabase Documentation. supabase.com.
- Clerk (2024–2026). Clerk Documentation. clerk.com.
Foundation-model providers and inference services
- OpenAI (2024–2026). OpenAI Platform Documentation. platform.openai.com.
- Google (2024–2026). Gemini API Documentation. ai.google.dev.
- DeepSeek-AI (2024). DeepSeek-V3 Technical Report. arXiv:2412.19437.
- DeepSeek-AI (2025). DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. arXiv:2501.12948.
- Together AI, Fireworks AI, Groq, Modal Labs, Replicate (2024–2026). Provider documentation.
Cases referenced in §22.3
- Iansiti, M. and Lakhani, K. R. (2020). Competing in the Age of AI: Strategy and Leadership When Algorithms and Networks Run the World. Harvard Business Review Press. (JPMorgan COiN, Stitch Fix, DBS GANDALF, Watson Health.)
- Lamarre, E., Smaje, K., and Zemmel, R. (2023). Rewired: The McKinsey Guide to Outcompeting in the Age of Digital and AI. Wiley. (DBS standardised tooling.)
- Anysphere (Cursor) public technical and engineering blog posts (2023–2026).
- Anthropic (2024). Model Context Protocol — Introduction and Specification. November 2024.
Stack-selection literature
- Spolsky, J. (2001). The Joel Test: 12 Steps to Better Code. Joel on Software.
- Atwood, J. (2007). Programming Atomic Habits. Coding Horror Blog.
- Latent Space (Wang, S., 2023–2026). Newsletter and podcast on AI engineering practice.
- AI Engineer Summit (2023–2026). Conference proceedings. ai.engineer.
Further reading
For Next.js specifically, the official Vercel documentation is comprehensive; the Theo Browne YouTube channel covers practical patterns; the Lee Robinson (Vercel VP) blog covers idioms. For Python integration patterns, the FastAPI documentation, the Modal Labs blog, and the Pydantic AI documentation are the standard references. For MCP specifically, the Anthropic MCP specification and the open-source server library are the primary references.
For team-workflow practices in distributed software engineering, the GitLab Handbook (gitlab.com/company/culture/all-remote/) is the most-developed public document; many of its patterns transfer directly to student teams. For asynchronous-first communication, Doist’s The Async Manifesto and Automattic’s Distributed (Mullenweg) are concise references.
For the open-weight model ecosystem and self-hosted inference, the Hugging Face documentation, the LMStudio user guide, and the Ollama documentation cover the practitioner-side. For commercial open-weight inference providers (Together AI, Fireworks, Groq, Modal), each provides their own documentation; Latent Space and the AI Engineer YouTube channel cover comparative reviews.