Chapter 22 — Week 4: The 2026 low-code stack
Welcome to Week 4. The MVP scoping document from Week 3 is your contract; this week you turn it into running software. By Friday you will have a deployed prototype with end-to-end auth, database, foundation-model integration, and at least one vertical slice of the wedge feature working from user input through model inference to persisted output. The single most-common Week-4 mistake is to build layers (auth done; database done; model integration done) without ever connecting them. By Friday a real user — a team member counts as a real user — should be able to log in, do the wedge action, and see a result. This chapter exists primarily to make sure that happens.
Chapter overview
This chapter follows the same six-part structure. §22.1 (Concept) sets out the four-layer 2026 low-code AI stack, the build-vs-buy boundary applied to actual tools, where R and Python fit (and where they do not), the hidden costs of low-code, and the team workflow for a 4–5 person cross-campus build. §22.2 (Method) is the day-by-day Week 4 sprint: environment setup, app-shell-and-auth, data layer, foundation-model integration, the first vertical slice, and the Friday demo. §22.3 (Lessons from the cases) pulls eight specific lessons on stack and tool selection from Parts I–III. §22.4 (Tools and templates) is the most operationally dense section in this book — the current 2026 reference for app builders, hosting, databases, auth, foundation-model APIs, vector stores, agent frameworks, observability, and payments, with KL- and Melbourne-specific notes. §22.5 (Worked example) continues Team Aroma through their actual Week 4 build, including the rate-limit incident that delays a teammate by a day and the cross-campus integration meeting they almost botched. §22.6 (Course exercises and deliverables) specifies the Week 4 submission with grading rubric.
How to read this chapter. Read §22.1 in full at Monday morning’s standup. Read §22.4 (the stack reference) before you make any tool-procurement decisions; the wrong tool selected on Monday compounds into days of lost work by Wednesday. Use §22.2 as the day-by-day plan; assign owners per step. Treat §22.3 as Wednesday-evening reading after the foundations are in place. Submit against §22.6 by Friday 23:59.
22.1 Concept
22.1.1 The four-layer 2026 low-code AI stack
A modern AI MVP is best understood as four layers, each with characteristic tooling. The architectural separation matters because each layer has its own commoditisation curve, its own cost structure, and its own switching cost — and because the Iansiti-Lakhani thesis (Chapter 3) tells us the locus of competitive advantage is across layers, not within any one of them.
+----------------------------------------------------------+
| Layer 4 — App shell and UX |
| (Lovable, v0, Bolt, Replit Agent, Cursor, Windsurf) |
+----------------------------------------------------------+
| Layer 3 — Intelligence |
| (Anthropic, OpenAI, Google, DeepSeek; Vercel AI SDK, |
| LangChain/LangGraph, Mastra, Pydantic AI, MCP servers) |
+----------------------------------------------------------+
| Layer 2 — Data |
| (Supabase, Neon, Convex, Firebase; pgvector, Pinecone, |
| Turbopuffer, Weaviate) |
+----------------------------------------------------------+
| Layer 1 — Integration and infrastructure |
| (Vercel, Railway, Render, Cloudflare; Clerk auth; |
| PostHog analytics; Stripe / iPay88 payments) |
+----------------------------------------------------------+
Layer 1 — Integration and infrastructure is the substrate: hosting, auth, observability, billing. Almost entirely commoditised in 2026; the practical differences across vendors are developer experience and pricing curves, not capability. Default choice for student teams: Vercel + Clerk + PostHog free tier + Stripe for international or Billplz/iPay88 for Malaysia-only payments. Setup effort ≤1 day; ongoing operational cost AUD/MYR ~0 in pilot phase.
Layer 2 — Data is the persistence and retrieval layer: relational data, document storage, vector embeddings for RAG, real-time sync if needed. Commoditised at the storage level; the moats here are at the schema and retrieval pipeline level, not at the database level. Default choice: Supabase (Postgres + pgvector + auth + storage + realtime + edge functions in one platform). Setup effort ≤1 day for a typical student MVP schema.
Layer 3 — Intelligence is the foundation-model and orchestration layer. The DeepSeek shock (Chapter 5) confirmed that frontier capability is commoditising; the moats here are at the prompt-engineering, evaluation, and workflow-integration levels rather than at the model level. Default choice for a B2B/education MVP in the KL or Melbourne context: Anthropic Claude Sonnet via the Anthropic API, called through the Vercel AI SDK if you are on the JavaScript/TypeScript stack, or the official Anthropic Python SDK if you have R/Python components. Set up the model client behind a thin abstraction so that swapping to GPT, Gemini, or DeepSeek is a one-line change.
Layer 4 — App shell and UX is the customer-facing surface: pages, forms, dashboards, mobile-responsive layouts, brand. The 2024–2026 wave of AI app builders (Lovable, v0, Bolt, Replit Agent) collapsed this layer’s build cost by an order of magnitude; a working app shell that would have required 3–5 days of front-end engineering in 2022 takes 2–4 hours in 2026. Default choice for student teams: Lovable for the initial scaffold (best at producing complete, deployable Next.js apps from prompts), with Cursor as the day-to-day IDE for refinement. v0 and Bolt are reasonable alternatives; Replit Agent is the strongest choice if your stack has Python components.
The principle that organises the stack is the same principle from Chapter 21: build the wedge, buy or borrow everything else. The four-layer view makes the principle operational. Layers 1, 2, and 4 are almost entirely buy/borrow for a student MVP. Layer 3 is mixed — the model is buy, but the prompts, the RAG pipeline, the evaluation harness, and the workflow integration are build. The build effort in Week 4–6 should concentrate inside Layer 3’s application-specific parts, with everything else assembled from existing tools.
22.1.2 The build-vs-buy boundary, revisited
Chapter 21’s build/buy/borrow framework gave you the principle. This week applies it to actual tool choices, with the caveat that the framework’s outputs are heavily team-dependent. A team with a strong front-end engineer might build the app shell from scratch using Next.js + Tailwind, treating Lovable as too constraining. A team with no front-end engineer should treat Lovable as essential. The framework is not “always buy”; it is “build only the wedge.”
A useful diagnostic: walk through your MoSCoW list from Chapter 21 and ask, for each Must feature, if this feature were a commodity tool we could buy, would we still have a startup? The answers cluster into three buckets:
- Yes, definitely — the feature is necessary infrastructure (auth, hosting, payments, basic UI components) but not what makes your product distinct. Buy or borrow.
- No — the feature is your product (the wedge AI capability, the workflow integration with the customer’s existing systems, the proprietary data pipeline). Build.
- Maybe — the feature is differentiating but the differentiation is small (custom UI components, specific analytics dashboards, branded email templates). Buy or borrow now; revisit at Week 8 if it has become a customer-cited differentiator.
The pattern that emerges, for almost every MVP, is that 2–4 features are firmly Build, 8–12 features are firmly Buy/Borrow, and 2–4 features sit in the Maybe bucket. The 2–4 Build features must absorb roughly 70–80% of the team’s engineering effort over Weeks 4–6. If your time allocation does not match this, you are over-investing in commoditised infrastructure at the expense of the wedge.
22.1.3 Where R and Python fit — and where they do not
The course is configured for a low-code preference, with R and Python as exceptions rather than the default. Four classes of work justify the R/Python overhead:
Custom evaluation pipelines. Foundation-model output evaluation at scale — running 1,000 test inputs through the model, scoring against golden answers, computing aggregate metrics — is much easier in Python (with the OpenAI Evals library, the Anthropic evaluation framework, or LangSmith) than in any low-code platform. Chapter 23 develops this in detail. For the Week-4 build, evaluation is typically deferred; for Weeks 5–6, a Python evaluation script is often the right choice.
Custom embedding and RAG pipelines. Where your RAG corpus is large (10,000+ documents), domain-specific (Malaysian commercial law, SPM past-year papers, healthcare procedures), or sensitive (data must not leave a sovereign perimeter), you may need a custom embedding pipeline. Python with sentence-transformers, langchain, or llama-index is the standard stack. R is rarely the right tool for embedding pipelines.
Statistical or econometric analysis. For business-analytics, fintech, and economics-adjacent MVPs (which include several plausible Monash student-team domains), R and Python are first-class tools. The seminr package in R for PLS-SEM, the tidyverse for data manipulation, statsmodels and scipy.stats in Python — none of these have low-code equivalents. If your MVP’s wedge involves statistical modelling beyond what foundation models can reliably do, build the modelling layer in R or Python and expose it to your low-code app shell as an HTTP API (see §22.2.5 for the integration pattern).
Fine-tuning. If your evaluation in Week 5–6 reveals that no foundation model is good enough for your task and you need a fine-tuned model, the fine-tuning workflow lives in Python (typically through Hugging Face’s transformers and peft libraries, or through hosted fine-tuning services like Together AI’s fine-tuning API). For 95% of student MVPs, fine-tuning is not needed; prompt engineering and RAG cover the gap.
The integration pattern when R/Python is needed. Do not try to build the entire app in Python if only one component needs it. Instead: build the app shell and main backend in JavaScript/TypeScript on Vercel, build the R/Python component as a separate microservice (deployed on Modal Labs, Replicate, Hugging Face Inference Endpoints, or Railway), and call the microservice over HTTP from the main backend. The pattern keeps the app build fast while letting you escape into R/Python where it genuinely helps.
A specific note on R for student teams: R is excellent for analysis and reports but underpowered as an application platform. For econometric MVPs, the typical pattern is to build the modelling layer in R (using seminr, BEKKs, vars, quantmod, or similar packages familiar from finance and statistics courses), expose the model behind a plumber API, and integrate from a TypeScript front-end. Resist the temptation to build the front-end in Shiny; for production-grade MVPs targeting external users, Shiny's UX ceiling will frustrate your team within 2–3 weeks.
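A minimal sketch of the calling side of this pattern, assuming a hypothetical /score endpoint exposed by the R (plumber) or Python (FastAPI) service and a MODEL_SERVICE_URL environment variable pointing at it:

```ts
// app/api/score/route.ts: Next.js route handler that proxies to the R/Python microservice.
// The endpoint path, payload shape, and env var are illustrative assumptions.
import { NextResponse } from 'next/server';

export async function POST(req: Request) {
  const payload = await req.json();

  // The modelling service deployed separately on Railway, Modal, or similar.
  const res = await fetch(`${process.env.MODEL_SERVICE_URL}/score`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  });

  if (!res.ok) {
    return NextResponse.json({ error: 'modelling service unavailable' }, { status: 502 });
  }

  return NextResponse.json(await res.json());
}
```

The same shape works whether the service behind the URL is plumber, FastAPI, or a hosted inference endpoint; the app shell never needs to know which.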
22.1.5 Working in a 4–5 person cross-campus team
The team-workflow question is non-trivial for a 4–5 person team building software for the first time, especially with members across two timezones. The default operating model:
Source control: GitHub, single repo, monorepo or polyrepo by team preference. Single-repo (monorepo) is the default for a 5-person 10-week project — coordination overhead of polyrepo is rarely worth it at this scale.
Branch protection on main. Require pull requests with at least one review before merge. Direct pushes to main are blocked except by the team lead in genuine emergencies. The discipline prevents the most common student-team failure mode: a member force-pushes a broken commit on Friday evening and breaks the team’s environment for the weekend.
Branching convention. main is always deployable. Each feature branch is feat/<short-name> (e.g., feat/teacher-review-ui); each fix is fix/<short-name>; each refactor is refactor/<short-name>. Feature branches live for 1–3 days at most; longer-lived branches accumulate merge-conflict pain.
Pull request discipline. Every PR has: a linked task ticket (from Linear, GitHub Issues, or Notion), a description of what changed, screenshots or screen-recording for UI changes, and at least one peer reviewer. Review turnaround target: <24 hours, with the cross-campus rotation respecting both timezones.
Code review at student-team scale. Reviews are not full audits — they are sanity checks. The reviewer reads the changes, runs the code locally if possible, and asks: does this break anything obvious? Does this match our conventions? Is the test coverage proportional to the risk? A 5-minute review is normal; a 30-minute review is a signal that the PR is too large and should have been split.
Deployment cadence. Vercel auto-deploys preview environments per PR (free); production deploys on merge to main. The team should ship to production several times per day after Week 4 has stabilised — large infrequent deploys produce more bugs than small frequent ones.
Documentation discipline. Every component, API endpoint, and database table is documented in the repo’s README or docs folder as it is built, not retroactively. Cross-campus teams without strong documentation discipline diverge quickly because synchronous communication windows are limited.
Async-first communication. With KL and Melbourne working hours overlapping ~5 hours per day (roughly 9am–2pm KL / 12pm–5pm Melbourne in the standard semester), most communication must be async. Slack, Discord, or Notion is the substrate; team-defined response-time expectations (~24 hours for normal requests, ~4 hours for blocking requests) keep async working.
Two synchronous meetings per week. From Chapter 19’s founder agreement: typically Tuesday and Friday, with the meetings alternating which campus’s working hours are inconvenient. The Tuesday meeting reviews the prior week and plans the current; the Friday meeting reviews progress and prepares the deliverable submission.
The pattern is well-tested: distributed engineering teams have worked this way for 15+ years, and student teams that adopt the pattern outperform student teams that try to be fully synchronous (which typically fails by Week 4 due to timezone fatigue) or fully async (which typically fails by Week 5 due to coordination failure).
22.2 Method — the Week 4 sprint
22.2.1 Day 1 (Monday): environment setup and provisioning
By Monday end-of-day every team member should have working access to:
- The shared GitHub organisation and repo (with branch protection enabled).
- The shared Vercel team (free tier; Hobby plan is sufficient).
- The shared Supabase project (free tier).
- The shared Clerk application (free tier).
- The shared Anthropic API key, with the paid tier provisioned (~USD 20 minimum to avoid free-tier rate limits).
- The shared PostHog account (free tier).
- The shared Notion / Google Drive / Linear workspace for documentation and task tracking.
- A shared development environment file (.env.example) committed to the repo, with all required environment variables listed (no actual secrets — those live in the Vercel/Supabase/Clerk dashboards and in each developer's local .env.local file).
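A minimal .env.example sketch for the default stack above; the variable names follow each vendor's usual conventions, but check them against whatever your code actually reads:

```
# .env.example — committed to the repo; real values live in the vendor dashboards and .env.local
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=
CLERK_SECRET_KEY=
NEXT_PUBLIC_SUPABASE_URL=
NEXT_PUBLIC_SUPABASE_ANON_KEY=
SUPABASE_SERVICE_ROLE_KEY=
ANTHROPIC_API_KEY=
NEXT_PUBLIC_POSTHOG_KEY=
NEXT_PUBLIC_POSTHOG_HOST=
```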
The provisioning is mechanical but error-prone. Allocate one team member as the infrastructure lead for the day; they own getting everyone provisioned, and have authority to escalate when other team members are blocked on access.
A common Day-1 mistake: each team member uses their personal Vercel/Supabase/Clerk account, with the project owned by one member. When that member’s free tier is hit, the whole team is blocked. Always provision team-level accounts on each platform from the start; the per-platform setup overhead is 15–30 minutes per platform.
22.2.2 Day 1–2: app shell and authentication
By Tuesday end-of-day the team should have a deployed app at a Vercel preview URL with working authentication. Method:
- Scaffold via Lovable. Open Lovable; describe the app in 3–5 sentences (the role of each user type, the primary screens, the brand). Lovable produces a Next.js 14+ app with Tailwind, shadcn/ui components, and TypeScript. Iterate the prompt 2–3 times until the scaffold is roughly right.
- Export to GitHub. Lovable connects to GitHub directly; commit the scaffold to your main branch.
- Deploy to Vercel. Vercel auto-detects the Next.js app; the first deploy completes in 2–5 minutes.
- Wire Clerk. Add @clerk/nextjs to the dependencies; follow the Clerk Next.js quickstart (15-minute walkthrough). Configure the Clerk dashboard with appropriate user types if you have multiple roles (e.g., teacher, student, centre owner); a minimal middleware sketch follows this list.
- Verify the round-trip. Sign up as a test user, log in, see the protected dashboard. If this works, the app shell is complete.
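A minimal sketch of the Clerk middleware step, assuming a recent @clerk/nextjs release; the exact API moves between versions, so treat the quickstart as authoritative:

```ts
// middleware.ts: protect the dashboard routes with Clerk.
// Signatures differ slightly across @clerk/nextjs versions; this follows the recent pattern.
import { clerkMiddleware, createRouteMatcher } from '@clerk/nextjs/server';

const isProtected = createRouteMatcher(['/dashboard(.*)']);

export default clerkMiddleware(async (auth, req) => {
  if (isProtected(req)) await auth.protect(); // redirect unauthenticated users to sign-in
});

export const config = {
  // Simplified matcher; the quickstart's recommended matcher also covers API routes.
  matcher: ['/((?!_next|.*\\..*).*)'],
};
```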
The 5-step sequence is achievable in 4–6 hours by a team without prior Next.js experience. Teams with prior experience complete it in 2–3 hours. Watch for the common failures: Clerk webhook secrets not set in Vercel (auth events fail silently); environment variables missing on the Vercel side after local setup (preview deploys fail); Lovable’s default styling overriding shadcn (cosmetic but disorienting).
22.2.3 Day 2–3: data layer
By Wednesday end-of-day the team should have a Supabase schema deployed, basic CRUD operations working from the app, and seed data loaded for testing. Method:
- Schema design. Sketch the schema on a whiteboard or Miro before writing any SQL. Identify the entities (users, teachers, students, centres, questions, attempts, ai_explanations, teacher_reviews — for Team Aroma’s Pulse). Identify the relationships (foreign keys). Identify what is row-level-security gated (almost everything in a B2B app where centres should not see each other’s data).
- Create Supabase tables. Use the Supabase dashboard’s SQL editor or the Supabase CLI for version-controlled migrations. Migrations are checked into the repo so the team can recreate the database.
- Set up row-level security (RLS) policies. Critical for any multi-tenant app. RLS policies enforce, at the database level, that a user can only read/write their own organisation’s data. Without RLS, a security failure in the application layer becomes a data leak.
- Wire pgvector if needed. For RAG-using apps (most education and knowledge-work MVPs), enable pgvector and add the embedding columns to the relevant tables.
- Seed test data. A 10–20-row seed dataset that lets the team test workflows end-to-end. Seed data lives in a script committed to the repo so any team member can rebuild their local Supabase to the standard state.
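A minimal sketch of such a seed script using supabase-js; table names, columns, and values here are illustrative and should be adapted to your actual schema:

```ts
// scripts/seed.ts: dev/test seed data; run against a freshly reset local database.
import { createClient } from '@supabase/supabase-js';

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY! // service role bypasses RLS; server/CLI use only
);

async function main() {
  const { data: centre, error } = await supabase
    .from('organisations')
    .insert({ name: 'Demo Tuition Centre' })
    .select()
    .single();
  if (error) throw error;

  const { error: qError } = await supabase.from('questions').insert([
    { organisation_id: centre.id, topic_code: 'F4-01', body: 'Solve 2x + 3 = 11.' },
    { organisation_id: centre.id, topic_code: 'F5-02', body: 'Differentiate y = 3x^2 - 4x.' },
  ]);
  if (qError) throw qError;

  console.log('Seed complete');
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```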
The data layer is the substrate everything else depends on; underinvestment here produces compounding pain through Weeks 5–6. Spend the time to get RLS right.
22.2.4 Day 3–4: foundation-model integration
By Thursday end-of-day the foundation model is integrated, the first prompt template is in production, and the team can generate AI output through the deployed app. Method:
- Set up the API client. Use the Vercel AI SDK if you are on the JavaScript stack (it abstracts over Anthropic, OpenAI, Google, and others uniformly), or the official Anthropic SDK directly if you want tighter control. Both are 5-line setups.
- Define the first prompt template. A typed wrapper around the model call: input parameters, output schema, system prompt, user prompt template. Keep the prompt in version control (a .md file in the repo) so changes are reviewed.
- Wire to the database. When the model produces output, persist it to the appropriate table along with the input, the prompt version, the model used, and timestamps. This is your Week-5 evaluation infrastructure under construction; the discipline of persisting every input-output pair is what makes systematic improvement possible.
- Test the round-trip. A user submits an input; the model produces output; the output is persisted; the user sees the output. End-to-end through every layer.
- Set up cost monitoring. Anthropic’s dashboard shows per-day spending; set up a budget alert for ~10× your expected daily spend. The alert catches runaway loops or accidentally infinite recursion early. PostHog or Sentry can also be configured to log every model call for debugging.
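Alongside the dashboard alert, it is worth logging an estimated per-call cost next to the token usage you are already persisting. A minimal sketch with placeholder prices (check the provider's current pricing page before relying on the numbers):

```ts
// lib/ai/cost.ts: rough per-call cost estimate; the prices below are placeholders, not current rates.
const PRICE_PER_MTOK_USD: Record<string, { input: number; output: number }> = {
  'claude-sonnet': { input: 3, output: 15 },
  'gpt-mini': { input: 0.3, output: 1.2 },
};

export function estimateCostUsd(
  modelKey: string,
  inputTokens: number,
  outputTokens: number
): number | null {
  const price = PRICE_PER_MTOK_USD[modelKey];
  if (!price) return null;
  return (inputTokens * price.input + outputTokens * price.output) / 1_000_000;
}
```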
A specific prompt-engineering note: keep the system prompt long-lived and the user prompt task-specific. The system prompt encodes the model’s role, constraints, and output format; the user prompt is the per-task content. This separation makes prompt iteration much faster — you can tune the user prompt for one task without disturbing other tasks.
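A minimal illustration of the separation; the helper names echo the wrapper in §22.4.4, but the prompt content here is illustrative only:

```ts
// The system prompt is long-lived and versioned as a file in prompts/;
// the user prompt is assembled per task from the request's inputs.
import { readFile } from 'node:fs/promises';

export async function loadSystemPrompt(name: string): Promise<string> {
  return readFile(`prompts/${name}.md`, 'utf-8'); // e.g. prompts/system-tutor.md
}

export function buildUserPrompt(input: { question: string; level: string }): string {
  return [
    `Question (${input.level}):`,
    input.question,
    'Explain the solution step by step in the required output format.',
  ].join('\n');
}
```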
22.2.5 Day 4–5: the first vertical slice
By Friday morning the team should have a complete vertical slice of the wedge feature working from user input through the full stack to persisted output. The vertical slice is the single most important Week-4 outcome.
A vertical slice is a thin end-to-end implementation of one feature that touches every architectural layer — UI, auth, business logic, model inference, persistence, retrieval, output rendering. It is not a horizontal layer (e.g., “all the database tables built but no UI”). It is a single feature done end-to-end at the lowest possible quality bar, with everything else stubbed.
For Team Aroma’s Pulse: the vertical slice is one teacher logs in, sees one student, generates one practice question with AI explanation, reviews the explanation, sends it to the student, and the student sees it. The slice deliberately uses one teacher, one student, one question, with most surrounding features stubbed. Once the slice works end-to-end, Week 5 turns the slice into the production feature by widening: many teachers, many students, many questions, with the surrounding features filled in.
The vertical-slice discipline matters because it forces you to confront integration problems early. A team that builds horizontal layers (database first; then API; then UI; then model integration) typically finds that the layers do not connect cleanly when assembled in Week 6 — and by then, with two weeks until alpha, the integration debugging consumes the build budget. The vertical-slice approach assembles the layers from Day 1 of the build week and irons out the integration issues incrementally.
The R/Python integration pattern, if your wedge requires it: build the R/Python service first as a standalone HTTP endpoint (deployed on Modal, Replicate, or Railway), with one input/output round-trip working. Then call it from your Next.js backend exactly as you would call the Anthropic API. The vertical slice includes the R/Python service in the loop, even if its logic is stubbed for the first slice.
22.2.6 The “Frankendemo” antipattern
A common Week-4 failure mode is the Frankendemo: a collection of disconnected demos that, individually, look impressive, but do not assemble into an end-to-end working product.
Symptoms of the Frankendemo:
- The auth works (you can sign up and log in) but does not connect to the rest of the app — once logged in, you see a static landing page.
- The database has tables and seed data but no UI reads or writes from it.
- The model integration produces output via a debug endpoint but no user flow ever calls the endpoint.
- The team’s Friday demo is a series of separate browser tabs and terminal windows showing each layer independently.
The Frankendemo is the predictable result of horizontal-layer building (§22.2.5). It is also a comforting failure mode: each layer “works” in isolation, so each team member can claim their piece is done. The team only discovers that the pieces do not connect when they try to give a unified demo, which often does not happen until Week 6’s alpha-launch panic.
The vertical-slice discipline is the antidote. By Friday of Week 4 the team should be able to give a single demo, in a single browser tab, of one user doing one workflow end-to-end. If the team cannot, the build is ahead of the integration; pull back and integrate before adding features.
22.2.7 Friday evening: prototype demo and Week 5 plan
The Friday demo is the team’s internal milestone, distinct from the deliverable submission. Its purpose is honest assessment: where are the integration gaps, what was harder than expected, what was easier, and how does Week 5’s plan adjust?
The format:
- One member (rotating) drives the demo — clicks through the vertical slice from start to finish, while the rest of the team watches.
- Two minutes of celebrating what works. Genuinely. The first vertical slice is a real accomplishment.
- Ten minutes of identifying gaps. What broke during the demo, what feels rough, what is faked vs working, what production-grade hardening is missing?
- Twenty minutes of Week 5 planning. Convert the gaps to specific tickets, assign owners, set checkpoint dates.
The Week 5 plan should produce, by next Friday, a full-quality version of the vertical slice (the wedge feature end-to-end at production quality) and the secondary features from the MoSCoW Must list. Week 6 then becomes the alpha launch with the friendly cohort booked from Week 2.
22.3 Lessons from the cases
Eight specific lessons on stack and tool selection from Parts I–III shape Week 4 decisions.
22.3.1 Cursor — the IDE as competitive advantage (Chapter 5)
Anysphere’s bet that a better IDE-around-the-model could outcompete GitHub Copilot has been substantially vindicated; Cursor’s ARR exceeded USD 1B by late 2025. The bet’s mechanism: the IDE is the workflow integration that Copilot did not invest in. Multi-file context, repo-level awareness, the diff approval flow, the cmd-K inline edit pattern — these are not the model; they are the surface that determines how the model is used.
Operational implication. Your Layer-4 surface (the app shell, the user interface, the workflow design) is where most of your competitive advantage compounds, even when the underlying foundation model is commoditised. Spend disproportionate Week-4 effort on the workflow design, not on the model integration. The model is the API call; the workflow is the product.
22.3.2 Anysphere uses Cursor — “eat your own dogfood” (Chapter 5)
Anysphere’s own engineers build Cursor using Cursor. The dogfooding is not just marketing — it is the primary mechanism by which the team identifies friction in their own product. Every issue that frustrates an Anysphere engineer is a Cursor design defect, identified in real time, with the engineer’s knowledge of the codebase as a debugging asset.
Operational implication. Use your own product. The pre-validation calls in Week 1 captured customer voices; the alpha in Week 6 captures friendly-user voices; in Week 4–5, the team itself is the most accessible user. If a team member would not use the product as it stands, no external user will. The discipline of “would I use this?” should be applied at every design decision through the build.
22.3.3 JPMorgan COiN — internal vs external tooling distinction (Chapter 6)
COiN is internal-facing; the bank’s legal team uses it. The architectural decisions (deep workflow integration with case-management systems; audit trails; role-based access controls) reflect that fact. An external-facing tool (e.g., a contract-review SaaS for law firms) would have different architecture: multi-tenancy, customer-managed data isolation, customer-controlled audit configuration, public-facing security certifications.
Operational implication. Decide explicitly whether your MVP is internal-facing (used inside one customer’s organisation), external-facing-single-tenant (each customer has their own deployment), or external-facing-multi-tenant (one shared deployment serving many customers). The architecture differs substantially across the three. For most student MVPs the answer is external-facing-multi-tenant with strong row-level security, which is what Supabase’s RLS is designed for.
22.3.4 Anthropic and MCP — building agents requires standard tooling (Chapter 13, forthcoming)
Anthropic’s Model Context Protocol (MCP), open-sourced in November 2024, has become the de-facto standard for connecting AI models to data sources and tools. The protocol exists because every agentic system needs the same three things: a way to expose data sources, a way to expose actions, and a way to compose them. Without a standard, every team rebuilds the same plumbing.
Operational implication. If your MVP involves agentic behaviour (the AI takes multi-step action on behalf of the user), use MCP from the start. Anthropic, OpenAI, and increasingly Google support MCP-compatible servers. The Vercel AI SDK has MCP integration built in. Building your own agent-orchestration layer is a Week-7+ optimisation; for Week 4–5 use MCP-native frameworks (LangGraph, Pydantic AI, Mastra) and let the framework handle the agent loop.
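A sketch of what the MCP-native path looks like with the Vercel AI SDK; experimental_createMCPClient is marked experimental and the surface may have shifted, so verify against the current SDK docs before copying:

```ts
// Connect to an MCP server and hand its tools to the model for a short agent loop.
// The server URL and prompt are placeholders.
import { generateText, experimental_createMCPClient } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';

const mcpClient = await experimental_createMCPClient({
  transport: { type: 'sse', url: 'https://example.com/mcp' },
});

const { text } = await generateText({
  model: anthropic('claude-sonnet-4-6'),
  tools: await mcpClient.tools(), // tools exposed by the MCP server
  maxSteps: 5,                    // allow a multi-step tool-use loop
  prompt: 'Summarise the open tickets assigned to me.',
});

await mcpClient.close();
console.log(text);
```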
22.3.6 Glean — integration depth as the moat (Chapter 5)
Glean’s product is enterprise search; what makes it valuable is the integration depth across Slack, Jira, GitHub, Confluence, Google Drive, Salesforce, Notion, and ~50 other enterprise tools. A new entrant with a similar AI capability but only 5 integrations cannot match Glean’s results. The integrations are the moat, not the model.
Operational implication. If your MVP plays in a multi-system workflow (typical for B2B), the integrations are your moat. Plan integrations as first-class features, not as adjunct work. For Week 4, get one integration working end-to-end; for Weeks 5–6, expand to 3–5 integrations. The integration strategy is also the customer-conversation strategy: each integration is a reason for a customer to commit (and a reason for a competitor to find your moat hard to bypass).
22.3.7 DBS GANDALF — choosing standardised tooling and the in-source shift (Chapter 4)
DBS’s transformation included a deliberate shift from outsourced systems-integrator work to in-house engineering with standardised tooling (cloud-native, microservices, automated CI/CD). The standardisation was not a constraint; it was an enabler. By committing to a small number of tools done well, the bank’s engineers could move between projects fluidly and the operational overhead of running production fell.
Operational implication. Resist the temptation to use 15 different tools because each is “best in class” for its niche. The integration tax exceeds the per-tool benefit at student-team scale. The default 2026 stack proposed in §22.4 covers 90%+ of student MVPs with 8–10 tools; deviate only when a specific tool is materially better for your specific use case.
22.3.8 The DeepSeek shock — foundation-model commoditisation (Chapter 5)
The DeepSeek-R1 release on 20 January 2025, MIT-licensed at frontier reasoning capability, demonstrated that foundation-model performance is commoditising rapidly. The implication for AI MVPs: do not architect around a single foundation-model provider. By the time your alpha launches in Week 6, the model landscape will have shifted; by the time you raise a seed round (Year 2 of post-graduation), the lock-in to any specific model will look quaint.
Operational implication. Build the foundation-model client behind a thin abstraction. Use the Vercel AI SDK or a similar abstraction that makes model-swap a configuration change. Test 2–3 models on your task in Week 3 and choose the best one for now; revisit the choice in Weeks 6–7 against actual alpha data. Avoid prompts that depend on specific model idiosyncrasies (e.g., GPT-specific output formatting tokens) that would not transfer.
22.4 Tools and templates
This is the most operationally dense section in the book. The recommendations are current to mid-2026; the tool landscape moves quickly, so verify current pricing and feature sets at procurement.
22.4.1 The 2026 low-code stack reference
| Layer | Category | Default | Strong alternative | Notes |
|---|---|---|---|---|
| 4 | App scaffolding | Lovable | v0 (Vercel), Bolt (StackBlitz), Replit Agent | Lovable best for full-stack Next.js; v0 best for individual components; Bolt fast for prototyping; Replit best for Python stacks |
| 4 | IDE | Cursor | Windsurf (Codeium), VS Code + Copilot | Cursor for general; Windsurf strong on agent-driven; VS Code default if team prefers |
| 1 | Hosting | Vercel | Railway, Render, Cloudflare Pages | Vercel best for Next.js + AI SDK; Railway for full-stack including DB; Cloudflare for edge-first |
| 1 | Auth | Clerk | Supabase Auth, Auth.js | Clerk best DX; Supabase Auth bundled with DB; Auth.js for full control |
| 1 | Analytics | PostHog | Plausible, Mixpanel | PostHog for product analytics + session replay; Plausible for simple traffic |
| 1 | Payments | Stripe | Paddle, Lemon Squeezy | Stripe universal; Paddle/Lemon for merchant-of-record (simpler global tax); for KL: Billplz, iPay88, eGHL; for AU: Stripe direct or Square |
| 1 | Error monitoring | Sentry | LogTail, Highlight | Sentry standard; free tier sufficient for student scale |
| 2 | Database | Supabase | Neon, Convex, Firebase | Supabase = Postgres + Auth + Storage + pgvector + Realtime + Edge Functions; comprehensive default |
| 2 | Vector store | pgvector (in Supabase) | Turbopuffer, Pinecone, Weaviate | pgvector simplest; Turbopuffer cheapest at scale; Pinecone if you need managed-without-Postgres |
| 2 | Object storage | Supabase Storage | Cloudflare R2, AWS S3 | Bundled with Supabase; R2 cheaper at scale (no egress fees) |
| 3 | Foundation model | Anthropic Claude | OpenAI GPT, Google Gemini, DeepSeek | See §21.4.5 for model-by-model guidance |
| 3 | Inference (open-weight) | Together AI | Fireworks, Groq, Modal, Replicate | Together broadest model selection; Groq fastest inference; Modal best Python integration |
| 3 | AI SDK / orchestration | Vercel AI SDK | LangGraph, Mastra, Pydantic AI | Vercel AI SDK for TypeScript/JS; LangGraph for complex agents; Pydantic AI for type-safe Python |
| 3 | LLM observability | LangSmith | Langfuse (self-hostable), Helicone | LangSmith mature; Langfuse open-source alternative; Helicone simple proxy approach |
| 3 | RAG framework | LangChain (JS or Python) | LlamaIndex, custom | LangChain dominant; LlamaIndex strong on document-heavy use cases |
| 3 | Evaluation | LangSmith / OpenAI Evals | Promptfoo, Inspect AI | LangSmith for production loops; Promptfoo for CI/CD; Inspect AI for safety-focused evals |
| 3 | Agent protocol | MCP (Model Context Protocol) | Custom HTTP, OpenAI tools | MCP de-facto standard since Nov 2024; supported by Anthropic, OpenAI, Vercel AI SDK |
| — | Source control | GitHub | GitLab, Bitbucket | GitHub default for student teams |
| — | Issue tracking | Linear | GitHub Issues, Notion | Linear best DX; GitHub Issues sufficient for free; Notion for non-technical mixed teams |
| — | Docs / wiki | Notion | GitBook, Mintlify | Notion default; Mintlify for public-facing developer docs |
| — | Sync comms | Slack | Discord, Teams | Slack standard for B2B context; Discord for community-facing |
For KL/Melbourne teams, three additional considerations:
- Vercel has US, EU, and Asia-Pacific edge regions; latency from Klang Valley to Vercel’s Singapore region is ~10–30ms. Latency from Melbourne to Vercel’s Sydney region is ~5–15ms.
- Anthropic has US-based inference by default; Asia-Pacific regional endpoints are available on enterprise plans (not relevant for student scale, but plan for it post-graduation if you continue).
- Supabase allows region selection; choose ap-southeast-1 (Singapore) for KL or ap-southeast-2 (Sydney) for Melbourne. Cross-region writes add 100–250ms; for cross-campus teams, picking one region and accepting the cross-campus latency is the standard pattern.
22.4.2 The build / buy / borrow decision tree (refined from Chapter 21)
For each Must-have feature from your MoSCoW, walk through:
Question 1: Is this feature your competitive wedge?
YES → Build (and skip remaining questions).
NO → continue.
Question 2: Does a vendor product cover this feature at <USD 50/month
for student-MVP scale?
YES → continue to Question 3.
NO → continue to Question 4 (open-source).
Question 3: Is the vendor's lock-in cost (data extraction, integration
rebuilding) acceptable if we want to migrate in 12 months?
YES → Buy.
NO → continue to Question 4.
Question 4: Is there a mature open-source library or self-hostable
alternative?
YES → Borrow (use the open-source).
NO → Build (with explicit acknowledgement that you are building
infrastructure, not the wedge).
A sanity check on your decision: tally up the Build features. If you have 6+ Build features, return to Chapter 21's MoSCoW and demote some to Should/Could. Six Build features is already at the upper limit for a 5-person 4-week sprint with the Week 6 alpha as the deadline.
22.4.3 Repository structure template
For a typical Next.js-based AI MVP, the repo structure:
project-name/
├── README.md # The first document any new team member reads
├── .env.example # All required env vars listed (no secrets)
├── .gitignore # Standard Next.js + node_modules + .env.local
├── package.json
├── tsconfig.json
├── next.config.js
├── tailwind.config.ts
│
├── app/ # Next.js App Router
│ ├── layout.tsx
│ ├── page.tsx
│ ├── (auth)/ # Public auth routes
│ ├── (dashboard)/ # Authenticated routes
│ └── api/
│ ├── chat/route.ts # AI inference endpoints
│ └── webhooks/route.ts # Clerk, Stripe, etc. webhooks
│
├── components/ # React components (UI primitives)
│ └── ui/ # shadcn/ui components
│
├── lib/ # Application logic
│ ├── ai/ # Foundation-model client wrapper
│ ├── db/ # Supabase client + schema types
│ ├── auth/ # Clerk helpers
│ └── utils.ts # Shared utilities
│
├── prompts/ # Prompt templates as .md files (version-controlled)
│ ├── system-tutor.md
│ └── teacher-review.md
│
├── supabase/ # Database migrations
│ ├── migrations/
│ └── seed.sql
│
├── docs/ # Internal team documentation
│ ├── architecture.md # Architecture diagram + reasoning
│ ├── conventions.md # Coding conventions
│ ├── deployment.md # Deployment runbook
│ └── decisions/ # Architecture decision records (ADRs)
│
└── tests/ # Test files (added Week 5+)
The structure is a Next.js convention with one addition: prompts/ as a first-class directory. Putting prompts in version control (rather than as string literals in code) makes them reviewable, diffable, and stable across team members.
22.4.4 Foundation-model client wrapper pattern
A thin abstraction over the foundation-model API that lets the rest of the app swap models without rewriting:
// lib/ai/client.ts
import { generateText, generateObject } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
import { openai } from '@ai-sdk/openai';
const MODEL_REGISTRY = {
  'claude-sonnet': anthropic('claude-sonnet-4-6'),
  'claude-opus': anthropic('claude-opus-4-7'),
  'gpt-5': openai('gpt-5'),
  'gpt-mini': openai('gpt-5-mini'),
};

type ModelKey = keyof typeof MODEL_REGISTRY;

export async function generateExplanation(
  promptInput: { question: string; level: string; language: 'BM' | 'EN' },
  modelKey: ModelKey = 'claude-sonnet'
) {
  const systemPrompt = await loadSystemPrompt('tutor-spm');

  const { text, usage } = await generateText({
    model: MODEL_REGISTRY[modelKey],
    system: systemPrompt,
    prompt: buildUserPrompt(promptInput),
    maxTokens: 1500,
  });

  await persistInference({
    modelKey, promptInput, output: text, usage,
  });

  return { text, usage };
}

Three properties of the pattern matter:
- Model selection is a function argument, not hardcoded. Switch from Claude Sonnet to GPT-5 for an A/B test by passing a different modelKey.
- System prompt is loaded from a file, not inline. Prompts are reviewed via PR like any other code.
- Every inference is persisted to the database, with input, output, model, and usage. This is your evaluation infrastructure under construction.
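A usage sketch from a route handler, showing the one-argument model swap; the route path and import alias are illustrative:

```ts
// app/api/explanations/route.ts: calling the wrapper from an API route.
import { NextResponse } from 'next/server';
import { generateExplanation } from '@/lib/ai/client';

export async function POST(req: Request) {
  const { question, level, language } = await req.json();

  // Swap 'claude-sonnet' for 'gpt-5' here to A/B test models; nothing else changes.
  const { text, usage } = await generateExplanation({ question, level, language }, 'claude-sonnet');

  return NextResponse.json({ explanation: text, usage });
}
```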
22.4.5 Database schema template for typical AI MVP
A starter schema for a multi-tenant AI MVP with users, organisations, and AI inferences. Adapt to the specific domain:
-- Organisations (tenants — for Team Aroma, "centres")
create table organisations (
  id uuid primary key default gen_random_uuid(),
  name text not null,
  created_at timestamptz default now(),
  metadata jsonb default '{}'::jsonb
);

-- Users (clerk-managed; we mirror identity into our DB)
create table users (
  id uuid primary key, -- matches Clerk user_id
  email text unique not null,
  organisation_id uuid references organisations(id),
  role text check (role in ('owner', 'teacher', 'student')),
  created_at timestamptz default now(),
  metadata jsonb default '{}'::jsonb
);

-- AI inferences (the moat-building log)
create table ai_inferences (
  id uuid primary key default gen_random_uuid(),
  user_id uuid references users(id),
  organisation_id uuid references organisations(id),
  feature_key text not null,     -- e.g. 'tutor-explanation'
  model_key text not null,       -- e.g. 'claude-sonnet'
  prompt_version text not null,  -- e.g. 'tutor-spm-v3'
  input_data jsonb not null,
  output_text text,
  output_data jsonb,
  input_tokens int,
  output_tokens int,
  cost_usd numeric,
  latency_ms int,
  created_at timestamptz default now()
);

-- Domain tables follow (questions, attempts, reviews, etc.)

-- Row-level security: every table has policies like:
alter table ai_inferences enable row level security;

create policy "Users can read inferences from their organisation"
  on ai_inferences for select
  using (organisation_id = (
    select organisation_id from users where id = auth.uid()
  ));

create policy "Service role can write inferences"
  on ai_inferences for insert
  to service_role
  with check (true); -- Server-side writes only; client roles get no insert policy

The schema is opinionated:
- Organisations are first-class. Even if your MVP looks single-user, the multi-tenancy primitive is built in from the start; retrofitting later is painful.
- Users mirror Clerk. Clerk owns identity; your DB owns the application data, with user.id being the Clerk-assigned UUID.
- Every AI inference is logged. This single table will be the input to your Week-5 evaluation work and your Week-7 data flywheel.
- RLS is enabled on every table by default. Any forgetting of RLS becomes a security review item.
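A sketch of the persistInference helper that §22.4.4's wrapper calls, writing into ai_inferences via supabase-js; the feature key, prompt version, and usage field names below are assumptions to adapt:

```ts
// lib/db/inference.ts: write every model call to ai_inferences (see §22.4.5 for the columns).
// Assumes @supabase/supabase-js and a server-only service-role key.
import { createClient } from '@supabase/supabase-js';

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY! // server-side only; never ship this key to the browser
);

export async function persistInference(args: {
  modelKey: string;
  promptInput: unknown;
  output: string;
  usage?: { promptTokens?: number; completionTokens?: number }; // field names vary by SDK version
}) {
  const { error } = await supabase.from('ai_inferences').insert({
    // user_id and organisation_id omitted here; pass them through from the authenticated request.
    feature_key: 'tutor-explanation',  // illustrative; pass this in per feature
    model_key: args.modelKey,
    prompt_version: 'tutor-spm-v1',    // illustrative; read from the prompt file's metadata
    input_data: args.promptInput,
    output_text: args.output,
    input_tokens: args.usage?.promptTokens ?? null,
    output_tokens: args.usage?.completionTokens ?? null,
  });
  if (error) throw error;
}
```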
22.4.6 The vertical-slice task breakdown template
For the Week-4 vertical slice, decompose into 6–10 tasks, ordered for parallel execution:
VERTICAL SLICE: [feature name]
Slice scope: one [user] does one [action] end-to-end.
Task 1: [Auth path] — user signs in with role X
Owner: [name]
Estimated time: [hours]
Dependencies: env setup complete
Task 2: [Database] — create the necessary tables and RLS policies
Owner: [name]
Estimated time: [hours]
Dependencies: Supabase project provisioned
Task 3: [API endpoint] — handle the user action
Owner: [name]
Estimated time: [hours]
Dependencies: tasks 1, 2
Task 4: [Model integration] — call foundation model with appropriate prompt
Owner: [name]
Estimated time: [hours]
Dependencies: task 3
Task 5: [Persistence] — save inference and result
Owner: [name]
Estimated time: [hours]
Dependencies: tasks 2, 4
Task 6: [UI] — the screens for this user flow
Owner: [name]
Estimated time: [hours]
Dependencies: task 1
Task 7: [Integration] — wire UI to API to model to DB
Owner: [name]
Estimated time: [hours]
Dependencies: tasks 3, 4, 5, 6
Task 8: [Acceptance test] — manual end-to-end test
Owner: all team members
Estimated time: 30 min
Dependencies: task 7
The pattern: parallelise where possible (UI on the front end, DB schema on the back), serialise where dependencies require (integration must wait for components). For a 5-person team the slice should complete in 3–4 working days with parallel effort.
22.4.7 Cross-campus async workflow guide
Specific patterns that make KL-Melbourne async work well:
Daily standup as written async update (not synchronous video call). Each team member posts 3 lines to the team Slack/Discord channel by 11am their local time: what I did yesterday; what I’m doing today; what I’m blocked on. The format is rigid; the discipline is what makes it useful. Reading the standups takes 5 minutes; the visibility prevents drift.
Twice-weekly synchronous meetings. Tuesday at 9am KL / 12pm Melbourne (planning meeting); Friday at 4pm KL / 7pm Melbourne (demo + retrospective). Both within the 5-hour overlap window. Alternate which campus has the more inconvenient hour for non-meeting work.
Async-first decision making. Decisions are made in writing, in the team docs (Notion / Linear). A team member proposes; others have 24 hours to comment; if no objection, the decision is final. Synchronous meetings are for clarification, not decision-making.
Code review with timezone-aware turnaround. A PR opened in KL morning (and not blocking) gets reviewed in Melbourne morning (KL afternoon). A PR opened in Melbourne afternoon (and not blocking) gets reviewed in KL morning the next day. The pattern lets PRs flow continuously without anyone waiting.
Blocking-request escalation channel. A separate Slack/Discord channel for “I am blocked and need response within 4 hours.” The channel has aggressive notifications turned on for all members; non-blocking traffic stays in the main channel.
Production pager (or its absence). No-one is on call after their working hours. If production breaks at 11pm KL, it stays broken until 9am Melbourne. The discipline is healthy and prevents burnout; for a student MVP, the on-call cost is not justifiable.
22.5 Worked example — Team Aroma’s Week 4
Team Aroma starts the Pulse build on Monday with the MVP scoping document from Week 3 as their contract. The team’s working agreement: Aliyah (CEO; full-time) is the Week-4 infrastructure lead and main backend engineer; Wei Hao (CTO; full-time) is the model-integration and database lead; Sara (Head of Design; ~25 hours/week) is the UI/UX lead; Daniel (Melbourne; Head of Curriculum; ~20 hours/week) owns the prompt content and SPM-rubric verification; Priya (Melbourne; Head of Content; ~20 hours/week) handles documentation and curriculum content sourcing.
Day 1 (Monday): provisioning
Aliyah spends the morning provisioning team accounts. By 1pm KL she has:
- GitHub organisation team-aroma-pulse with a private repo, branch protection on main, and all 5 members added.
- Vercel team account with the Hobby plan.
- Supabase project in ap-southeast-1 (Singapore) — lower latency for both KL and Melbourne than ap-southeast-2.
- Clerk application in development mode.
- Anthropic API key with USD 50 credit (paid tier) — they discover later this barely covers Week 4 due to the prompt-iteration calls.
- PostHog free tier.
- Notion workspace, Linear project, Slack workspace.
By 2pm KL (5pm Melbourne) Daniel and Priya have access to all of the above and have run git clone successfully. Wei Hao has installed the Vercel CLI and the Supabase CLI on his local machine and confirmed he can deploy. Sara has Cursor installed and configured.
The Day-1 stumble: Daniel tries to use his personal Anthropic account credentials and is rate-limited within 30 minutes; the team had forgotten to give him the shared, credit-funded team key. The fix takes 5 minutes once identified, but the lost time is 2 hours.
Day 1 evening: Lovable scaffolding
Wei Hao opens Lovable and prompts:
“Build a Next.js app for a Malaysian SPM tutoring platform. Three user roles: centre owner, teacher, and student. Centre owners manage their teachers and students. Teachers see assigned students, generate AI explanations for SPM Add Maths questions, review and edit explanations before sending to students. Students receive practice questions with explanations and submit attempts. Use Tailwind, shadcn/ui, and Clerk for auth.”
Lovable produces a deployable Next.js 14 scaffold in 5 minutes. Wei Hao iterates the prompt 4 times to refine the dashboard layouts, then exports to GitHub. The auto-deploy to Vercel completes; the preview URL works. Wei Hao posts the URL to Slack at 9pm KL; Daniel and Priya can both load it from Melbourne.
Day 2 (Tuesday): auth and DB foundation
Wei Hao wires Clerk: @clerk/nextjs installed, sign-in and sign-up routes configured, the protected dashboard wrapper applied. By Tuesday lunchtime, sign-up works end-to-end. The Tuesday morning sync meeting (9am KL / 12pm Melbourne) is 25 minutes: confirm provisioning is complete, confirm Lovable scaffold accepted, plan Wednesday’s database work.
Wei Hao spends Tuesday afternoon on the Supabase schema. He drafts the schema on a Miro board with Daniel (over screen-share). Tables: centres, users, students_at_centre, questions, student_attempts, ai_explanations, teacher_reviews. RLS policies designed: a centre’s data is visible only to that centre’s users; cross-centre queries are blocked at the database layer.
The schema goes through a 30-minute review with the whole team (Tuesday 4pm KL / 7pm Melbourne, the second sync meeting of the week, scheduled here as an exception). Two changes from the review: Daniel suggests adding topic_code to questions for SPM-curriculum alignment; Sara suggests a display_state field for soft-deletes in the UI. Schema commit at 8pm KL; migration applied to Supabase.
Day 3 (Wednesday): foundation-model integration
Wei Hao integrates the Anthropic SDK via the Vercel AI SDK. The first prompt template (prompts/tutor-explanation-v1.md) is drafted by Daniel and Priya — Daniel writes the SPM-rubric requirements, Priya writes the bilingual BM/English instructions. The prompt is reviewed by Aliyah for tutoring-quality realism.
The first round-trip test: Wei Hao runs a sample SPM Add Maths question through the model. The output is plausible but the BM section uses formal language unsuited for Form-5 students. Daniel iterates the prompt to add casual-but-accurate Malaysian vocabulary; the second round produces output Aliyah judges acceptable.
By Wednesday evening, the model client is wired into the API route, every inference is being logged to Supabase, and the cost of a typical explanation is ~USD 0.04. PostHog is also wired; every page view and API call is tracked.
The Wednesday stumble: at ~10pm KL, Wei Hao hits the Anthropic rate limit (3 requests per minute on the introductory paid tier). He spends an hour debugging before realising the issue and upgrading the tier (USD 100 monthly cap). Loss: ~1 hour of debugging time. The lesson is documented in the team’s docs/decisions/001-anthropic-tier.md.
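The code-level complement to the tier upgrade is a small retry-with-backoff wrapper around model calls; a generic sketch (illustrative, not from the Aroma repo):

```ts
// lib/ai/retry.ts: retry transient failures (HTTP 429 / overloaded) with exponential backoff.
export async function withRetry<T>(
  fn: () => Promise<T>,
  { retries = 3, baseDelayMs = 1_000 } = {}
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === retries) break;
      const delay = baseDelayMs * 2 ** attempt; // 1s, 2s, 4s, ...
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}

// Usage: const result = await withRetry(() => generateExplanation(input));
```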
Day 4 (Thursday): the first vertical slice
Sara has worked through Wednesday on the UI screens (teacher dashboard, student question view, teacher review interface). By Thursday morning the screens are visually correct but not wired to real data. The team’s Thursday goal: complete the vertical slice — one teacher logs in, sees one assigned student, generates one practice question with AI explanation, reviews it, sends to the student, and the student sees it.
The team works in parallel:
- Aliyah implements the API endpoint that, given a teacher's request, generates an AI explanation and persists it to ai_explanations.
- Wei Hao adapts the data-layer queries to feed the dashboard with the right student-and-question lists.
- Sara wires the UI components to the API via React Query.
- Daniel seeds the database with 5 SPM Add Maths questions (taken from public 2024 SPM trial papers).
- Priya seeds 1 centre, 1 teacher, and 2 students with realistic-looking data.
By Thursday 6pm KL, Aliyah opens a fresh Chrome window. She signs up as a teacher, gets assigned to the seeded centre via a manual database update, sees the seeded students, clicks “Generate explanation” on a question, sees the AI output appear after ~3 seconds, edits one phrase, clicks “Send to student,” opens an incognito window, signs up as a student, sees the explanation. The vertical slice works.
The Thursday-evening team chat captures the moment: “Slice ✅” from Aliyah; celebratory GIFs from Sara and Daniel.
Day 5 (Friday): demo, gap analysis, Week-5 planning
The Friday demo (4pm KL / 7pm Melbourne) is 45 minutes. Aliyah drives:
- Demonstrates teacher sign-up, login, dashboard.
- Generates explanation for one question; reviews and edits; sends to student.
- Switches to student view; sees the explanation; submits an attempt.
- Switches back to teacher view; sees the attempt and the (very basic) auto-generated feedback.
The team identifies gaps:
- No invite flow. Currently students and teachers sign up separately and Aliyah manually links them via the DB. Centre-owner-mediated invites are needed by Week-5 end.
- No SPM-format checking. The AI generates explanations but the rubric-format validation layer (a Must-have from MoSCoW) is not implemented.
- No batch question generation. The AI generates one explanation at a time; teachers will want to assign 5–10 questions at once.
- No mobile-responsive testing. The team has tested only on desktop; many students will use mobile.
- No production hardening. Error handling, retries, edge cases. Acceptable for the slice; not acceptable for alpha.
- No analytics dashboard for the team. PostHog is collecting data but no view aggregates it. A team-internal dashboard is needed for Week-5 evaluation work.
The Week-5 plan converts the gaps to specific tasks; Linear tickets are created for each. The team also notes that roughly USD 30 of its USD 50 Anthropic credit has been spent in Week 4 alone, implying ~USD 60 will be needed for Weeks 5–7. The team commits to a USD 100 paid tier funded from Aliyah's personal credit card, with the team agreeing to reimburse her from any pilot revenue.
The Friday submission goes in at 11pm KL: the repo URL, the Vercel preview URL, the architecture diagram (Sara drew it in Excalidraw), the stack decision document, the build velocity tracker (showing 30 dev-days of effort across 5 members in the week, slightly under the planned 35), the cost log, and the updated risk register.
What Team Aroma got right and what they almost got wrong
Three things they did well: (1) provisioning was completed on Day 1 despite the personal-Anthropic-key stumble, which prevented Week-long blocking; (2) the schema review on Tuesday afternoon caught two missing fields (topic_code, display_state) before they became expensive to add; (3) the vertical-slice discipline meant Thursday’s celebration was on a working end-to-end demo, not on disconnected component completion.
Three things they almost got wrong: Wei Hao initially scoped Wednesday for “build the full prompt library” (which is a Week-5 task); the team almost agreed to build the SPM-format-checking layer in Week 4 (which would have made the slice impossible to complete on time); and Sara almost spent Thursday on responsive-mobile work (which she demoted after the team agreed alpha would be desktop-first). Each of these would have produced a Frankendemo by Friday — components in various states of partial completion, none integrated. The team’s MoSCoW discipline from Week 3 — keeping the Must list to 8 features — saved the integration work.
22.6 Course exercises and Week 4 deliverable
Submit the Week 4 deliverable bundle as a shared folder by Friday 23:59. Required artefacts:
22.6.1 Required artefacts
- Repository URL with a complete README documenting setup, environment variables, and deployment.
- Stack decision document — one to two pages listing the chosen tools at each layer with rationale (per §22.4.1 and §22.4.2).
- Architecture diagram — one page (Excalidraw, draw.io, or whiteboard photo is fine) showing the major components and their connections.
- Live working prototype URL — Vercel preview URL with the vertical slice deployed.
- Vertical slice demo — a 2–4 minute screen recording showing the slice end-to-end (uploaded as a Loom or YouTube unlisted link).
- Build velocity tracker — a simple sheet showing dev-days planned vs spent per task, with explanations for variances >25%.
- Cost log — actual spending across all tools (Vercel, Supabase, Clerk, Anthropic, PostHog) for the week.
- Updated risk register from Chapter 21 — risks closed, risks newly identified, mitigations updated.
22.6.2 Grading rubric (50 points)
| Component | Points | Distinction-level criteria |
|---|---|---|
| Stack decision rationality | 10 | Tool choices are defensible against §22.4.2 decision tree; build vs buy/borrow ratio supports the wedge |
| Vertical slice depth | 15 | One feature works end-to-end through every layer; demo shows real persistence and real model output |
| Documentation discipline | 5 | README is sufficient for a new team member to set up; ADRs documented; conventions clear |
| Build velocity | 5 | Dev-days spent within ±25% of planned; variances explained |
| Cross-campus coordination | 5 | Twice-weekly sync meetings held; async standup discipline maintained; PRs reviewed across campuses within 24 hours |
| Cost discipline | 5 | Spending tracked; tier upgrades anticipated; paid-tier provisioned before alpha |
| Risk management | 5 | Risks updated weekly; new risks (production hardening, rate limits, cost cliffs) identified |
Pass: 30. Credit: 36. Distinction: 42. High Distinction: 47.
The team-comprehension penalty from §19.6.2 applies; additionally, every team member must be able to demonstrate their own commits and explain their own code in any random spot-check.
22.6.3 Things to do before Monday of Week 5
By Sunday evening of Week 4, in addition to the deliverable submission:
- Identify the 100 representative inputs that will form your Week-5 evaluation set (the “golden set” in Chapter 23). For Team Aroma, this is 100 SPM Add Maths questions across the major topic areas.
- Confirm the alpha-cohort booking from Week 2: the friendly users committed to Week-6 alpha testing should be re-confirmed via personal message before Sunday. Five-week-old commitments evaporate without re-confirmation.
- Read Chapter 4 (Building the AI-native enterprise) and §23.1–§23.3 of Chapter 23 (Evaluation, golden sets, and build-measure-learn) before Monday of Week 5. The evaluation framework you build in Week 5 is the discipline that distinguishes a working MVP from a defensible MVP.
References for this chapter
Lean and agile development practice
- Beck, K. and others (2001). Manifesto for Agile Software Development. agilemanifesto.org.
- Humble, J. and Farley, D. (2010). Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation. Addison-Wesley.
- Forsgren, N., Humble, J., and Kim, G. (2018). Accelerate: The Science of Lean Software and DevOps. IT Revolution Press.
Modern web and AI infrastructure
- Vercel (2024–2026). Vercel Platform Documentation and Vercel AI SDK Documentation. vercel.com.
- Anthropic (2024–2026). Claude API Documentation and Model Context Protocol Specification. docs.anthropic.com, modelcontextprotocol.io.
- Supabase (2024–2026). Supabase Documentation. supabase.com.
- Clerk (2024–2026). Clerk Documentation. clerk.com.
Foundation-model providers and inference services
- OpenAI (2024–2026). OpenAI Platform Documentation. platform.openai.com.
- Google (2024–2026). Gemini API Documentation. ai.google.dev.
- DeepSeek-AI (2024). DeepSeek-V3 Technical Report. arXiv:2412.19437.
- DeepSeek-AI (2025). DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. arXiv:2501.12948.
- Together AI, Fireworks AI, Groq, Modal Labs, Replicate (2024–2026). Provider documentation.
Cases referenced in §22.3
- Iansiti, M. and Lakhani, K. R. (2020). Competing in the Age of AI: Strategy and Leadership When Algorithms and Networks Run the World. Harvard Business Review Press. (JPMorgan COiN, Stitch Fix, DBS GANDALF, Watson Health.)
- Lamarre, E., Smaje, K., and Zemmel, R. (2023). Rewired: The McKinsey Guide to Outcompeting in the Age of Digital and AI. Wiley. (DBS standardised tooling.)
- Anysphere (Cursor) public technical and engineering blog posts (2023–2026).
- Anthropic (2024). Model Context Protocol — Introduction and Specification. November 2024.
Stack-selection literature
- Spolsky, J. (2001). The Joel Test: 12 Steps to Better Code. Joel on Software.
- Atwood, J. (2007). Programming Atomic Habits. Coding Horror Blog.
- Latent Space (Wang, S., 2023–2026). Newsletter and podcast on AI engineering practice.
- AI Engineer Summit (2023–2026). Conference proceedings. ai.engineer.
Further reading
For Next.js specifically, the official Vercel documentation is comprehensive; the Theo Browne YouTube channel covers practical patterns; the Lee Robinson (Vercel VP) blog covers idioms. For Python integration patterns, the FastAPI documentation, the Modal Labs blog, and the Pydantic AI documentation are the standard references. For MCP specifically, the Anthropic MCP specification and the open-source server library are the primary references.
For team-workflow practices in distributed software engineering, the GitLab Handbook (gitlab.com/company/culture/all-remote/) is the most-developed public document; many of its patterns transfer directly to student teams. For asynchronous-first communication, Doist’s The Async Manifesto and Automattic’s Distributed (Mullenweg) are concise references.
For the open-weight model ecosystem and self-hosted inference, the Hugging Face documentation, the LMStudio user guide, and the Ollama documentation cover the practitioner-side. For commercial open-weight inference providers (Together AI, Fireworks, Groq, Modal), each provides their own documentation; Latent Space and the AI Engineer YouTube channel cover comparative reviews.