Chapter 25 — Week 7: Beta and the data flywheel
Welcome to Week 7. The alpha report is signed off and the alpha-to-beta gate cleared. This week the cohort triples or quadruples in size, the support model shifts from hand-held to self-serve, and the production system meets a distribution of users it has never seen before. By Friday you will have 15–25 active beta users, a hardened production deployment, an instrumented data flywheel that captures every input-output-outcome triple as the seed of your future advantage, and an evidence-based answer to whether beta should proceed to general-availability framing in Week 8. The single most-common Week-7 mistake is to treat beta as “alpha but bigger.” It is not. Beta is the first stress test of self-serve — does the system work without the team in the room? — and the first opportunity to turn the data flywheel from theoretical (Chapter 3) to operational. This chapter is about the discipline that distinguishes a beta from a public launch you happen to call “beta.”
Chapter overview
This chapter follows the same six-part structure. §25.1 (Concept) sets out what changes between alpha and beta, the data-flywheel theory from Iansiti-Lakhani made operational for AI MVPs in 2026, cohort recruiting at scale, self-serve onboarding architecture, production hardening requirements, A/B testing fundamentals at small scale, customer-success metrics, and the beta-to-GA gate. §25.2 (Method) is the day-by-day Week 7 sprint: Sunday cohort confirmation, Monday production hardening, Monday-Tuesday scaled onboarding, Tuesday-Wednesday data-flywheel instrumentation, Wednesday pricing-conversation introduction, Thursday metrics review, Friday beta report and Week 8 plan. §25.3 (Lessons from the cases) pulls eight specific data-flywheel and beta-launch lessons from Parts I–III. §25.4 (Tools and templates) gives you the beta onboarding email sequence, self-serve onboarding checklist, customer-health-score calculator, A/B testing framework, production-hardening checklist, pricing-conversation script, and beta report template. §25.5 (Worked example) continues Team Aroma through their Week 7: 18 active beta users (15 new plus 3 alpha carryovers); the teacher-assignment-workflow pivot from the alpha report; one beta centre’s mid-week Red-zone health-score crisis and recovery; the data flywheel producing 240 logged inferences with 178 outcome labels in seven days; the pricing conversation that gets two centres to firm commitments at RM 25–30/student/month for GA. §25.6 (Course exercises and deliverables) specifies the Week 7 submission with grading rubric.
How to read this chapter. Read §25.1 in full on Sunday evening of Week 6 or Monday morning of Week 7. Read §25.2 with the team and assign a member as the beta lead for the week (typically the team member with the strongest customer-facing skills). Treat §25.3 as Wednesday-evening reading once the data-flywheel instrumentation is producing results. Use §25.4 throughout. Read §25.5 before drafting your beta report on Friday. Submit against §25.6 by Friday 23:59.
25.1 Concept
25.1.1 Beta vs alpha — what changes
Beta is not “alpha but bigger.” Five properties change qualitatively, and a team that does not adjust its operating model produces a beta that fails at the cohort transitions rather than at the technology.
Cohort size, 3–5× larger. From 5–10 alpha users to 15–25 beta users for a student-team build. The increment matters because it crosses the threshold where the team can no longer track each user individually. With 5 alpha users, every user’s experience is in the team’s working memory; with 20 beta users, the team must rely on instrumented metrics to know what is happening.
Support intensity, reduced. Alpha runs at ~30 minutes per user per day of team support (onboarding, observation, mid-week check-ins). Beta runs at ~10 minutes per user per day on average — and most of that time goes to the tail of users having problems, not to the main population using the product successfully. The shift requires self-serve infrastructure: onboarding flows that work without a team member present, in-product help, error messages that explain what went wrong, and an asynchronous-first feedback channel.
Time horizon, longer. Alpha is one week. Beta is two weeks at minimum (Weeks 7–8 in this course’s structure). The longer horizon matters because some product properties only become visible over time: retention decay, novelty wear-off, sustained engagement vs initial enthusiasm, the late-week-2 “now what?” question that the alpha never reaches.
Pricing introduction (often). Alpha is free by definition. Beta typically introduces pricing — either as actual paid usage or as a commitment to pay at general availability. The pricing conversation is itself a research instrument; which price points users will and will not commit to is information the alpha could not produce.
Production reliability bar. Alpha tolerates some downtime, some bugs, some slowness — alpha users are pre-committed and patient. Beta does not. Even friendly beta users disengage quickly when they encounter friction; the bar for “good enough” rises. Production hardening (§25.1.5) is the work that makes the beta system usable without the team in the room.
The transition is the first time the team experiences the gap between the product they have built and the product the market actually engages with. The gap is often disorienting; some teams discover at beta that their alpha learnings did not generalise. This is normal; it is also why beta exists.
25.1.2 The data flywheel — operational, not theoretical
Chapter 3 (§3.11, after Iansiti and Lakhani, 2020) introduced the data flywheel as the durable competitive advantage of AI-native firms. The theoretical version: data flows from users into the model, the model produces better outputs, better outputs drive product engagement, more engagement produces more data. Each turn of the flywheel compounds; competitors without comparable data accumulation cannot match the system’s performance.
Beta is the first opportunity to make the flywheel operational. The operational version has four properties:
1. Every input-output pair is logged. From Chapter 22’s ai_inferences table: every foundation-model call persists the input, the output, the model used, the prompt version, the user, the organisation, the timestamp, and the cost. By end of Week 7 the team should have 200+ logged inferences from real users.
2. Every user action on the output is captured as an outcome label. A teacher’s accept / edit / reject decision; a student’s “this helped” thumbs-up; a centre owner’s “use this in production” toggle. The outcome label is what transforms an unlabelled inference into supervised learning data. Without outcome labels, the flywheel has no signal to compound on.
3. The labelled corpus feeds back into evaluation and improvement. The Week-7 corpus is a production-distribution evaluation set — more representative than the Week-5 golden set, because it is generated by real users in real conditions. Week-7 prompt iteration uses the corpus as additional eval data; Week-8+ prompt iteration uses it as primary eval data. By Week 10, the team has roughly 1,000–3,000 labelled inferences — small for production-scale ML but substantial relative to the team’s Week-5 starting point.
4. The data is a byproduct of normal operation, not a separate collection effort. The Ant Group lesson from Chapter 3 generalises: a flywheel that requires expensive separate data collection is fragile (Watson Health’s pattern); a flywheel that turns automatically with normal product use is durable (Ant Group’s, Cursor’s, Stitch Fix’s pattern). The Week-7 instrumentation should require zero additional effort from users beyond what they would do naturally.
A specific note on the Week-7 instrumentation. Most student teams overbuild the flywheel mechanism: they design elaborate user-feedback systems, schedule periodic surveys, build dashboards for analysing the data. All of this is premature. At beta, the right instrumentation is minimal and automatic: every inference logged with whatever outcome the user’s normal action produces. The corpus that results may be sparse (some inferences have outcome labels, some do not), inconsistent (different users use the system differently), and skewed (the most-engaged users contribute most of the data) — but it is the data the team has, and it is far more useful than any planned collection programme would be in 7 days.
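As a concrete illustration of how little code “minimal and automatic” requires in this book’s stack, here is a sketch of the capture call, assuming supabase-js and the inference_outcomes table defined in §25.4.7 (the recordOutcome name and the handler wiring are ours, not a prescribed API):
// Minimal outcome capture: called from the same click handlers that already
// perform the user's action, so logging is a byproduct of normal use.
import { createClient } from '@supabase/supabase-js'

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_ANON_KEY!)

type OutcomeType = 'accept' | 'edit' | 'reject' | 'thumbs_up' | 'thumbs_down'

async function recordOutcome(
  inferenceId: string,
  userId: string,
  outcomeType: OutcomeType,
  outcomeData: Record<string, unknown> = {}
) {
  // Fire-and-forget from the UI's perspective: a logging failure must never
  // block the user's own action.
  const { error } = await supabase.from('inference_outcomes').insert({
    inference_id: inferenceId,
    user_id: userId,
    outcome_type: outcomeType,
    outcome_data: outcomeData,
  })
  if (error) console.error('outcome logging failed', error)
}

// Example: the teacher saves an edited explanation.
// recordOutcome(inference.id, teacher.id, 'edit', { edit_distance: 42 })
The point of the sketch is the shape, not the specifics: one insert per user action, invoked from code paths the product already has.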
25.1.3 Cohort recruiting at scale
Beta cohort recruiting must complete in 3–4 days, not weeks. The constraint differs from Week 2’s customer-discovery cohort recruiting (which had 5 days for 20 interviews) because beta requires committed users, not interview subjects.
The recruiting funnel:
| Stage | Typical conversion | For a 20-user beta target |
|---|---|---|
| Initial contacts | — | 50 |
| Interest expressed | 60% | 30 |
| Onboarding scheduled | 75% | 22 |
| Onboarding completed | 85% | 19 |
| Active in beta week (3+ days) | 80% | 15 |
The team should plan for the funnel: aim for 50 initial contacts to land at 15 active users. Sources for the initial contacts:
- Alpha users carrying over. All alpha users from Week 6 should be invited to continue (3–6 users typically; some will decline due to alpha fatigue).
- Warm referrals from alpha users. Each alpha user is asked, at the end of the alpha week, “is there anyone else you’d recommend?” High-quality warm intros from satisfied alpha users typically convert at 50%+. Aim for 2–3 referrals per alpha user.
- Re-contact from the broader Week-2 cohort. Interviewees who did not become alpha users but expressed interest. Re-contact these with “we’re now in beta; would you like to try?”
- Cold outreach within the segment. If the carry-over plus referrals plus Week-2 re-contacts do not produce enough, cold outreach to additional segment members (LinkedIn, sector-specific channels). Typical conversion is 10–20% to even a brief response, which is why this is the last source.
For Team Aroma, the math: 4 alpha carryovers + 11 warm referrals from alpha users (most produced 2–3 each) + 6 Week-2 re-contacts + 4 LinkedIn cold outreaches = 25 onboarded, of whom 18 were active in the beta week (the full funnel is tabulated in §25.5). Cohort distribution skewed toward existing alpha-cohort centres (which is appropriate at this stage; Week 8 broadens further).
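The funnel arithmetic is worth making explicit once. A minimal TypeScript sketch, using the typical conversion rates from the table above (your actual rates will differ; treat the output as a planning floor, not a forecast):
// Back-of-envelope funnel planner: how many initial contacts are needed
// to land a target number of active beta users.
const stages = [
  { name: 'interest expressed', rate: 0.60 },
  { name: 'onboarding scheduled', rate: 0.75 },
  { name: 'onboarding completed', rate: 0.85 },
  { name: 'active in beta week', rate: 0.80 },
]

function contactsNeeded(targetActive: number): number {
  const endToEnd = stages.reduce((p, s) => p * s.rate, 1) // ≈ 0.306
  return Math.ceil(targetActive / endToEnd)
}

console.log(contactsNeeded(15)) // ≈ 50 initial contacts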
25.1.4 Self-serve onboarding
The 30-minute video onboarding ritual from Chapter 24 does not scale to 20 users. By Week 7 the team needs a self-serve onboarding path: users sign up, complete an onboarding flow inside the product, and reach first-value without team intervention.
Self-serve onboarding has four components:
The pre-onboarding email sequence. Three emails, sent on days 1, 3, and 7 after signup. Email 1 (immediate): welcome, what to do first, which feedback channel to use. Email 2 (day 3): how is it going, with specific questions. Email 3 (day 7): summary of usage patterns; specific suggestions for getting more value.
The first-time experience. When a user signs in for the first time, the product walks them through one core workflow. Industry term: the “aha moment” — the moment the user experiences the value the product offers. The team’s design objective is to engineer the aha moment to occur within the first 5 minutes of first use. For Team Aroma’s Pulse: the aha moment is when the teacher sees the AI-generated explanation for a question they would otherwise have explained themselves, recognises it as approximately what they would have written, and edits it to add their personal touch. The first-time experience is designed to lead the teacher to that moment.
In-product help and tooltips. Contextual help that appears when the user is in the area where they need it. For complex workflows, optional walkthrough videos. Resist the temptation to write extensive documentation pages users will not read; the in-product context delivers help where the user is.
The activation funnel. Track each user through stages: signed up → completed first task → completed second task → returned on day 2 → returned on day 7. The funnel reveals where users drop off; drop-off concentration is where onboarding is failing. The funnel becomes a primary metric in Week 7.
A specific note on activation rates. Week-7 activation rates (signup → first-task-complete → second-task-complete) for student-team MVPs typically run 40–70%. Below 40% suggests the onboarding flow is broken; above 70% suggests the cohort was over-curated and may not be representative. The target is 60%; investigate any user who signed up but did not activate.
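Instrumenting the funnel is light work if the events are emitted at the right moments. A hedged sketch using posthog-js (the event names are this course’s convention, not a PostHog requirement; the project key is assumed to be in the environment):
// Emit one event per funnel stage; PostHog's funnel view assembles the
// stages per user, so no custom analytics code is needed.
import posthog from 'posthog-js'

posthog.init(process.env.NEXT_PUBLIC_POSTHOG_KEY!, {
  api_host: 'https://app.posthog.com',
})

type FunnelStage = 'signed_up' | 'first_task_completed' | 'second_task_completed'

function trackActivation(userId: string, stage: FunnelStage) {
  posthog.identify(userId) // ties all stages to one user
  posthog.capture(stage)   // 'first_task_completed' is the activation event
}

// Day-2 and day-7 returns fall out of PostHog's session data automatically.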
25.1.5 Production hardening
Alpha tolerates some operational rough edges. Beta does not. The hardening checklist:
Security baseline.
- HTTPS everywhere (Vercel and Supabase provide this by default).
- Secure cookies, HttpOnly, Secure, SameSite attributes set correctly.
- No secrets in client-side code (verify by inspecting the bundle).
- Row-level security policies tested with adversarial inputs (a user from organisation A cannot read or write organisation B’s data); a test sketch follows this list.
- Input validation and parameterised queries (no SQL injection).
- Rate limiting on the foundation-model proxy (a single user cannot cause a runaway cost spike).
- CSRF protection on state-changing endpoints.
- Headers (CSP, X-Frame-Options, X-Content-Type-Options, Referrer-Policy) set.
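The RLS adversarial test named in the list above can be a short script run before every beta deploy. A sketch with supabase-js, assuming a dedicated test account in organisation A and the ai_inferences table from Chapter 22 (account details are placeholders):
// Signed in as a user from organisation A, attempt to read organisation B's
// rows. Under correct RLS the query succeeds but returns zero rows.
import { createClient } from '@supabase/supabase-js'

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_ANON_KEY!)

async function assertCrossOrgReadBlocked(orgBId: string) {
  await supabase.auth.signInWithPassword({
    email: 'teacher-at-org-a@example.com', // test account in organisation A
    password: process.env.TEST_PASSWORD!,
  })
  const { data, error } = await supabase
    .from('ai_inferences')
    .select('id')
    .eq('organisation_id', orgBId)
  if (error) throw error
  if ((data ?? []).length > 0) {
    throw new Error('RLS FAIL: org A user can read org B inferences')
  }
  console.log('RLS PASS: cross-organisation read returned zero rows')
}
Repeat the pattern for writes (insert and update with organisation B’s id) and for every multi-tenant table, not just the inference log.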
Reliability baseline.
- Error boundaries on every major page (a single broken component does not break the page).
- Graceful degradation when the foundation-model API is rate-limited (queue, retry, user-visible message); a retry sketch follows this list.
- Database connection pooling configured (Supabase handles this on the platform).
- Automatic database backups (Supabase paid tier; for free tier, schedule periodic SQL dumps to S3 / Cloudflare R2).
- Health-check endpoint that uptime monitoring can poll.
- Uptime monitoring configured (UptimeRobot free tier or Vercel’s built-in monitoring).
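For the graceful-degradation item, the core mechanism is a bounded retry with exponential backoff and jitter, followed by a calm user-visible message when retries are exhausted. A minimal sketch (callModel stands in for your foundation-model proxy call; the status === 429 check is an assumption about your proxy’s error shape):
// Bounded retries with exponential backoff and jitter; fail loudly to the
// caller after maxRetries so the UI can show a specific message, not a hang.
async function withBackoff<T>(fn: () => Promise<T>, maxRetries = 4): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn()
    } catch (err: any) {
      const rateLimited = err?.status === 429
      if (!rateLimited || attempt >= maxRetries) throw err
      const delay = Math.min(16000, 1000 * 2 ** attempt) * (0.5 + Math.random())
      await new Promise((resolve) => setTimeout(resolve, delay))
    }
  }
}

// Usage: surface a specific message rather than a stack trace.
// try { return await withBackoff(() => callModel(input)) }
// catch { return { userMessage: 'The AI service is busy; please retry in a minute.' } }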
Performance baseline.
- Page load time targets: <2s for landing, <3s for dashboard.
- Foundation-model latency: P95 < target stated in Chapter 21 quality bar.
- Database query latency: <100ms for typical queries.
- Frontend bundle size: <300KB compressed (Vercel reports this).
Operational baseline (within student-team realism).
- Documented deploy/rollback procedure.
- The on-call question: who responds to broken-production at 11pm? For most student teams the answer is “no one; we accept that downtime outside working hours is OK.” Document this explicitly so the team’s expectations align.
- Cost monitoring: alerts on anomalous spending (>5× baseline) configured on Anthropic / OpenAI / Vercel dashboards.
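The provider dashboards catch spend anomalies at the account level; a supplementary check against the team’s own inference log catches them per feature and per user a day earlier. A sketch runnable as a scheduled function, assuming Chapter 22’s cost column is named cost_usd (adapt to your schema):
// Compare today's spend in ai_inferences against a trailing-week daily
// baseline and alert at the >5× threshold named above.
import { createClient } from '@supabase/supabase-js'

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_SERVICE_ROLE_KEY!)

async function checkCostAnomaly() {
  const dayMs = 24 * 60 * 60 * 1000
  const since = (days: number) => new Date(Date.now() - days * dayMs).toISOString()

  const spend = async (from: string, to: string) => {
    const { data, error } = await supabase
      .from('ai_inferences')
      .select('cost_usd')
      .gte('created_at', from)
      .lt('created_at', to)
    if (error) throw error
    return (data ?? []).reduce((total, row) => total + Number(row.cost_usd ?? 0), 0)
  }

  const today = await spend(since(1), new Date().toISOString())
  const baseline = (await spend(since(8), since(1))) / 7 // trailing 7-day daily mean

  if (baseline > 0 && today > 5 * baseline) {
    // Replace with your alert channel (email, Slack webhook, etc.)
    console.error(`COST ALERT: USD ${today.toFixed(2)} today vs USD ${baseline.toFixed(2)}/day baseline`)
  }
}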
The hardening sprint. Day 1 of Week 7 (Monday) is dedicated to closing whatever items in the above checklist are open. Most teams have 5–10 items; Monday’s work moves them to PASS. Hardening is unglamorous but necessary; a beta that goes down at 6pm on Tuesday because of a routine bug is a beta that does not produce learning.
25.1.6 A/B testing in production, with discipline
Beta produces enough traffic to start running production A/B tests. The technique compares two variants of the system head-to-head on real traffic, with users randomised to one variant or the other, and outcome metrics compared.
Three considerations for student-team A/B testing:
Sample size. A/B testing’s sensitivity depends on sample size and effect size. The standard approximation for the per-arm sample size needed to detect a difference of \(\delta\) in proportions with power \(1 - \beta = 0.8\) at significance \(\alpha = 0.05\) is \(n \approx 16\,\bar{p}(1-\bar{p})/\delta^2\), where \(\bar{p}\) is the pooled proportion; at the worst case \(\bar{p} = 0.5\) this reduces to \(n \approx 4/\delta^2\). For \(\delta = 0.10\) (10pp), \(n \approx 400\) per arm. Student-team beta cohorts produce 100–500 events per arm, which only supports detecting effect sizes of roughly 10–20pp, far larger than typical real product effects. The implication: at student scale, formal A/B testing can confirm only implausibly large effects; for realistic effect sizes it is largely impossible.
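The arithmetic behind these numbers, as a small function (the standard two-proportion approximation; the constants are the usual z-values for a two-sided \(\alpha = 0.05\) and power 0.8):
// Per-arm sample size for a two-proportion test. pBar is the pooled
// proportion; deltaPP is the absolute difference to detect.
function perArmSampleSize(pBar: number, deltaPP: number): number {
  const zAlpha = 1.96 // two-sided alpha = 0.05
  const zBeta = 0.84  // power = 0.8
  const variance = 2 * pBar * (1 - pBar)
  return Math.ceil(((zAlpha + zBeta) ** 2 * variance) / deltaPP ** 2)
}

console.log(perArmSampleSize(0.5, 0.10)) // ≈ 392 per arm for a 10pp effect
console.log(perArmSampleSize(0.5, 0.20)) // ≈ 98 per arm for a 20pp effect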
The directional use of A/B testing. What student teams can do is use A/B testing directionally — running two variants, observing which performs better in the available traffic, accepting that the conclusion is suggestive rather than conclusive. This is appropriate for low-risk decisions (which copy works better, which UI variant feels right) but inappropriate for decisions with large quality or business consequences.
Bandit algorithms. For some kinds of decisions (which prompt variant to use across many invocations), a multi-armed bandit (Thompson sampling, Upper Confidence Bound) is more efficient than A/B. The bandit dynamically allocates more traffic to variants that appear to be performing better, balancing exploration and exploitation. The Vowpal Wabbit and PyMC libraries support production bandit deployment; for most student teams, this is over-engineered for Week 7 and should be deferred to post-graduation if relevant.
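For reference, the core of Thompson sampling with Bernoulli outcomes (accept = 1, otherwise 0) fits in a page; if a team does pursue a bandit after the course, this is the shape of it. A self-contained sketch, using Marsaglia–Tsang Gamma sampling to draw from the Beta posteriors (an illustration of the technique, not production code):
// Each arm keeps a Beta(1 + successes, 1 + failures) posterior; per request,
// sample each posterior and serve the arm with the largest draw.
function sampleNormal(): number {
  // Box–Muller
  const u = 1 - Math.random()
  const v = Math.random()
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v)
}

function sampleGamma(shape: number): number {
  // Marsaglia–Tsang; valid for shape >= 1, which holds for Beta(1+s, 1+f)
  const d = shape - 1 / 3
  const c = 1 / Math.sqrt(9 * d)
  for (;;) {
    const x = sampleNormal()
    const v = (1 + c * x) ** 3
    if (v <= 0) continue
    if (Math.log(Math.random()) < 0.5 * x * x + d - d * v + d * Math.log(v)) return d * v
  }
}

const sampleBeta = (a: number, b: number) => {
  const x = sampleGamma(a)
  return x / (x + sampleGamma(b))
}

type Arm = { name: string; successes: number; failures: number }

function chooseArm(arms: Arm[]): Arm {
  let best = arms[0]
  let bestDraw = -Infinity
  for (const arm of arms) {
    const draw = sampleBeta(1 + arm.successes, 1 + arm.failures)
    if (draw > bestDraw) { bestDraw = draw; best = arm }
  }
  return best
}

// After each inference, increment the served arm's successes (accepted)
// or failures (edited/rejected); allocation shifts automatically.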
For Week 7 the realistic A/B work is: 1–2 variant comparisons (e.g., a prompt-iteration A/B; a UI-layout A/B), with directional conclusions noted but not over-interpreted. The classical-statistics rigor that Chapter 23 applied to golden-set evaluation does not transfer to small-sample production A/B; honest acknowledgement of the limit is the discipline.
25.1.7 Customer-success metrics
By Week 7 the team needs a small set of metrics that summarise how each user is doing. The standard set, adapted to AI MVPs:
Activation. Did the user complete the first core workflow? Binary or staged. Tracked as the activation funnel (§25.1.4).
Engagement. Active days per week. The denominator is usually 7 (calendar days) or 5 (working days, for B2B products).
Depth. Inferences or actions per active session. Measures whether engaged users go deep or skim.
Retention. Did the user return after the first week? After the second week? Week-2 retention is the strongest predictor of long-term retention.
Outcome. For AI products, the outcome metric is often the one the user cares most about — for Team Aroma’s Pulse: the proportion of AI explanations the teacher accepts unchanged, or the proportion of student practice attempts completed correctly.
Satisfaction. NPS or thumbs-up rate. Lagging indicator; useful but should not be the primary metric.
The team builds a “customer health score” combining these into a single number per user; users below threshold get personal attention. Below threshold for Team Aroma’s beta might be: <2 active days in week 1 or AI-acceptance rate <50% or explicit churn signal (cancellation, unsubscribe).
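The composite is mechanical once the weights are fixed. A sketch implementing the weighting from §25.4.3 (the normalisations, days ÷ 7 and inferences ÷ 50 capped at 1, are that template’s choices, not universal constants):
// Composite customer-health score in [0, 1], per §25.4.3's weights.
type HealthInputs = {
  activated: boolean
  daysActive: number        // active days out of 7
  inferencesThisWeek: number
  acceptanceRate: number    // 0–1; substitute the cohort mean if not applicable
  returnedDay7: boolean
}

function healthScore(u: HealthInputs): number {
  return (
    0.20 * (u.activated ? 1 : 0) +
    0.30 * (u.daysActive / 7) +
    0.20 * Math.min(u.inferencesThisWeek / 50, 1) +
    0.20 * u.acceptanceRate +
    0.10 * (u.returnedDay7 ? 1 : 0)
  )
}

const zone = (s: number) => (s >= 0.7 ? 'green' : s >= 0.4 ? 'yellow' : 'red')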
25.1.8 The beta-to-GA gate
By Friday of Week 7 the team must make a recommendation about whether to position Week 8 as general-availability framing — meaning the product is ready for users without the team’s personal involvement, and the work shifts from learning to scaling.
For student-team MVPs, “general availability” is a soft framing — there is no app-store release, no “1.0 launch,” no marketing event. What it means in practice: the team’s work in Week 8 onward focuses on growth, pricing, and unit economics rather than on core product fixes. The Chapter 21 MVP is essentially “done” as a product in terms of what it does; the work that remains is taking it to more users and figuring out the business around it.
Gate criteria:
Quantitative:
- Activation rate ≥60% of beta users.
- Week-1 retention ≥70% (users active on day 7 after sign-up).
- Quality bars (from Chapter 21) still met against beta-traffic measurements.
- No open blocking bugs (anything caught is patched within 24 hours).
- Production uptime ≥99% during beta week.
- Customer-acquisition cost ≤ projected lifetime-value × 0.3 (rough rule of thumb for SaaS unit economics — see Chapter 26 for the full treatment).
Qualitative:
- The team can articulate, in writing, the segment for whom the product is now production-ready.
- The team can articulate the segment(s) for whom the product is not yet ready (and what would be needed).
- At least 5 beta users have explicitly committed to continuing usage at GA pricing.
- Independent third-party verification (e.g., the unit instructor’s spot-check, a mentor’s review) confirms the product is at GA-readiness.
If the gate is met, Week 8 work shifts to pricing, GTM, and unit economics (Chapter 26). If the gate is missed by 2+ criteria, Week 8 holds at beta-iteration mode and the GA framing is deferred. There is no penalty for deferring; the penalty is in shipping a product as “GA” when its production-readiness has not been validated.
25.2 Method — the Week 7 sprint
25.2.1 Sunday evening of Week 6: cohort confirmation
By Sunday evening of Week 6:
- Beta cohort target list confirmed: 20–25 names with onboarding-slot proposals for Monday/Tuesday.
- Self-serve onboarding flow tested by at least one team member acting as a new user.
- Pre-onboarding email sequence (the 3 emails) drafted and queued in the email tool (Loops, Resend, or similar).
- Pricing-conversation script drafted (§25.4.6).
- Production-hardening checklist (§25.1.5) at <5 open items.
25.2.2 Monday: production hardening final sprint
Monday is dedicated to closing the production-hardening checklist. The team works in parallel on whatever items remain open:
- Wei Hao closes the security items (RLS adversarial testing; rate-limiting; CSRF).
- Aliyah handles the operational items (uptime monitoring; cost alerts; deploy/rollback documentation).
- Sara works on the reliability items (error boundaries; graceful API-failure messages; loading states).
- Daniel and Priya finalise the onboarding email sequence and the in-product walkthrough copy.
By Monday 6pm KL, every item is PASS. The team runs a “chaos test”: one team member deliberately tries to break things (signs out and back in rapidly, submits invalid data, opens 20 tabs, simulates Anthropic API rate-limit failure by setting an intentionally low limit). The system either handles it gracefully or the bug is opened for immediate fix.
25.2.3 Monday-Tuesday: scaled onboarding
The onboarding model shifts. Instead of 30-minute video calls per user, the team:
- For the carry-over alpha users: a 5-minute “what’s new in beta” video sent by email; the alpha users self-serve onboard since they know the product.
- For new users from warm referrals: a 15-minute video call (half of alpha’s 30 minutes) to walk them through the new self-serve onboarding flow; the call is to confirm value capture, not to teach the product.
- For new users from cold outreach: pure self-serve through the in-product onboarding flow; the team monitors via session replay and reaches out if a user stalls.
By Tuesday end-of-day, all 18–20 onboardings should be initiated. Aim for 80% activation (completed first task) by Wednesday morning; users who have signed up but not activated get a personal nudge from the team member who recruited them.
25.2.4 Tuesday-Wednesday: data-flywheel infrastructure
The instrumentation work for the data flywheel:
- Outcome capture. Every teacher action on every AI explanation (accept / edit / reject; with edit-distance computed if edited) is captured to the database as an outcome label. The instrumentation is at the action level — no separate user survey, no separate annotation pipeline.
- Aggregation. A nightly job (or scheduled function on Vercel / Supabase Edge Functions) aggregates the day’s labelled inferences into a structured corpus. The aggregated data is queryable for evaluation, debugging, and prompt iteration.
- Active learning. Where the inference is high-uncertainty (low confidence from the model, or high disagreement across multiple model runs), it is flagged for prioritised human review. Active-learning-style sampling is more efficient than random sampling for surfacing the team’s most-informative learning opportunities; a disagreement-flagging sketch follows this list.
- Privacy and consent. The beta user’s terms of service include an explicit grant of permission to use their interaction data to improve the system. The data is anonymised at the per-user level for any cross-cohort analysis.
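The disagreement flag can be as simple as self-consistency over k runs of the same input. A sketch using token-set Jaccard as the agreement measure (the 0.6 threshold and callModel are illustrative assumptions; any text-similarity measure would serve):
// Generate k outputs for one input; if they disagree too much, queue the
// inference for prioritised human review.
function tokenSet(s: string): Set<string> {
  return new Set(s.toLowerCase().split(/\W+/).filter(Boolean))
}

function jaccard(a: Set<string>, b: Set<string>): number {
  const intersection = [...a].filter((t) => b.has(t)).length
  const union = new Set([...a, ...b]).size
  return union === 0 ? 1 : intersection / union
}

function needsReview(outputs: string[], threshold = 0.6): boolean {
  const sets = outputs.map(tokenSet)
  let total = 0
  let pairs = 0
  for (let i = 0; i < sets.length; i++) {
    for (let j = i + 1; j < sets.length; j++) {
      total += jaccard(sets[i], sets[j])
      pairs++
    }
  }
  return pairs > 0 && total / pairs < threshold
}

// needsReview(await Promise.all([callModel(q), callModel(q), callModel(q)]))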
By Wednesday end-of-day the flywheel infrastructure is producing daily structured corpus snapshots that can be inspected, queried, and used as input to Chapter 23’s evaluation pipeline.
25.2.5 Wednesday: pricing-conversation introduction
If the team is introducing pricing in beta (recommended for B2B products with established paying-customer behaviour; less critical for B2C), Wednesday is the right day. The pricing conversation is itself a research instrument; what users will and will not commit to is information.
The conversation framing (different team member per call):
"We're at the point where I want to tell you what comes next.
The alpha was free; the beta will continue to be free for the
rest of this period. After that, we're planning to move to a
monthly subscription. We're thinking [the price point] per
[unit]; this is what our analysis suggests is the right balance
between covering our costs and being affordable for centres
of your size.
I want to ask: would you commit, in principle, to continuing
at that pricing once we move to general availability? You don't
have to commit to a specific number of months; just whether
the pricing is in the right zone for you."
The conversation has three possible outcomes per user:
- Yes, with specifics. The user explicitly commits to GA pricing, perhaps with a stated number of students or seats. This is the strongest signal; capture the exact words and the implicit deal in writing immediately.
- Yes, with conditions. The user commits subject to specific feature additions, integration work, or contract terms. This is also valuable; capture the conditions.
- No, or not yet. The user does not commit, with a stated reason. The reason is research data — pricing too high, feature gap too large, organisational decision-making slower than the timeline.
By end of Wednesday the team should have the per-user pricing conversation results in a structured document. Three to five “yes with specifics” or “yes with conditions” is the threshold for beta commercial validity at student scale.
25.2.6 Thursday: mid-week metrics review
The mid-week review (Thursday late afternoon, both campuses synchronous) reviews:
- Activation funnel. Where are users dropping off? Anyone stuck at a specific step?
- Customer-health-score distribution. How many users are in the green / yellow / red zones?
- The data-flywheel corpus snapshot. How many labelled inferences? What outcome distribution? Any failure-mode patterns visible in the corpus that the Week-5 golden set missed?
- Pricing-conversation results. How many “yes with specifics,” “yes with conditions,” “no”? What patterns?
- Operational health. Uptime; cost; error rate; foundation-model rate-limit incidents.
The review produces the Friday action list: which users to follow up with, which features to push to next week, which prompt iterations to run, which production-hardening items to revisit.
25.2.7 Friday: beta report and Week 8 plan
The beta report, structured similarly to the Week-6 alpha report:
BETA REPORT — [PROJECT]
Week 7, [start] – [end]
1. SUMMARY
- Beta cohort size and distribution
- Total inferences logged in beta week
- Total labelled outcomes
- Activation funnel
- Customer-health-score distribution
2. WHAT WORKED
- Top 3-5 themes from beta-user feedback and behaviour
3. WHAT DID NOT WORK
- Top 3-5 issues, with severity and team response
4. KEY LEARNINGS
- Architectural / product / business learnings
- Implications for Week 8
5. THE DATA FLYWHEEL
- Corpus state at end of week
- Outcome label distribution
- Quality of corpus for evaluation use
- Weeks-1-2 retention forecast based on activation patterns
6. PRICING SIGNAL
- Pricing conversation results summary
- Stated commitments at GA
- Conditional commitments
- Pricing-related concerns or objections
7. BETA-TO-GA GATE EVALUATION
- Quantitative criteria: met/missed with evidence
- Qualitative criteria: met/missed with evidence
- Recommendation: PROCEED TO GA / HOLD AT BETA / PIVOT
8. WEEK 8 PLAN
- Pricing structure to implement
- GTM approach
- Unit-economics targets
- Specific Week-8 deliverables
The Friday submission goes in by 23:59. The team-comprehension penalty applies; every member must be able to discuss the metrics dashboard, the pricing-conversation results, and the data-flywheel corpus.
25.3 Lessons from the cases
Eight specific lessons from Parts I–III shape Week 7 beta and data-flywheel work.
25.3.1 Stitch Fix — every interaction is data (Chapter 8, forthcoming)
Stitch Fix’s product was the recommendation; its moat was the data infrastructure. By 2018 the company had captured 60+ structured signals per shipment per customer, accumulated across millions of shipments. The recommendation algorithm was useful; the dataset was incomparable.
Operational implication. Your beta’s instrumentation is the foundation of your future advantage. Every user action that can be captured automatically should be captured automatically. Resist the temptation to ask users for explicit feedback when their implicit behaviour reveals the same information; explicit feedback is sparse and biased, while implicit behaviour is plentiful and far less so.
25.3.2 Ant Group — proprietary transaction data as the durable moat (Chapters 3, 6)
Ant Group’s 3-1-0 lending model was structurally inseparable from Alipay’s transaction graph. The graph was generated by every Alipay transaction; the lending decisions were trained against it. A new entrant with the same model architecture but no graph data could not replicate the underwriting accuracy.
Operational implication. The data your beta produces is the seed of your eventual moat. For Team Aroma’s Pulse, the corpus of teacher-edited explanations is the asset that, by Week 12, distinguishes the system from a generic foundation-model wrapper. Every edit a teacher makes is a free annotation that competitors do not have. Treat the corpus accordingly: secure storage, careful curation, version control.
25.3.3 DBS GANDALF — production-grade reliability bar (Chapter 4)
DBS’s transformation included a production-reliability bar far above pilot or proof-of-concept standards: 99.99% uptime, sub-second response times, audit trails on every transaction, regulatory compliance baked into the architecture. The bar was set not because the bank was doing customer-discovery work but because the bank was operating real production systems at consumer scale.
Operational implication. Your beta is consumer-scale relative to your alpha. The reliability bar must rise: errors that were tolerable in alpha (auth bugs that affect 1 of 5 users) are not tolerable in beta (the same bug affects 4 of 20 users, with proportionally larger impact). The Monday production-hardening sprint is the discipline that closes this gap.
25.3.4 Cursor — every keystroke as signal (Chapter 5)
Anysphere instrumented Cursor’s IDE such that every keystroke, every accepted suggestion, every rejected completion, every cmd-K invocation produced telemetry. The team’s iteration speed was set by their visibility into actual user behaviour. By 2024 Cursor’s signal-collection density meant the team could detect a 1% change in completion-acceptance rate within hours; competitors with weaker telemetry could only detect 5–10% changes over weeks.
Operational implication. Your telemetry density determines your iteration speed. Better telemetry = faster learning = faster improvement = faster capture of the data flywheel. The investment in telemetry in Weeks 4–6 (Chapter 22’s observability; Chapter 23’s evaluation; Chapter 25’s flywheel instrumentation) compounds into an iteration-speed advantage your team carries into Week 8 and beyond.
25.3.5 Anthropic — safety evaluation at deployment scale (Chapter 13)
Anthropic’s deployment of Claude included continuous safety evaluation at deployment scale: every user interaction is sampled into evaluation corpora; the corpora are used to detect drift, surface new failure modes, and monitor model behaviour over time. The evaluation infrastructure is not a separate workstream; it is integrated into the operational architecture.
Operational implication. Your beta evaluation infrastructure is the foundation of your post-launch monitoring. The Week-5 golden set, Week-6 alpha-traffic eval, and Week-7 beta-traffic eval together produce the continuous evaluation infrastructure that you will run from Week 8 onward. This is permanent infrastructure, not throwaway work.
25.3.6 Slack — the “1,000 teams” milestone (Chapter 5)
Slack’s preview launch in August 2013 was followed by a controlled growth period. The company reached 1,000 active teams in the first month and used the cohort to validate workflow patterns before the public launch in February 2014. The 1,000-team milestone was the threshold at which the company transitioned from research mode to scaling mode.
Operational implication. For student-team MVPs, the equivalent threshold is 15–25 active beta users with solid retention. Below the threshold, the team is still in research mode (alpha-style learning); above it, the team can shift to scaling mode (Week 8’s pricing and GTM work). The threshold is not a magic number; it is the cohort size at which retention patterns become statistically interpretable and operational learnings become repeatable.
25.3.7 Stripe — onboarding scale-up via documentation and developer experience (Chapter 6, forthcoming)
Stripe’s transition from manual founder-led onboarding (described in Chapter 24) to scaled developer-self-serve was enabled by extraordinarily good documentation and API design. The team invested heavily in the developer experience because they understood that self-serve onboarding is enabled by product quality, not by support staffing.
Operational implication. Your self-serve onboarding flow is a product feature, not a support tool. The Week-7 work on onboarding emails, in-product walkthroughs, and the activation funnel is product engineering, with the same engineering discipline as core feature work. Investments here pay back through Weeks 8–10 and into the post-graduation year if the project continues.
25.3.8 ChatGPT — exponential beta (Chapter 13)
ChatGPT’s “beta” — explicitly labelled “research preview” — went from 0 to 1M users in 5 days, 0 to 100M in 2 months. The infrastructure scaled, the product held, and the data flywheel turned at unprecedented speed. The case is unrepresentative for student MVPs (you will not see 100M users) but it illustrates a structural property: a beta with strong product-market fit can grow non-linearly fast, and the team’s preparation for that scale matters.
Operational implication. Plan for the upside: if your beta produces strong word-of-mouth and your cohort size grows from 18 to 50 over the week (because beta users tell their colleagues), the infrastructure must hold and the team must triage. For most student teams this scenario does not occur; for the rare team that experiences product-market fit in beta, the preparation matters.
25.4 Tools and templates
25.4.1 Beta onboarding email sequence
EMAIL 1: WELCOME (sent immediately after signup)
Subject: Welcome to [product] beta
Hi [name],
Thanks for joining the [product] beta. Three things to know:
1. WHAT TO DO FIRST
[link to first-time experience or specific URL]
Should take 5-10 minutes. Tell us how it goes.
2. HOW TO TELL US WHAT'S BROKEN
Use the in-product feedback button (top right) or
reply to this email. We see and respond to everything
within 24 hours during beta.
3. WHAT TO EXPECT THIS WEEK
[Brief framing of the beta and what comes next]
We'll be in touch midweek. In the meantime, jump in.
— [team member]
EMAIL 2: MID-BETA CHECK-IN (sent on day 3)
Subject: How's it going so far?
Hi [name],
Quick check-in. Three short questions:
1. Did you complete [the first core workflow]? If not,
what got in the way?
2. What surprised you — positively or negatively — about
the experience?
3. If we kept improving this for the next four weeks,
what one thing would you most want changed?
[Optional: 15-min call link if you'd rather talk]
— [team member]
EMAIL 3: WEEK-END FOLLOW-UP (sent on day 7)
Subject: Wrapping up week 1
Hi [name],
You've used [product] for the past week. Here's what we
saw on our side:
- [n] sessions, [m] [primary actions]
- Most-used feature: [feature]
- Things that didn't work for you: [if known]
We're now starting work on [the next phase]. Two questions
that would help us decide what to build:
1. If we shipped [feature X], would you use it?
2. We're thinking about pricing at [point]. Is that the
right zone for you?
— [team member]
25.4.2 Self-serve onboarding checklist
SELF-SERVE ONBOARDING CHECKLIST
For each new user reaching the in-product first-time experience:
[ ] Welcome screen appears within 2s of first sign-in
[ ] First task is presented as a 30-second-or-less action
[ ] Task completion triggers explicit confirmation ("Done!")
[ ] Second task is presented within the same session
[ ] In-product help is contextual to current screen
[ ] Feedback button visible on every page
[ ] Loading states for actions taking >1s
[ ] Error states for every plausible failure mode
[ ] Mobile-responsive at 375px width and above
[ ] Activation event logged to PostHog at first-task-complete
25.4.3 Customer-health-score calculator
CUSTOMER HEALTH SCORE — [user/customer]
Component metrics:
Activation: [done = 1, not done = 0]
Days active (week 1): [n] days ÷ 7 = [pct]
Inferences this week: [n] count
Acceptance rate (if applicable): [pct]
Retention (returned day 7): [yes/no]
Pricing commitment status: [committed / conditional / no / pending]
Composite (weighted):
0.20 × activation
0.30 × days-active rate
0.20 × inferences-per-week ÷ 50 (capped at 1.0)
0.20 × acceptance rate
0.10 × retention
Health zones:
Green: composite ≥ 0.70
Yellow: composite 0.40 – 0.69
Red: composite < 0.40
Action by zone:
Green: nothing required; periodic check-in
Yellow: personal outreach this week
Red: immediate outreach; investigate and remediate
25.4.4 A/B testing framework (for student-team scale)
A/B TEST DESIGN — [test name]
Hypothesis (specific, falsifiable):
We believe [variant B] will outperform [variant A] on
[metric] by at least [Δ] because [reason].
Variant A: [description; current production]
Variant B: [description; new variant]
Randomisation:
At the [user / session / inference] level, with seed [seed]
for reproducibility.
Allocation:
50/50 (default) or 90/10 if variant B is risky.
Outcome metric:
[Specific metric, with definition]
Sample size:
Available traffic this week: ~[n]
Sample-size requirement at Δ = 10pp: ~400 per arm (§25.1.6)
→ CONCLUSION: This test is directional only; treat as
suggestive, not significant.
Pre-commitments:
Stop date: [date]
Decision rule: ship Variant B if its metric is at least
[Δ_practical] better, else hold or revise.
Results: [filled in at end of test]
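The randomisation line in the template (“at the user level, with seed”) is usually implemented as deterministic hashing rather than stored assignments: hash the user ID together with a per-test seed, map the hash to [0, 1], and bucket. The same user then always sees the same variant, with nothing to persist. A sketch using FNV-1a (chosen for brevity, not cryptographic strength):
// Deterministic user-level variant assignment: hash(userId + seed) → arm.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193)
  }
  return hash >>> 0
}

function assignVariant(userId: string, seed: string, splitB = 0.5): 'A' | 'B' {
  const bucket = fnv1a(`${userId}:${seed}`) / 0xffffffff // map to [0, 1]
  return bucket < splitB ? 'B' : 'A'
}

// assignVariant('user-123', 'prompt-test-w7') → a stable 'A' or 'B'.
// Changing the seed re-randomises the whole cohort for the next test.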
25.4.5 Production-hardening checklist (full version)
(See §25.1.5 above for the full checklist. The Monday hardening sprint moves every item to PASS.)
25.4.6 Pricing-conversation script
PRICING-CONVERSATION SCRIPT — [project]
Used in mid-week call, Wednesday Week 7
[ANCHOR]
"I want to tell you what comes after the beta."
[FACTS]
"The alpha was free, and the beta will stay free for
the rest of this period. After that, we're moving to
a monthly subscription. We're thinking [price] per
[unit]."
[CONTEXT]
"Our cost analysis suggests this covers our infrastructure
and is roughly [X]% of what centres of your size pay for
[comparable spend]. We chose it as a sustainable level
that lets us keep improving the product."
[ASK]
"Would you commit, in principle, to continuing at that
pricing once we move to general availability? You don't
have to commit to a specific number of months yet — just
whether the pricing is in the right zone for you."
[LISTEN]
[Capture the user's response in three categories:
- Yes with specifics (number of seats, etc.)
- Yes with conditions (feature additions, etc.)
- No or not yet, with reason]
[FOLLOW-UP]
"Thanks for being honest. Let me make sure I capture
that correctly: [paraphrase the user's response]."
[CLOSE]
"This is exactly the kind of input that helps us decide
the path. I'll follow up after beta with what we're
moving toward."
25.4.7 Data-flywheel instrumentation pattern
Append to the Chapter 22 schema:
-- Outcome labels: every action a user takes on an inference.
create table inference_outcomes (
id uuid primary key default gen_random_uuid(),
inference_id uuid references ai_inferences(id),
user_id uuid references users(id),
outcome_type text not null, -- 'accept' / 'edit' / 'reject' / 'thumbs_up' / 'thumbs_down' / 'use_in_production'
outcome_data jsonb default '{}'::jsonb, -- type-specific data, e.g., edit-distance for edits
created_at timestamptz default now()
);
-- The labelled corpus view: combines inferences with their first / canonical outcome label.
create view labelled_inference_corpus as
select
i.id,
i.user_id,
i.organisation_id,
i.feature_key,
i.input_data,
i.output_text,
i.prompt_version,
i.model_key,
o.outcome_type,
o.outcome_data,
i.created_at as inference_created_at,
o.created_at as outcome_created_at
from ai_inferences i
left join lateral (
select * from inference_outcomes o
where o.inference_id = i.id
order by created_at limit 1
) o on true;
The view is queryable by the evaluation pipeline; the per-week corpus snapshot is a CSV export of the view filtered to the week’s inferences.
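The weekly snapshot itself can be a short script against the view. A sketch with supabase-js (assumes the view is exposed through the API, a service-role key is available to the script, and naive CSV quoting is acceptable for an internal snapshot; the filename convention is ours):
// Export the beta week's labelled corpus to CSV for the evaluation pipeline.
import { createClient } from '@supabase/supabase-js'
import { writeFileSync } from 'node:fs'

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_SERVICE_ROLE_KEY!)

async function exportWeekSnapshot(weekStartIso: string, weekEndIso: string) {
  const { data, error } = await supabase
    .from('labelled_inference_corpus')
    .select('*')
    .gte('inference_created_at', weekStartIso)
    .lt('inference_created_at', weekEndIso)
  if (error) throw error

  const rows = data ?? []
  if (rows.length === 0) return
  const header = Object.keys(rows[0]).join(',')
  const lines = rows.map((row) =>
    Object.values(row).map((value) => JSON.stringify(value ?? '')).join(',')
  )
  writeFileSync(`corpus-${weekStartIso.slice(0, 10)}.csv`, [header, ...lines].join('\n'))
}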
25.4.8 Beta report template
(See §25.2.7 above for the full template structure.)
25.5 Worked example — Team Aroma’s Week 7
Team Aroma starts Week 7 with the alpha report’s three key learnings: (1) students do not engage on their own initiative; (2) teachers treat AI explanations as drafts to edit; (3) centre-level customisation is more important than anticipated. The Week-7 work implements (1) directly, accommodates (2) in the UX, and queues (3) for end-of-week.
Sunday-Monday: hardening and onboarding prep
The team’s Monday production-hardening sprint closes 7 of 8 open items by 5pm KL. The remaining item — automatic database backups — is handled by upgrading Supabase to the paid tier (USD 25/month). Aliyah accepts the cost as part of the team’s USD 100/month operational budget.
The teacher-assignment-workflow pivot from the alpha report is the largest Week-7 build item. By Monday afternoon, the workflow is implemented: a teacher can now select a topic, generate a set of practice questions for assigned students, and notify the students via in-product notification (with optional WhatsApp link if the centre’s centre owner has configured it). The students see assigned work in their dashboard rather than browsing the full question library.
The cohort recruiting funnel:
| Source | Contacted | Confirmed | Active |
|---|---|---|---|
| Alpha carry-over | 6 | 4 | 3 |
| Warm referrals from alpha users | 14 | 11 | 9 |
| Week-2 re-contacts | 12 | 6 | 4 |
| LinkedIn cold outreach | 18 | 4 | 2 |
| Total | 50 | 25 | 18 |
The 18 active users by Tuesday end-of-day include 6 centre owners (across 6 centres), 8 teachers (some at the same centres as the owners), and 4 students (with parental consent for the under-18s). Geographic distribution: 14 KL/Selangor, 3 Penang, 1 Johor — consistent with the targeting from Chapter 19.
Tuesday-Wednesday: data-flywheel instrumentation
Wei Hao implements the inference_outcomes table and the labelled_inference_corpus view per §25.4.7. The instrumentation is automatic: every teacher-side action (accept / edit / reject) creates an outcome row; every student-side action (correct attempt / incorrect attempt / abandoned) creates an outcome row.
By Tuesday end-of-day the corpus has 87 inferences with 64 outcome labels (74% labelling rate — the unlabeled inferences are mostly student-side practice attempts where the student has not yet submitted an answer). Daniel runs a quick analysis: across the 64 labelled inferences, the teacher-acceptance rate is 71% (vs 60% target from Chapter 21’s quality bar), the edit-rate is 22%, the rejection rate is 7%. The team is on the favourable side of all three.
The most-informative finding: of the 14 rejected explanations accumulated by Wednesday, 9 are for Geometry questions involving 3D visualisation. The system’s explanations for these are text-only, and teachers report the explanations are unhelpful without diagrams. The team queues this for Week 8 — adding diagram support (likely via Mermaid renders or matplotlib outputs from a Python microservice) is a candidate for the Week-8 backlog.
Wednesday afternoon: pricing-conversation calls
Aliyah and Daniel split the pricing-conversation calls: 6 centre-owner calls. The script from §25.4.6 is followed.
Outcomes:
- T1 (Mr Lim, Bright Star, ~80 students): “Yes with conditions.” Wants the centre-customisation feature in the MVP. Would commit at RM 25/student/month if customisation is included.
- T2 (Ms Tan, Excel Education, ~120 students): “Yes with specifics.” Commits to RM 30/student/month for 6 months starting Week 11. ~120 students × RM 30 = RM 3,600/month committed.
- T3 (Mr Rajan, Sri Murugan, ~50 students): “Yes with specifics.” Commits to RM 25/student/month (asks for the new-customer discount). ~50 students × RM 25 = RM 1,250/month committed.
- T4 (Smartway Tutorial, ~30 students; new in beta): “Yes with conditions.” Wants Mandarin support. Would commit at RM 30/student/month if Mandarin is included.
- T5 (~70 students; new in beta): “No, not yet.” Waiting for next year’s intake to make decisions; revisit in 6 months.
- T6 (~40 students; new in beta): “Yes with conditions.” Wants integration with WhatsApp for parent visibility. Would commit at RM 30/student/month.
Total committed (specifics + conditions): 5 of 6 centres, totalling ~320 students at RM 25–30/student/month, or roughly RM 8,950/month in potential MRR. The team did not expect this level of commitment by Week 7; it materially changes the funding-pitch scenario for Week 9.
Wednesday evening: the mid-week metrics review
The mid-week review surfaces a concerning signal. T6’s centre, the smaller new beta centre, has an activation rate of 40% (one of two teachers and one of three students activated) and a customer-health score of 0.32 (Red zone). The team investigates. T6 set up the centre on Tuesday and sent invitations, but most never reached the recipients: Sara identifies that the invitation email lands in Gmail’s spam folder (it is sent from a @vercel.app subdomain rather than a custom domain). Wei Hao moves the email-sending domain to a custom subdomain on Wednesday evening and re-tests. The fix deploys Thursday morning; T6’s centre activates on Thursday afternoon.
Without the mid-week review, the issue would have been visible only as “T6 is not engaged” by Friday — and the underlying cause (the email domain) would not have been identified in time.
Thursday: alpha-style scale and the second wave
Thursday is the first full day with the entire 18-user cohort active. The team monitors:
- 47 inferences generated across the cohort during Thursday alone, ~30% above Wednesday’s rate as student users start attempting their assigned work
- 32 outcome labels recorded, with teacher-acceptance rate holding at 73% (slightly up from earlier in the week)
- 4 user-submitted feedback items, 2 critical (a UI bug Sara fixes by 5pm; a teacher’s complaint about the notification timing — fix queued for Week 8)
- Cost: USD 12 for the day across all foundation-model usage; on track for ~USD 70 for the beta week
The team’s Thursday-evening standup focuses on the data-flywheel corpus state (now 200+ inferences logged, roughly 150 of them carrying outcome labels, with aggregate acceptance holding near 75%) and the pricing-conversation results. The team is meaningfully more confident about Week 9’s funding pitch than they were a week ago.
Friday: the beta report
Friday afternoon Aliyah leads the beta-report writing. Key sections:
Section 5 (Data flywheel state): The labelled corpus contains 240 inferences with 178 outcome labels by end of beta week. Teacher-acceptance rate aggregate: 72% (target 60%). The corpus is large enough to begin replacing 30% of the Week-5 golden set with production-derived eval inputs for Week 8 evaluation. The corpus highlights a specific gap: 3D-Geometry explanations have 38% acceptance rate (vs 75% across other topics), suggesting a Week-8 priority on diagram support.
Section 6 (Pricing signal): Five of six centre owners committed to GA pricing in some form: 2 firm, 3 conditional. Total committed monthly recurring revenue (MRR) at GA: ~RM 8,950. The conditional commitments are tied to Mandarin support, centre-customisation features, and WhatsApp integration. Roughly RM 4,850 of the total (the two firm commitments, about 54%) is unconditioned; this is the strongest commercial validation the team has had. Implication for Week 9 funding pitch: the team has a credible early-revenue story.
Section 7 (Beta-to-GA gate):
- Activation rate 78%: PASS
- Week-1 retention: 89% of users active on day 7: PASS
- Quality bars on beta traffic: BM clarity 80%, English 84%, format 92%, numerical 96%, hallucination 1.2%: PASS
- No open blocking bugs: PASS
- Production uptime 99.6%: PASS (one 30-minute Anthropic API outage Wednesday)
- Customer-acquisition cost: undefined at student scale; conditional on the Week-9 pitch
- At least 5 users committed at GA pricing: PASS (5 of 6 centre owners)
- Recommendation: PROCEED TO GA FRAMING for Week 8
The Friday submission goes in at 11pm KL: the beta report (10 pages), the cohort spreadsheet, the activation-funnel snapshot, the customer-health-score table for all 18 users, the data-flywheel corpus snapshot (CSV export), the pricing-conversation results structured per §25.4.6, the production-hardening checklist marked all-PASS, and the Week 8 plan.
What Team Aroma got right and what they almost got wrong
Three things they did well: (1) the production-hardening sprint on Monday closed 7 of 8 items in one focused day rather than dragging through the week; (2) the data-flywheel instrumentation was implemented as automatic byproduct of normal usage, not as a separate “data collection” project — the corpus accumulated naturally; (3) the Wednesday mid-week metrics review caught the T6 email-deliverability issue 24 hours before it would have surfaced as “T6 is just not engaged.”
Three things they almost got wrong: (1) they almost scheduled the pricing conversation for Friday, which would have left no time to act on conditional commitments; moving it to Wednesday gave the team room to decide which conditions were worth committing to in Week 8. (2) They almost dismissed the Geometry rejection-rate signal as “small sample”; the 3D-Geometry corpus stratum is a real product gap that diagrams would address, and the team’s discipline of stratifying eval results surfaced it. (3) They almost failed to record the conditional commitments from T1, T4, and T6 verbatim, which would have weakened the Week-9 pitch story; the verbatim record is what makes “5 of 6 committed” defensible.
The pattern is general. Week 7 is high-leverage because the data flywheel becomes operational and the commercial validation begins. The discipline of treating beta as a research instrument (not a marketing event) produces the evidence base that all subsequent work — Week 8’s pricing and unit economics, Week 9’s pitch, Week 10’s mock VC — depends on.
25.6 Course exercises and Week 7 deliverable
Submit the Week 7 deliverable bundle by Friday 23:59. Required artefacts:
25.6.1 Required artefacts
- Beta report (the central artefact, ~8–12 pages following §25.2.7).
- Cohort spreadsheet with each user’s name, role, segment, source, activation status, customer-health score, and pricing-conversation outcome.
- Activation funnel snapshot showing each stage’s conversion rate.
- Data-flywheel corpus snapshot (CSV or structured database export) with at least 100 labelled inferences from beta traffic.
- Pricing-conversation results structured per the §25.4.6 categories, with verbatim quotes for any committed users.
- Production-hardening checklist marked PASS/FAIL on every item, with FAIL-item remediation plans.
- Customer-health-score table for every active beta user, with action items for Yellow and Red zones.
- A/B test results for any in-week tests (with explicit acknowledgement of directional-only conclusions at small scale).
- Week 8 plan stating pricing structure, GTM approach, and unit-economics targets.
25.6.2 Grading rubric (50 points)
| Component | Points | Distinction-level criteria |
|---|---|---|
| Cohort recruiting | 5 | 15+ active users across primary segment + adjacent; recruiting via referrals, re-contacts, and cold not over-reliant on any single source |
| Production hardening | 5 | All items closed; chaos test conducted; uptime ≥99% during beta week |
| Onboarding scale | 5 | Activation rate ≥60%; self-serve flow tested; email sequence operational |
| Data-flywheel state | 10 | Corpus instrumented automatically; ≥100 labelled inferences; labelled corpus queryable for evaluation |
| Pricing signal | 10 | 3+ committed users (specifics or conditions); verbatim quotes captured; conditional commitments tied to specific features |
| Customer success | 5 | Health-score system in place; Yellow / Red users identified and acted on; ≥1 in-week recovery |
| Beta report quality | 5 | All 8 sections populated; learnings tied to evidence; gate evaluation honest |
| Operational discipline | 5 | Cost tracked and within budget; no incidents; team’s mid-week metrics review held |
Pass: 30. Credit: 36. Distinction: 42. High Distinction: 47.
Bonus: any team that captures verbatim quotes from at least 3 committed paying users earns +3 points. The verbatim record is the foundation of the Week-9 pitch; teams that build it deserve credit.
The team-comprehension penalty applies; additionally, every team member must have run at least one onboarding (or scaled-onboarding interaction), one mid-week metrics review, and one pricing conversation.
25.6.3 Things to do before Monday of Week 8
By Sunday evening of Week 7, in addition to the deliverable submission:
- Confirm any conditional pricing commitments from beta. Which conditions are worth meeting in Week 8? Which can be deferred? Which can be declined?
- Schedule the Week-9 mock VC pitch slot with the unit instructor (the slot is fixed, but the team’s preparation timeline should align).
- Read Chapter 7 (Healthcare and pharmaceuticals) or another sector chapter relevant to your team’s project, and §26.1–§26.3 of Chapter 26 (Pricing, GTM, and unit economics) before Monday of Week 8. The Week-8 work depends on the unit-economics framework Chapter 26 develops.
References for this chapter
Lean and customer-development methodology
- Ries, E. (2011). The Lean Startup. Crown Business.
- Blank, S. and Dorf, B. (2012). The Startup Owner’s Manual. K&S Ranch.
Data flywheel and AI-native firms
- Iansiti, M. and Lakhani, K. R. (2020). Competing in the Age of AI: Strategy and Leadership When Algorithms and Networks Run the World. Harvard Business Review Press.
- Lamarre, E., Smaje, K., and Zemmel, R. (2023). Rewired: The McKinsey Guide to Outcompeting in the Age of Digital and AI. Wiley.
- Hagiu, A. and Wright, J. (2020). When data creates competitive advantage. Harvard Business Review 98(1).
Customer success and product analytics
- Mehta, N., Steinman, D., and Murphy, L. (2016). Customer Success: How Innovative Companies Are Reducing Churn and Growing Recurring Revenue. Wiley.
- Yohn, D. L. (2020). Customer Success: The Definitive Guide. Customer Success Association.
- PostHog (2024–2026). Product analytics documentation. posthog.com.
A/B testing and online experimentation
- Kohavi, R., Tang, D., and Xu, Y. (2020). Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing. Cambridge University Press.
- Box, G. E. P., Hunter, J. S., and Hunter, W. G. (2005). Statistics for Experimenters: Design, Innovation, and Discovery. (2nd ed.) Wiley.
Bandit algorithms and active learning
- Sutton, R. S. and Barto, A. G. (2018). Reinforcement Learning: An Introduction. (2nd ed.) MIT Press.
- Thompson, W. R. (1933). On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3/4): 285–294.
Cases referenced in §25.3
- Iansiti, M. and Lakhani, K. R. (2020). (Stitch Fix, Ant Group, DBS GANDALF.)
- Slack Technologies (Butterfield, S.) (2014). Inside the Slack secret. Inc. Magazine.
- Anthropic (2024). Responsible scaling policy. anthropic.com.
- OpenAI (2022, 2023). ChatGPT user growth. Public statements and SEC filings.
Production hardening and security
- Open Web Application Security Project (OWASP) (2024). Top 10 Web Application Security Risks. owasp.org.
- Vercel (2024–2026). Production checklist. vercel.com/docs.
- Supabase (2024–2026). Production checklist and security. supabase.com/docs.
Further reading
For the customer-development and lean-startup foundations, Ries (2011), Blank-Dorf (2012), and Maurya (2012) remain the standard references. For the data-flywheel argument as it applies to AI-native firms, Iansiti-Lakhani (2020) is the comprehensive treatment; the Hagiu-Wright (2020) HBR piece is the most-readable shorter reference.
For online experimentation, Kohavi-Tang-Xu (2020) is the practitioner bible; the Box-Hunter-Hunter classic is the foundational statistics reference. For bandit algorithms specifically (relevant if your team has the budget and time to deploy them), Sutton-Barto (2018) provides the comprehensive reinforcement-learning treatment.
For customer-success literature, Mehta-Steinman-Murphy (2016) is the standard practitioner text. For product analytics in 2026, the PostHog and Mixpanel documentation are the practitioner references; the Amplitude blog covers cases of varying depth.
For production hardening, the OWASP Top 10 is the security baseline; the Vercel and Supabase production checklists cover the platform-specific operational concerns.