Chapter 24 — Week 6: Alpha launch and early users
Welcome to Week 6. The pre-alpha checklist is signed off. Five to ten friendly users — recruited from your Week-2 customer-discovery interviews and re-confirmed last weekend — are scheduled to onboard this week. By Friday you will have a deployed alpha system, structured feedback from real users in real conditions, an updated feature backlog, and an evidence-based answer to the question “should we proceed to beta?” The single most-common Week-6 mistake is to treat alpha as a marketing event rather than a research instrument. The alpha is not where you celebrate; it is where the team finally meets the gap between what they think they built and what users actually experience. The discipline that turns alpha into useful learning, not just usage, is the subject of this chapter.
Chapter overview
This chapter follows the same six-part structure. §24.1 (Concept) sets out what an alpha launch is and is not, the alpha-beta-GA progression and where Week 6 sits in it, alpha-cohort recruiting principles, onboarding theory and the think-aloud protocol, observability requirements for an alpha system, the feedback channel architecture, triage and prioritisation under time pressure, and the alpha-to-beta gate. §24.2 (Method) is the day-by-day Week 6 sprint: first-wave onboarding, production observation, second-wave onboarding, mid-week check-in, synthesis, and the alpha report. §24.3 (Lessons from the cases) pulls eight specific alpha-launch lessons from Parts I–III, including Slack’s early-user pattern, Stripe’s manual-onboarding tradition, Anthropic’s progressive-disclosure release model, and the Watson Health and Klarna anti-patterns. §24.4 (Tools and templates) gives you the alpha onboarding script, the think-aloud protocol guide, the structured feedback-intake form, the bug/feature/enhancement triage template, the daily standup format for alpha week, the mid-week check-in call script, and the alpha report template. §24.5 (Worked example) continues Team Aroma through their actual Week 6: 6 confirmed alpha users (3 centre owners, 1 teacher, 2 students); a Day-2 surprise about student engagement that forces a workflow change; a Wednesday triage that reshapes the MoSCoW for beta; the alpha report and the go/no-go decision for beta. §24.6 (Course exercises and deliverables) specifies the Week 6 submission with grading rubric.
How to read this chapter. Read §24.1 in full on the Sunday evening of Week 5 (before the alpha starts Monday). Read §24.2 with the team and assign owners per onboarding session. Treat §24.3 as Tuesday-evening reading once your first onboardings are complete; the case lessons land hardest with one or two real onboarding observations under your belt. Use §24.4 throughout the week, especially the onboarding script and the triage template. Read §24.5 on Thursday before drafting your own alpha report. Submit against §24.6 by Friday 23:59.
24.1 Concept
24.1.1 What an alpha launch is — and isn’t
An alpha launch is a controlled, observed, time-bounded deployment of the product to a small friendly cohort whose explicit purpose is to surface what is broken. It is not a public launch; it is not a marketing event; it is not a beta; it is not a soft launch. The framing matters because each of those alternative framings produces different decisions about cohort size, observability, feedback intake, and the fix-versus-feature trade-off.
Three properties distinguish an alpha:
- Small. 5–10 users for a student team; up to 50 for a well-resourced startup. Smaller is better at the alpha stage; the marginal cost of one more user (in onboarding, support, observation) is non-trivial, and the marginal information from one more friendly user is small once you have ~5.
- Friendly. The users are pre-committed to giving the team detailed honest feedback, tolerating bugs, and being patient through the inevitable rough edges. They are not random users; they are recruited from your Week-2 customer-discovery cohort, where you already have a relationship.
- Observed. Every interaction is logged, and the team is paying attention. Session replay, structured feedback, twice-daily team check-ins, direct interviews — multiple channels of observation, all running simultaneously. An alpha that is shipped and forgotten produces noise, not learning.
The alpha-beta-GA progression is a standard pattern, with characteristic transitions:
Pre-alpha (Week 5)
↓ — pre-alpha checklist passed
Alpha (Week 6)
↓ — alpha-to-beta gate passed
Beta (Weeks 7–8)
↓ — beta-to-GA gate passed
General availability (post-course)
Each transition is gated by criteria (§24.1.7 below for alpha-to-beta). The progression is not merely a release-engineering convention; it is a learning architecture. Each stage tests a different hypothesis. Alpha tests “does the system work for real users at all?” Beta tests “does the system work for users without our intensive support, at meaningful scale?” General availability tests “does the system work for users we did not personally recruit?”
24.1.2 Recruiting the alpha cohort
The Week-2 customer-discovery interviews produced a pool of 20+ interviewees, of which 3–5 fit the early-adopter test from §20.1.6: pain awareness, active workaround, budget, buying authority, tolerance for imperfection. These are your alpha pool.
The recruiting target for Week 6 is 5–10 confirmed alpha users. Recruiting is straightforward — you already have a relationship — but three traps recur:
The cousin trap. A team member’s relative agrees to be an alpha user because they like the team member. Their feedback is biased upward. They tell you the product is great because they want to support you, not because the product is great. Mitigation: every alpha user must come from outside the team’s family/close-friend circle. The Week-2 customer-discovery cohort meets this test by default; recruiting “easier” users from the family network sabotages the alpha’s information value.
The over-friendly trap. An alpha user is so enthusiastic about being early that they downplay problems and exaggerate value. The team feels validated and ships the bug-ridden product to beta. Mitigation: structured feedback (§24.4.3 below) forces specificity. “How would you rate the product?” elicits validation; “Walk me through the last time you used it; what specifically was frustrating?” elicits research findings.
The over-broad trap. A team that recruits across multiple segments to “test the product on different users” usually discovers that the alpha learning is dominated by within-segment variation, not between-segment variation, and the cross-segment recruiting cost is wasted. For Week 6, recruit 5–10 users from your primary segment. Cross-segment expansion is a Week 8+ exercise.
A specific recruitment script for re-confirming Week-2 commitments:
Hi [name],
Hope you're well. Following up on our conversation on [date] —
we're at the point where we're ready to put a working version
in front of a few people, and your input was the most useful
of the conversations we had.
Could we book 30 minutes Monday or Tuesday next week to walk
you through it? After that I'd ask for a few minutes of your
honest reaction over the rest of the week.
I should be upfront that this is an early version with rough
edges. We're more interested in what's broken than in what
works. Are you available?
24.1.3 The onboarding ritual
The first-use experience disproportionately shapes alpha learning. A user who has a good first 10 minutes engages for the rest of the week; a user who has a bad first 10 minutes typically goes silent. The team’s investment in onboarding is repaid many times over by the engagement that follows.
The recommended onboarding ritual: a 30-minute scheduled call (video, screen-share enabled) per alpha user, in which a team member walks the user through the product end-to-end while the user is actually using it on their own machine.
The onboarding has six segments:
- Anchor (3 min). Re-establish what this is, what the alpha will cover, what kind of feedback is most useful, that bugs are expected.
- Account creation (3 min). The user signs up themselves; the team member observes and notes any friction. This is often where the most-visible first-use defects surface.
- Guided first task (10 min). The team member walks the user through completing one core workflow (the wedge feature). Use the think-aloud protocol — ask the user to narrate their thinking as they click. Do not correct misunderstandings in real time; observe them.
- Independent second task (5 min). Step back. Let the user try a second task with minimal guidance. Watch where they get stuck. Resist the urge to help.
- Feedback elicitation (5 min). Open-ended: what was confusing, what was surprising, what would they want to do differently? Specific: would they use this on their own next week?
- Logistics (3 min). Confirm the feedback channel, the next check-in time, the team’s availability for questions.
Two specific notes:
The think-aloud protocol (Ericsson and Simon, 1984) is the standard usability-research method. Users narrate their actions while performing them. The narration captures cognition that silent observation misses — the user’s confusion, their hypothesis about what will happen next, their frustration when expectations don’t match reality. The technique is well-developed; in four decades of HCI research, no alternative has replaced it.
Recording. With the user’s explicit consent, record the onboarding call. Watching the recordings as a team in the Tuesday team meeting is the single highest-density learning event of the alpha week. A 30-minute recording typically contains 8–12 distinct learnings, far more than any team member would extract from real-time observation alone.
24.1.4 Observability for alpha
The pre-alpha logging from Chapter 22 (every AI inference persisted) and the evaluation pipeline from Chapter 23 (every metric tracked) are necessary but not sufficient. Alpha adds three observability layers:
Session replay. PostHog (free tier supports up to 5,000 recordings/month — sufficient for an alpha), FullStory (paid only), or LogRocket (free tier limited). Session replay records every user click, every page view, every error, and lets the team play back any user’s session as a video. The technique reveals UX problems that no quantitative metric captures: the user who hovered over the wrong button for 45 seconds before clicking; the user who clicked away from a dialog mid-task; the user who reloaded the page three times in 30 seconds because something seemed broken.
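A minimal session-replay setup, as a sketch: it assumes posthog-js loaded in the browser entry point, and the project key, host URL, and cohort label shown are placeholders (session recording itself is switched on in the PostHog project settings, not in code).

```typescript
// Browser entry point: initialise PostHog so every alpha session is recorded
// and attributable to a specific user. Key, host, and cohort label are placeholders.
import posthog from "posthog-js";

export function initAlphaObservability(user: { id: string; segment: string }) {
  posthog.init("<POSTHOG_PROJECT_KEY>", {
    api_host: "https://us.i.posthog.com",
  });
  // Identify the alpha user so replays and events can be filtered per user.
  posthog.identify(user.id, { segment: user.segment, cohort: "alpha-week-6" });
}
```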
Production error tracking. Sentry (free tier supports student-team scale) captures every JavaScript error and every backend exception, with stack traces and user context. The team’s response cadence to errors should be aggressive during alpha: every error gets read, triaged, and either fixed or explicitly accepted as known-issue within 24 hours. Errors accumulating without triage indicate the team has fallen behind the feedback rate, which is the alpha-week failure mode.
LLM-call logging. Every foundation-model call from production traffic is persisted to the same ai_inferences table established in Chapter 22, with the addition of user identifier, session identifier, and any user-side outcome (accepted output / edited / rejected / abandoned). The combination produces, by end of week, a corpus of 100+ real-user inferences with outcome labels — the seed dataset for Week-7 evaluation expansion.
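A sketch of what the per-call logging wrapper can look like, assuming the Anthropic TypeScript SDK and a Postgres-backed ai_inferences table; the column names, model id, and function names here are illustrative, not a prescribed schema.

```typescript
// Every production inference is persisted with user, session, token counts and
// latency; the outcome column is filled in later from the review UI.
import Anthropic from "@anthropic-ai/sdk";
import { Pool } from "pg";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment
const db = new Pool();             // reads PG* connection settings from the environment

export type Outcome = "accepted" | "edited" | "rejected" | "abandoned";

export async function loggedInference(opts: { userId: string; sessionId: string; prompt: string }) {
  const started = Date.now();
  const response = await anthropic.messages.create({
    model: "claude-sonnet-4-5", // illustrative; use whichever model the team standardised on
    max_tokens: 1024,
    messages: [{ role: "user", content: opts.prompt }],
  });
  const text = response.content
    .map((block) => (block.type === "text" ? block.text : ""))
    .join("");

  const { rows } = await db.query(
    `INSERT INTO ai_inferences
       (user_id, session_id, prompt, output, input_tokens, output_tokens, latency_ms, outcome)
     VALUES ($1, $2, $3, $4, $5, $6, $7, NULL)
     RETURNING id`,
    [opts.userId, opts.sessionId, opts.prompt, text,
     response.usage.input_tokens, response.usage.output_tokens, Date.now() - started]
  );
  return { inferenceId: rows[0].id as string, text };
}

// Called from the teacher-review UI when the user acts on the output.
export async function recordOutcome(inferenceId: string, outcome: Outcome) {
  await db.query(`UPDATE ai_inferences SET outcome = $2 WHERE id = $1`, [inferenceId, outcome]);
}
```

Wiring recordOutcome to the accept / edit / reject buttons is what turns raw traffic into the labelled corpus described above.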
A specific note on cost monitoring during alpha. Foundation-model costs scale linearly with usage. A 5-user alpha with 100 inferences each per day at USD 0.05 per inference is USD 25/day or ~USD 175/week. A 10-user alpha at higher usage can reach USD 500–1,000 for the week. Set budget alerts at 5× expected daily spend; a runaway loop or an accidental infinite recursion can spend a month’s budget in an afternoon. The team’s Anthropic / OpenAI / Google billing dashboard should be checked at the end of every team standup.
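A back-of-envelope budget alert, as a sketch over the same assumed ai_inferences table (the created_at column and the per-million-token prices shown are assumptions; substitute your model’s actual rates and your own expected-spend figure).

```typescript
// Daily cost check against the 5x budget-alert threshold described above.
import { Pool } from "pg";

const db = new Pool();

const EXPECTED_DAILY_USD = 25;                       // e.g. 5 users x 100 inferences x USD 0.05
const ALERT_THRESHOLD_USD = 5 * EXPECTED_DAILY_USD;  // alert at 5x expected daily spend
const USD_PER_M_INPUT = 3.0;                         // assumed input price per million tokens
const USD_PER_M_OUTPUT = 15.0;                       // assumed output price per million tokens

export async function checkDailySpend(): Promise<number> {
  const { rows } = await db.query(
    `SELECT COALESCE(SUM(input_tokens), 0) AS in_tok,
            COALESCE(SUM(output_tokens), 0) AS out_tok
       FROM ai_inferences
      WHERE created_at >= date_trunc('day', now())`
  );
  const spend =
    (Number(rows[0].in_tok) / 1_000_000) * USD_PER_M_INPUT +
    (Number(rows[0].out_tok) / 1_000_000) * USD_PER_M_OUTPUT;

  if (spend > ALERT_THRESHOLD_USD) {
    // Wire this to the team chat; console.error is the minimal version.
    console.error(`ALERT: spend today USD ${spend.toFixed(2)} exceeds 5x expected (USD ${ALERT_THRESHOLD_USD}).`);
  }
  return spend;
}
```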
24.1.5 The feedback channel architecture
Three channels operate simultaneously during alpha; each captures a different kind of signal.
The async written channel. A structured form (typically Notion / Tally / Google Forms) where alpha users submit feedback at their convenience. The form has explicit fields (what feature, what happened, what was expected, screenshot if applicable, severity). The structure forces specificity that free-text email does not produce. Volume target: 1–3 submissions per user per day during alpha week.
The synchronous call channel. A 15–20 minute mid-week check-in call per alpha user (typically Wednesday or Thursday) where the team member asks open-ended questions: “what’s been good? what’s been frustrating? would you keep using this if we stopped supporting you?” The synchronous channel captures emotional tone, hesitation, and the question-behind-the-question that async forms miss.
The in-product feedback channel. A persistent button or widget in the product itself (typical implementation: a small “Feedback” button in the top-right) where the user can submit feedback in-context with auto-attached metadata (current page, current user, current session). In-product feedback has the advantage of being submitted at the moment of use, when the user’s frustration or delight is fresh.
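A sketch of the in-product submit handler, assuming a hypothetical /api/feedback endpoint; the field names mirror the §24.4.3 form so that async and in-product submissions land in the same store.

```typescript
// Browser-side handler behind the "Feedback" button. Page, user, session and
// timestamp are attached automatically; the user only types the three fields.
export type Severity = "critical" | "major" | "minor" | "suggestion";

export async function submitFeedback(
  input: { feature: string; whatHappened: string; expected: string; severity: Severity },
  session: { userId: string; sessionId: string }
): Promise<void> {
  await fetch("/api/feedback", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      ...input,
      // Auto-attached context: the user never has to type any of this.
      page: window.location.pathname,
      userId: session.userId,
      sessionId: session.sessionId,
      submittedAt: new Date().toISOString(),
      channel: "in-product",
    }),
  });
}
```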
Cumulative volume target across all three channels for a 5-user alpha week: 50–150 distinct feedback items. Below 30 indicates the team is not extracting enough signal; above 200 indicates the product is so broken that the alpha should be halted and the team should ship a fix before continuing.
The team’s response discipline matters: every feedback item gets acknowledged within 24 hours, even if the response is “we hear you, we’re looking at this, we’ll be back to you within 48 hours.” Silence is the fastest way to lose alpha-user engagement; a quick non-substantive acknowledgement preserves the relationship far better than a slow substantive one.
24.1.6 Triage and prioritisation under time pressure
Feedback arrives faster than the team can fix. Triage is the discipline of deciding what to fix this week, what to defer to beta, and what to never fix.
The triage rubric assigns each feedback item one of four categories:
- Bug-blocking — the user cannot complete the wedge workflow because of this issue. Fixed within 24 hours.
- Bug-degrading — the workflow completes but is harder than it should be. Fixed within the alpha week if cheap, deferred to beta if expensive.
- Feature-must — the user identified a missing capability that is genuinely required for the wedge to work. Re-evaluated against the Chapter-21 MoSCoW; may move from Should to Must.
- Feature-could — the user requested an enhancement that would be nice but is not required. Logged in the Could backlog for post-beta consideration.
The Wednesday-evening team triage meeting, lasting 60–90 minutes, classifies every feedback item received in the prior 48 hours. Items are converted into Linear (or GitHub Issues) tickets with the category as a tag. Items in the Bug-blocking and Bug-degrading categories get assigned for the rest of Week 6; items in Feature-must get a Week 7 assignment; items in Feature-could go to backlog without commitment.
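A minimal sketch of what a triaged item looks like once it leaves the meeting; the type and the category-to-fix-window mapping encode the rubric above, and the actual push into Linear or GitHub Issues is left to whichever tracker the team uses.

```typescript
// The four triage categories from §24.1.6 and the backlog ordering from §24.2.5.
export type TriageCategory = "bug-blocking" | "bug-degrading" | "feature-must" | "feature-could";

export interface TriagedItem {
  title: string;
  reportedBy: string;   // alpha-user identifier, e.g. "T2"
  category: TriageCategory;
  owner?: string;       // team-member initials
  notes?: string;
}

// Where each category lands, per the triage rubric.
export function fixWindow(category: TriageCategory): string {
  switch (category) {
    case "bug-blocking":  return "within 24 hours";
    case "bug-degrading": return "this week if cheap, otherwise beta";
    case "feature-must":  return "beta (re-evaluate against the MoSCoW)";
    case "feature-could": return "backlog, no commitment";
  }
}

// Order the post-triage backlog: Bug-blocking first, then Feature-must, then Bug-degrading.
const PRIORITY: TriageCategory[] = ["bug-blocking", "feature-must", "bug-degrading", "feature-could"];
export function sortBacklog(items: TriagedItem[]): TriagedItem[] {
  return [...items].sort((a, b) => PRIORITY.indexOf(a.category) - PRIORITY.indexOf(b.category));
}
```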
The triage discipline is where most teams fail in alpha week. A team that tries to fix everything that any user mentions discovers Friday afternoon that they have shipped broken fixes for low-priority issues while bugs in the wedge workflow remain. A team that triages aggressively and fixes only Bug-blocking and Feature-must items in-week ships a coherent improvement to beta.
24.1.7 The alpha-to-beta gate
By Friday of Week 6 the team must answer: do we proceed to beta on Monday of Week 7, or do we hold the alpha for a remediation week?
The gate criteria, both quantitative and qualitative:
Quantitative:
- The Chapter-21 quality bars are still met against alpha-traffic measurements (not just the Week-5 golden set).
- No Bug-blocking issues remain unresolved.
- At least 4 of 5 alpha users used the product on at least 3 distinct days during the alpha week.
- At least 3 of 5 alpha users gave a likelihood-to-recommend score of 8+ on the 0–10 NPS scale (or an equivalent satisfaction measure).
- No data-loss incidents, no security incidents, no PDPA / GDPR violations.
Qualitative:
- The team can articulate, in writing, why the alpha results suggest beta is appropriate.
- The team can articulate, in writing, what beta will test that alpha did not.
- At least one alpha user has explicitly said they would continue using the product after the alpha period ends.
- The cost-per-user trajectory is consistent with the unit economics modelled in Chapter 21.
If the gate is met, beta starts Monday. If the gate is missed by 1–2 criteria, the team has a remediation conversation: can we fix in 2–3 days? If the gate is missed by 3+ criteria, the alpha holds for another week and beta is delayed. There is no penalty for delay; there is substantial penalty for shipping to beta without alpha learning.
Most student teams will be at the boundary on Friday. The gate is designed to be at the boundary; alpha launches that ship cleanly through the gate are usually under-scoped (the team is not learning) and alpha launches that fail the gate spectacularly are usually over-scoped (the team did not narrow enough in Chapter 21). The team that meets 7 of the 9 criteria with 2 honest failures is in the right zone.
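The decision rule itself is mechanical enough to write down; a sketch follows, with illustrative criterion names (the hard part is deciding honestly whether each criterion is met, not the arithmetic).

```typescript
// The alpha-to-beta decision rule from §24.1.7 as a checklist evaluation.
export interface GateCriterion {
  name: string;   // e.g. "quality bars met on alpha traffic"
  met: boolean;   // the evidence lives in the alpha report, not here
}

export type GateDecision = "PROCEED_TO_BETA" | "REMEDIATE_2_3_DAYS" | "HOLD_ONE_WEEK";

export function evaluateGate(criteria: GateCriterion[]): GateDecision {
  const missed = criteria.filter((c) => !c.met).length;
  if (missed === 0) return "PROCEED_TO_BETA";
  if (missed <= 2) return "REMEDIATE_2_3_DAYS"; // have the remediation conversation
  return "HOLD_ONE_WEEK";                       // alpha holds, beta is delayed
}
```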
24.2 Method — the Week 6 sprint
24.2.1 Sunday evening of Week 5: pre-alpha confirmation
By Sunday evening of Week 5:
- Each alpha user has been re-confirmed by personal message, with the Monday/Tuesday onboarding slot booked.
- The alpha-onboarding doc is finalised — 1 page, written for the user, covering what to expect, what feedback is most useful, and how to escalate problems.
- The feedback form is published and tested.
- The session-replay tool is configured and tested (a team member can play back a sample session).
- The Anthropic / OpenAI billing dashboard has alerts at 5× expected daily spend.
24.2.2 Monday: first wave onboarding
By Monday end-of-day, 2–3 alpha users have been onboarded via the §24.1.3 ritual. Each onboarding produces:
- A 30-minute recording (with consent) stored in the team’s shared Drive.
- A written observation note (1 page) capturing the team member’s observations, with timestamps for key moments.
- A list of bugs and friction points identified during onboarding, logged in Linear.
The team’s Monday 5pm KL standup reviews the Day-1 onboardings together. Patterns visible across users are flagged immediately; one-off issues are queued for the Wednesday triage. The team often identifies 5–10 issues that will be fixed by Wednesday morning, before the second-wave onboardings begin.
24.2.3 Tuesday: production observation, second-wave preparation
Tuesday is when the first-wave alpha users use the product on their own, without the team’s hand-holding. The team’s job is to watch and listen, not to intervene unless asked.
Concretely:
- Two team members watch session replays of the Day-1 alpha users’ independent use, in real time or with a few hours’ delay. They take notes on every friction point.
- The async feedback form is monitored continuously; every submission is acknowledged within 4 hours.
- The Sentry error feed is checked every 2 hours during team working hours.
- Bugs identified yesterday are fixed by Tuesday afternoon and deployed to production after a brief eval-pipeline regression check (Chapter 23’s discipline).
By Tuesday end-of-day, the team has 2–3 onboarding observations, 5–15 written feedback items, 50–200 session-replay events, and a clear picture of where the workflow is and is not working for first-wave users.
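The eval-pipeline regression check mentioned above can be a single script run before every deploy. A minimal sketch, with an illustrative metrics-file path, metric names, and example bars (substitute the quality bars your team actually committed to in Chapter 21):

```typescript
// Pre-deploy regression gate: compare the eval pipeline's latest metrics
// against the quality bars and fail the deploy on any regression.
import { readFileSync } from "node:fs";

// Example bars only; Team Aroma-style names are used for illustration.
const QUALITY_BARS: Record<string, number> = {
  bm_clarity: 0.80,
  english_clarity: 0.80,
  spm_format: 0.90,
  numerical_correctness: 0.95,
};
const HALLUCINATION_CEILING = 0.02; // lower is better

export function regressionCheck(metricsPath = "eval/latest_metrics.json"): boolean {
  const metrics: Record<string, number> = JSON.parse(readFileSync(metricsPath, "utf8"));
  let pass = true;
  for (const [name, bar] of Object.entries(QUALITY_BARS)) {
    if ((metrics[name] ?? 0) < bar) {
      console.error(`REGRESSION: ${name} = ${metrics[name]} is below the bar ${bar}`);
      pass = false;
    }
  }
  if ((metrics.hallucination_rate ?? 1) > HALLUCINATION_CEILING) {
    console.error(`REGRESSION: hallucination_rate exceeds ${HALLUCINATION_CEILING}`);
    pass = false;
  }
  return pass;
}

// In the deploy script: if (!regressionCheck()) process.exit(1);
```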
24.2.4 Wednesday morning: second-wave onboarding
The remaining 2–3 alpha users are onboarded Wednesday morning. The onboarding ritual is the same as Day 1, but the team has now incorporated bug fixes from Days 1–2; the second wave’s experience should already be visibly better than the first wave’s. The Day-2 fixes are themselves a research output — the team’s iteration speed is measurable.
By Wednesday lunch, all alpha users are onboarded and using the product.
24.2.5 Wednesday evening: the triage meeting
The Wednesday triage meeting (60–90 min) classifies every feedback item received Days 1–3 into the four categories from §24.1.6. The meeting also reviews the cumulative session-replay observations and the metrics dashboard. Three outcomes:
- A Linear-ticket backlog for the rest of Week 6, prioritised: Bug-blocking first, then Feature-must, then Bug-degrading.
- A list of Bug-degrading items deferred to beta, with explicit deferral rationale (e.g., “fix is non-trivial; user’s workaround is acceptable for alpha”).
- A list of Feature-could items added to the long-term backlog, with no commitment.
The triage often produces uncomfortable conversations: an alpha user requested feature X that the team thinks is wrong; an alpha user reported issue Y that the team underestimated; an alpha user gave qualitative feedback Z that contradicts the team’s quantitative metrics. The discipline is to surface these contradictions rather than smooth them over; they are the most-informative content of the alpha.
24.2.6 Thursday: mid-week check-in calls
Each team member runs 1–2 mid-week check-in calls (15–20 min each) with assigned alpha users. The script:
[OPENING]
Thanks for being part of this alpha. I want to understand
how it's going from your side.
[USE PATTERNS]
How many times have you used [the product] this week?
Walk me through the most recent time.
[DELIGHT]
What's been the most useful thing about it?
What surprised you positively?
[FRUSTRATION]
What's been the most frustrating?
What did you expect to happen that didn't?
[CONTINUATION]
If we ended the alpha on Friday and turned it off, would
you miss it? Why or why not?
[REFERRAL]
Is there anyone else you would tell about this?
What would you say to them?
[CLOSE]
Anything I haven't asked that I should have?
Calls are recorded with consent. The team’s Thursday-evening standup reviews the cumulative call data; patterns across users (e.g., “all three teachers said the AI explanation was good but the review-and-correction UI was awkward”) become Friday’s prioritised action list.
24.2.7 Friday morning: final fixes and the alpha report
Friday morning is the last fix window. Bug-blocking items resolved on Wednesday should now be verified working with alpha users. Bug-degrading items prioritised in Wednesday’s triage are deployed and verified. The team also runs the full evaluation pipeline (Chapter 23) one more time, with the alpha-traffic-derived inputs added to the golden set, to verify metric stability against the production distribution.
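One way to fold the alpha traffic into the golden set, as a sketch: it assumes the ai_inferences columns sketched in §24.1.4 and a JSONL golden-set file, both of which are assumptions to adapt to the Chapter-23 pipeline’s actual format.

```typescript
// Export the alpha week's outcome-labelled inferences and append them to the
// golden set used by the evaluation pipeline.
import { appendFileSync } from "node:fs";
import { Pool } from "pg";

const db = new Pool();

export async function exportAlphaGoldenCases(goldenSetPath = "eval/golden_set.jsonl") {
  const { rows } = await db.query(
    `SELECT prompt, output, outcome
       FROM ai_inferences
      WHERE outcome IS NOT NULL
        AND created_at >= now() - interval '7 days'`
  );
  for (const row of rows) {
    const line = JSON.stringify({
      input: row.prompt,
      reference_output: row.output,
      label: row.outcome,   // accepted / edited / rejected / abandoned
      source: "alpha-week-6",
    });
    appendFileSync(goldenSetPath, line + "\n");
  }
  console.log(`Appended ${rows.length} alpha-traffic cases to ${goldenSetPath}`);
}
```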
By Friday afternoon, the team writes the alpha report. The report has six sections:
ALPHA REPORT — [PROJECT]
Week 6, ended [date]
1. SUMMARY
- Alpha cohort size: [n]
- Cohort distribution (segment, role, region, etc.)
- Total inferences logged: [n]
- Quality-bar status (vs Chapter 21):
- Metric 1: [actual] / target [...]
- ...
2. WHAT WORKED
- The 3-5 things alpha users explicitly praised
- Each item with a verbatim quote
3. WHAT DID NOT WORK
- The 3-5 things alpha users explicitly criticised
- Each item with a verbatim quote and the team's
diagnostic on why
4. KEY LEARNINGS
- The 3-5 things the team learned that they did not
know before alpha
- Each item with a brief description and the
implication for beta
5. ALPHA-TO-BETA GATE EVALUATION
- Quantitative criteria: [met / missed], with evidence
- Qualitative criteria: [met / missed], with evidence
- Recommendation: PROCEED TO BETA / HOLD / PIVOT
6. BETA PLAN
- What beta will test that alpha did not
- Expanded cohort recruiting plan
- Quality bar adjustments
- Feature additions / removals based on alpha learnings
The report is the primary Friday deliverable. It is what the team’s mock VC panel will read in Week 9, and it is what an actual angel investor would ask for as evidence of disciplined product development.
24.2.8 The Friday submission
Submit the Week 6 deliverable bundle by 23:59 Friday. The team-comprehension penalty applies: every team member must be able to discuss any onboarding observation, any feedback item, or any triage decision.
24.3 Lessons from the cases
Eight specific alpha-launch lessons from Parts I–III shape Week 6 practice.
24.3.1 Slack — early users define the product (Chapter 5)
Slack’s early users (~2013) were Stewart Butterfield’s network from the failed Glitch game — 6 to 8 small companies who were given the product in exchange for direct feedback. The product as it existed in 2014 reflected those users’ specific patterns: heavy emoji use, channel-based organisation, integrations as the primary extensibility surface. The companies that used Slack at the public beta in early 2014 looked structurally like Slack’s eventual market: tech-forward, distributed, software-and-design-first.
Operational implication. Your alpha cohort shapes the product. If you recruit alpha users from the wrong segment, you will build a product for the wrong segment. The Week-2 customer-discovery discipline matters disproportionately at this stage; the early-adopter selection from §20.1.6 is what lets your alpha cohort be genuinely informative.
24.3.2 Stripe — manual onboarding as the founder’s job (Chapter 6, forthcoming)
In Stripe’s first ~12 months, Patrick Collison and John Collison personally onboarded every user. The “Collison installation” — the founders showing up at the user’s office to integrate Stripe themselves — was both a feature and a research instrument. Every installation taught the team something about API ergonomics, error handling, and developer experience that no remote feedback channel would have surfaced.
Operational implication. For a 5–10 user alpha, every onboarding should involve a team member directly, not a self-serve flow. The cost (5 × 30 minutes = 2.5 hours per onboarding wave) is trivial relative to the information gained. Self-serve onboarding is a Week-8+ optimisation; in alpha, do it yourself and watch.
24.3.3 Anthropic Claude — progressive-disclosure release pattern (Chapter 13)
Anthropic’s release pattern for Claude (2022–onward) was research preview → limited beta → general availability, with each transition gated on safety evaluation results, partner feedback, and capability stability. The pattern is now standard for AI-product releases at major labs: progressive disclosure, controlled cohorts, gated transitions.
Operational implication. Your alpha-beta-GA progression is the same architecture, scaled to student-team size. The discipline is identical; what differs is the absolute scale. A 5-user alpha is to a student team what an Anthropic research-preview is to a frontier lab — same purpose, same gating, same learning architecture.
24.3.4 Watson Health — no structured alpha (Chapters 2, 7)
IBM’s deployment pattern for Watson Health was closer to “go directly to beta with paying enterprise customers” than to “controlled alpha with friendly users.” The product’s first users were major cancer centres (Memorial Sloan Kettering, MD Anderson) at full deployment scale. The early user feedback was therefore expensive, slow to collect, and entangled with the customer-relationship dynamics of multi-million-dollar enterprise contracts.
Operational implication. Skipping the alpha is what produces the most-expensive failures. Your alpha cohort can give the team feedback in a week that a beta-or-later cohort will give in a quarter, and the team can iterate freely without the relationship cost of disappointing a paying customer. The marginal cost of running a structured alpha is low; the marginal cost of not running one is potentially the entire product.
24.3.5 Klarna — alpha-skipping and the brand-trust unwind (Chapter 8, forthcoming)
The Klarna AI customer-service deployment in February 2024 was effectively a full-launch presented as a deployment milestone. By bypassing the alpha-beta progression — going from internal testing to ~67% of customer service traffic in one step — the team learned only at full-launch scale that the system was producing customer-experience issues. The reversal in mid-2025 cost months of brand work to undo.
Operational implication. A team that has done Chapter 21’s quality-bar work, Chapter 23’s evaluation work, and Chapter 24’s alpha discipline correctly is structurally protected from this failure mode. The protection is not perfect — alphas can mislead, particularly when the alpha cohort is unrepresentative — but it is far better than the no-alpha alternative.
24.3.6 Cursor — developer alpha, founder dogfood (Chapter 5)
Anysphere’s alpha for Cursor was the founders themselves and a small developer-community cohort recruited from their personal networks (~50 users). The team was both alpha user and alpha builder; every friction the founders experienced was a defect identified in real time. The pattern made the alpha-week iteration cycle extraordinarily fast — a defect noticed at 10am could be fixed and shipped by 2pm, with the founders themselves verifying the fix.
Operational implication. Where your team can be its own alpha user (Pulse for tutoring centres: at least Aliyah and Daniel have tutoring backgrounds and can use the product as if they were teachers), do so explicitly and aggressively. The founder-as-user channel is free and high-density.
24.3.7 Notion — friend-of-friend distribution as alpha cohort (Chapter 5)
Notion’s early-user distribution (2017–2018) ran through Ivan Zhao’s San Francisco network — designers, engineers, and product managers who knew each other and shared early access. The cohort was demographically narrow but produced high-quality feedback because the users were embedded in collaborative workflows where the product’s value was easy to test.
Operational implication. A narrow demographic cohort with shared workflow context is more informative than a broad demographic cohort with diverse contexts. The Week-2 segmentation discipline (§20.1.5) pays off again here — your alpha should be the deepest possible cohort within your primary segment, not a sampling across multiple segments.
24.3.8 ChatGPT — research preview as alpha (Chapter 13)
OpenAI released ChatGPT on 30 November 2022 as a “research preview” — a framing that signalled to users that the product was not finished, and that captured user feedback as a research-grade input rather than a customer-grade complaint. The framing was powerful: users behaved like research participants rather than purchasers, and OpenAI got data quality at scale that a “v1.0 launch” framing would not have produced.
Operational implication. Your alpha can be framed similarly. Tell users explicitly: “this is an early version; what we want from you is the kind of detailed honest feedback you’d give to a research project, not the polite endorsement you’d give to a friend’s launch.” The framing changes both the user’s behaviour and the team’s posture toward the resulting feedback.
24.4 Tools and templates
24.4.1 The alpha onboarding script
ALPHA ONBOARDING — [PROJECT]
30 minutes, video call with screen-share
[ANCHOR — 3 min]
Thanks for being part of this alpha. Quick re-orientation:
- This is an early version with rough edges.
- We want detailed honest feedback, not encouragement.
- You'll have access to the product for the rest of the
week; we'll check in with you Wednesday or Thursday
for a longer conversation.
- Anything weird — please flag it. Even tiny frictions.
May I record this session for the team to learn from?
[wait for explicit consent before continuing]
[ACCOUNT CREATION — 3 min]
I'd like you to sign up exactly as a new user would.
The URL is [...]. As you go, please tell me what you're
seeing and what you're thinking.
[Observe quietly. Note any friction. Do not correct
mistakes or hint at next steps.]
[GUIDED FIRST TASK — 10 min]
Now I'd like you to [the wedge workflow's first variant].
As you do this, please narrate what you're thinking —
what you expect to happen, what you see, whether it
matches what you expected.
[Observe. Take timestamped notes. Resist intervening
unless the user is genuinely stuck.]
[INDEPENDENT SECOND TASK — 5 min]
Now please try [the second variant]. I'll step back.
[Watch silently. Note where the user pauses, where they
re-read, where they look frustrated.]
[FEEDBACK ELICITATION — 5 min]
Open-ended:
- What was the most confusing moment in the last 25 minutes?
- What was the most positive moment?
- If you used this for a week, what's one specific thing
you'd want changed?
[LOGISTICS — 3 min]
I'll send the feedback form link in chat. Please use it
whenever you have an observation — even tiny ones, even
if you've already told me.
The mid-week check-in is [day/time]. Anything else for now?
[CLOSE]
[Thanks. End the call. Stop the recording. Write up
observations within 30 minutes.]
24.4.2 The think-aloud protocol guide
The think-aloud protocol (Ericsson and Simon, 1984; van Someren et al., 1994) asks users to verbalise their thoughts as they work. Three principles:
- Ask for narration of cognition, not justification. “What are you thinking?” beats “Why did you click that?” The first elicits in-process cognition; the second elicits post-hoc reasoning.
- Resist interjecting. Even encouraging “mm-hmm” can disrupt the user’s thought flow. Stay quiet; the silence is uncomfortable but productive.
- Resist correcting misunderstandings. A user who misunderstands the product’s purpose has just given you data on the product’s communication failure. Help them complete the task only if they are genuinely stuck for >2 minutes.
24.4.3 Feedback intake form template
FEEDBACK FORM — [PROJECT]
Your name: [auto-populated from session]
Date and time: [auto-populated]
Page where this happened: [auto-populated]
What feature were you using?
[dropdown: AI explanation, Teacher review, Student practice,
Account/auth, Dashboard, Other]
What happened? (Required)
[free text, 50–500 chars]
What did you expect to happen instead? (Required)
[free text, 50–500 chars]
How severe is this?
[single-select: Critical (can't continue) / Major (very
frustrating) / Minor (mild annoyance) / Suggestion]
Screenshot or screen recording? (Optional)
[file upload]
Anything else?
[free text]
[SUBMIT]
The form’s structure forces specificity. The “what did you expect” field is the most-informative; it surfaces the user’s mental model, which is often more useful than the actual bug.
24.4.4 Triage template
WEEKLY ALPHA TRIAGE — [date]
Reviewed by: [team members]
| Item | User | Category | Severity | Owner | ETA | Notes |
|---|---|---|---|---|---|---|
| Login fails on Safari | T2 | Bug-blocking | Critical | WH | Tue | Apple-specific cookie issue |
| BM explanation uses formal register | T1, T3 | Bug-degrading | Major | DK | Wed | Prompt iteration (already in pipeline) |
| Want bulk question generation | T2 | Feature-must | Medium | AC | Beta | Move from Should to Must in MoSCoW |
| Want progress export to PDF | T1 | Feature-could | Low | — | Backlog | No commitment |
| ... | | | | | | |
Bug-blocking total: [n]
Feature-must total: [n] (compare to MoSCoW)
Backlog total: [n]
24.4.5 Daily standup format for alpha week
For alpha week, replace the standard async standup format with a short structured update from each team member:
ALPHA WEEK STANDUP — [day]
Each team member, 5 lines:
YESTERDAY
Onboardings completed: [n]
Tickets closed: [n]
Hours spent on alpha: [hours]
TODAY
Onboardings scheduled: [n]
Tickets in progress: [list]
Hours planned: [hours]
BLOCKERS
[...]
OBSERVATIONS
[the most important thing I noticed yesterday]
24.4.6 Mid-week check-in call script
(See §24.2.6 above; included here as a template artefact for the deliverable bundle.)
24.4.7 Alpha report template
ALPHA REPORT — [PROJECT]
Week 6, [start date] – [end date]
1. SUMMARY
Alpha cohort: [n] users
Cohort: [segment, role mix, region mix]
Total inferences: [n]
Total feedback items: [n]
Quality-bar status:
[metric 1]: [actual] vs target [target]
[metric 2]: ...
2. WHAT WORKED
2.1 [theme]
Verbatim quote: "[...]" — [user]
Diagnostic: [why this worked]
2.2 [theme]
...
2.3 [theme]
...
3. WHAT DID NOT WORK
3.1 [theme]
Verbatim quote: "[...]" — [user]
Diagnostic: [why this failed]
Action: [fix in alpha / defer to beta / accept]
3.2 [theme]
...
4. KEY LEARNINGS
4.1 [learning]
Evidence: [observations / data / interviews]
Implication for beta: [...]
4.2 [learning]
...
5. ALPHA-TO-BETA GATE EVALUATION
Quantitative criteria:
[each criterion: met / missed, with evidence]
Qualitative criteria:
[each criterion: met / missed, with evidence]
Recommendation: PROCEED / HOLD / PIVOT
6. BETA PLAN
Cohort: [target size and recruitment approach]
New tests: [what beta will measure that alpha did not]
Quality bar adjustments: [...]
Feature changes:
- Add: [...]
- Remove: [...]
- Modify: [...]
Risk: [top 3 risks for beta with mitigations]
24.5 Worked example — Team Aroma’s Week 6
Team Aroma starts the week with the Week-5 metrics dashboard showing all five quality bars met (BM clarity 81%, English clarity 83%, SPM-format 92%, numerical correctness 95%, hallucination 1.5%). The pre-alpha checklist passed Sunday evening with the FAIL items remediated.
Sunday evening: confirmation
Aliyah re-confirms 6 alpha users from the Week-2 corpus:
- T1 — Mr Lim, owner of Bright Star Tutorial Centre (PJ, ~80 students). One of the strongest early-adopter signals from Week 2.
- T2 — Ms Tan, owner of Excel Education Centre (Subang, ~120 students). Most willing to pilot at paid pricing.
- T3 — Mr Rajan, owner of Sri Murugan Tuition Centre (Klang, ~50 students). High engagement during Week-2 interview.
- Tch1 — Cikgu Aishah, head teacher at Bright Star (T1’s centre). Daily user of any tool the centre adopts.
- S1 — A Form-5 student at Excel (with parental consent). Daughter of Ms Tan, willing to test as a student-side perspective.
- S2 — A Form-5 student at Sri Murugan (with parental consent). Recommended by Mr Rajan as a strong, motivated student.
The mix of 3 centre owners + 1 teacher + 2 students covers all three actor types in the workflow. Onboarding sessions are booked: T1 + Tch1 together Monday morning; T2 Monday afternoon; T3 + S2 together Wednesday morning; S1 also Wednesday morning.
Monday morning: first onboarding (T1 + Tch1 together)
Aliyah and Sara run the joint onboarding for Mr Lim and Cikgu Aishah at Bright Star. Mr Lim is the buyer; Cikgu Aishah is the user. The 30-minute structure is tweaked: 15 minutes with both present; then Mr Lim steps out and Aliyah works with Cikgu Aishah for another 15 minutes on the teacher-side workflow.
Three observations from the joint section:
- Mr Lim immediately asks about pricing for centres of his size. Aliyah deflects to “we’ll figure that out together later” — appropriate at alpha stage.
- Cikgu Aishah is warm but reserved. Her body language during the demo suggests she is sceptical that AI will produce explanations as good as hers.
- The centre’s wifi is unstable; the demo loads slowly. The team adds “test on slow connections” to the action list.
The teacher-side onboarding produces a critical observation: Cikgu Aishah’s first instinct on the teacher-review interface is to read the AI’s explanation aloud as if reading a draft from a colleague. She catches a small phrasing issue (“rumus yang kita guna sini” — “the formula we use here”, slightly informal) and edits it inline. The edit takes 12 seconds. The team realises that the teacher-review workflow is closer to “editing a colleague’s draft” than they had modelled in Chapter 21; this reframes some of the UI design.
Monday afternoon: T2 onboarding
Wei Hao runs Ms Tan’s onboarding remotely (she is at her centre during a quiet afternoon). Ms Tan is the team’s strongest commercial signal — she explicitly offered RM 30/student/month during Week-2 interviews. The onboarding goes smoothly; she spends extra time exploring the analytics dashboard (which is rough) and asks several pricing-and-billing questions the team had not anticipated.
Two observations: (a) she wants visibility into per-student progress that the current dashboard does not provide cleanly; (b) she explicitly asks “can my teachers customise the AI’s tone for our centre’s brand?” — a question pointing at a feature category the team had not considered (centre-level customisation).
Monday evening: standup
The Day-1 standup runs 25 minutes. Patterns visible already:
- Both centre owners ask about pricing earlier than expected. The team should prepare a “pricing is being figured out; alpha is free” answer and a Week-8 pricing plan.
- Cikgu Aishah, the only teacher onboarded so far, treats the AI explanation as a draft rather than a final output. This is consistent with the Chapter-21 “Concierge + Wizard-of-Oz” archetype, but the UI is not yet designed for the editing workflow at speed.
- The slow-wifi observation suggests the team should add latency-budget testing.
The team commits to four fixes for Tuesday: (1) a “loading skeleton” UI that reduces perceived latency on slow connections; (2) inline-edit tools in the teacher-review interface (currently you can only edit by switching to a separate edit mode); (3) a “latest activity” panel on the centre-owner dashboard; (4) a one-paragraph “current pricing thinking” written for centre owners to read.
Tuesday: production observation, second-wave preparation
Tuesday morning Wei Hao watches the Day-1 alpha users’ independent use via PostHog session replay. Cikgu Aishah used the system at 6:45am (early start; the team had not anticipated this) for ~25 minutes, generating 8 explanations. She accepted 5 unchanged, edited 2, rejected 1. The rejection is the most-informative event: her free-text justification reads “explanation salah; integrasi by parts terbalik” (the explanation is wrong; integration by parts is reversed). The team checks the actual output: the model picked \(u\) and \(dv\) in the wrong order, leading to a more-complex integral than the original. Daniel investigates and discovers that the prompt’s guidance on integration-by-parts is ambiguous; he iterates the prompt and re-runs the eval pipeline; the BM clarity score holds at 81% and SPM-format rises to 93%.
Tuesday afternoon Sara works on the inline-edit and loading-skeleton fixes. By 6pm KL both are deployed. The team’s Day-2 ticket-close count is high.
Tuesday afternoon stumble: the student-engagement surprise
S1 (Ms Tan’s daughter) was given access on Monday evening and was expected to start exploring ahead of her Wednesday onboarding. By Tuesday afternoon she has not logged in. Aliyah sends a gentle nudge through Ms Tan; Ms Tan responds “she’s been busy with school work, I’ll prompt her.” S1 logs in Tuesday evening but spends only 12 minutes on the platform, completes 1 explanation, and does not return.
The pattern is concerning: students do not engage on their own initiative. This was actually anticipated in Week-2 customer discovery (the modal Form-5 student said “I don’t want another homework app”) but the team had hoped the SPM-aligned UX would overcome the resistance. It does not.
The Tuesday evening standup discusses the implication: the workflow needs to be teacher-initiated. A teacher assigns specific questions; the student gets a notification (perhaps via WhatsApp, perhaps via in-app); the student completes the assigned work. The current UX has students browsing the question library independently — which they will not do. Aliyah opens a Linear ticket for “teacher-assignment workflow” and tags it Feature-must.
Wednesday morning: second-wave onboarding (T3 + S2 together)
Daniel runs the joint onboarding for Mr Rajan and S2 at Sri Murugan via video call. Two observations: (a) Mr Rajan’s reaction to the loading-skeleton fix is positive (“oh, it’s much faster than I thought” — he had heard about the centre with bad wifi); (b) S2 is more engaged than S1 because Mr Rajan is sitting with him during the onboarding. The Day-2 student-engagement insight is reinforced: students engage when teachers prompt them.
S2’s behaviour during the second 15 minutes when Mr Rajan steps out: he completes the assigned exercise quickly, reads the explanation thoroughly (engaged, taking notes), and submits a follow-up question. The follow-up is a feature the team has not built yet (“if I don’t understand the explanation, can I ask for more help?”). S2 is the first user to surface the need for an interactive follow-up loop.
Wednesday morning: S1 onboarding (second wave)
Sara catches S1 with a personal video call after morning school. The call is more like a tutoring session than an onboarding. Sara watches S1 use the platform, asks gentle questions about what she finds confusing. Three observations: (a) S1 finds the BM-only mode disorienting because her Add Maths classes use mixed BM/English; (b) S1 wants to know what other students are getting right or wrong (social comparison); (c) S1 says “I would use this if my teacher gave me work on it” — confirming the teacher-assignment-workflow insight.
Wednesday evening: triage meeting
The triage meeting (90 minutes) reviews 47 feedback items collected Days 1–3 (slightly above target). Categorisation:
- Bug-blocking: 4 items. The Safari login bug (Wei Hao, fixed Tuesday); the SPM-rubric inconsistency for integration-by-parts (Daniel, fixed Tuesday); a date-display issue on the centre-owner dashboard (Sara, fixing); a session-timeout bug (Wei Hao, fixing).
- Bug-degrading: 11 items. Most are UI/UX polish (button labels, spacing, colours); 2 are minor copy issues. Distribution: 3 fixed in alpha, 8 deferred to beta with explicit user notification.
- Feature-must: 3 items. The teacher-assignment workflow (Aliyah, beta); centre-level customisation (Daniel, beta — Phase 2); the follow-up question loop (Wei Hao, beta).
- Feature-could: 29 items. Bulk operations, exports, advanced analytics, gamification, etc. All deferred to backlog.
The meeting also produces a decision: centre-level customisation, which did not appear in the Chapter-21 MoSCoW at all, is written in as a concrete Must for beta. Daniel writes a brief design spec for it. The MoSCoW is now updated for beta planning.
Thursday: mid-week check-in calls
Each team member runs one or two check-in calls (5 calls total — S1 and S2 are combined into one student-side call run by Sara). Aggregate findings:
- Mr Lim’s Net Promoter Score: 8/10. He would recommend it to other centre owners. Cikgu Aishah’s NPS: 7/10. She would use it but wants the editing workflow to be faster.
- Ms Tan’s NPS: 9/10. She is enthusiastic. She asks about beta and pricing; Aliyah commits to an answer by end of beta (Week 8).
- Mr Rajan’s NPS: 7/10. Positive but more reserved. He wants to see how it works over a full month.
- S1’s NPS: 5/10. Honest. She would use it if a teacher assigned work; she would not seek it out.
- S2’s NPS: 8/10. Strong. He asks about Mandarin support (the team’s deferred feature).
The team computes an aggregate satisfaction score, weighted toward centre owners and teachers (the buyers and primary users): the cohort sits around 7.5. Three of the six alpha users (Mr Lim, Ms Tan, and S2) gave scores of 8+, meeting the gate criterion.
Friday: final fixes and the alpha report
Friday morning: the remaining Bug-degrading items in alpha scope are deployed. The full evaluation pipeline (Chapter 23) runs against the alpha-traffic-derived inputs. Quality bars hold: BM clarity 82% (alpha distribution), English clarity 83%, SPM-format 93%, numerical 95%, hallucination 1.4%.
The team writes the alpha report. Key sections:
Section 4.1 (Key learning): Students do not engage with the platform on their own initiative; engagement is teacher-mediated. Evidence: S1 logged in only after Ms Tan prompted her, and engaged for 12 minutes before disengaging; S2 engaged actively when Mr Rajan was sitting with him; both students explicitly stated they would use it if a teacher assigned work. The implication for beta: the workflow architecture pivots from “student-initiated practice” to “teacher-assigned practice with student completion.”
Section 4.2 (Key learning): Teachers treat AI explanations as drafts to edit, not final outputs. Evidence: across 47 explanations generated during alpha, teachers accepted 31 unchanged, edited 14 inline, and rejected 2. The 14 edits averaged 23 seconds each. The implication: the UX must optimise for fast inline editing, not for accept/reject decisions.
Section 4.3 (Key learning): Centre-level customisation is more important than anticipated. Evidence: Ms Tan, Mr Lim, and (implicitly) Cikgu Aishah all surfaced versions of “can the AI’s tone match our centre’s style?” The implication for beta: centre-level customisation is promoted to Feature-must.
Section 5 (Alpha-to-beta gate):
- Quality bars met: PASS (5 of 5)
- No Bug-blocking issues remaining: PASS
- At least 4 alpha users used the product on 3+ distinct days: PASS (4 of 6 — barely)
- At least 3 alpha users gave scores of 8+: PASS (3 of 6)
- No security or data incidents: PASS
- Recommendation: PROCEED TO BETA with the architectural pivot to the teacher-assigned workflow.
The Friday submission goes in at 10pm KL: the alpha report (8 pages), the onboarding observation notes, the triage spreadsheet, the updated MoSCoW for beta, the metrics dashboard snapshot, the cost log (USD 47 for the alpha week — well under the USD 100 budget), and the beta plan one-pager.
What Team Aroma got right and what they almost got wrong
Three things they did well: (1) the joint-onboarding format for centre owner + teacher captured the workflow dynamic that single-onboardings would have missed; (2) the Tuesday-evening recognition of the student-engagement issue produced a Day-3 architectural pivot (teacher-assigned workflow) rather than a Week-8 retrospective recognition; (3) the triage discipline kept Friday’s deployment scope narrow (4 Bug-blocking + 3 in-alpha Bug-degrading), avoiding the “fix everything” Friday-night failure mode.
Three things they almost got wrong: the team almost shipped the bulk-question-generation feature (which is Could, not Must) because Mr Lim asked for it (the triage discipline saved them); they almost dismissed S1’s low NPS as “she’s just being a teenager” rather than treating it as the signal it was about teacher-mediated workflow; and they almost extrapolated from Ms Tan’s enthusiasm (NPS 9) without weighting that Ms Tan is the team’s strongest commercial signal but not necessarily the modal centre owner. The team’s discipline around stratified analysis kept these biases visible.
The pattern is general. Week 6 is high-leverage because real users in real conditions surface the workflow architecture issues that no amount of internal discussion would have produced. The architectural pivot from student-initiated to teacher-assigned workflow, identified Tuesday and committed Wednesday, is exactly the kind of learning the alpha exists to produce.
24.6 Course exercises and Week 6 deliverable
Submit the Week 6 deliverable bundle by Friday 23:59. Required artefacts:
24.6.1 Required artefacts
- Alpha report (the central artefact, ~6–10 pages following the §24.4.7 template).
- Onboarding observation notes (one per onboarding session, ~1 page each).
- Onboarding recordings (with user consent; uploaded to shared Drive).
- Triage spreadsheet showing all feedback items classified per the §24.1.6 categories.
- Updated MoSCoW for beta showing changes from the Chapter-21 baseline based on alpha learnings.
- Updated metrics dashboard snapshot showing alpha-traffic-derived measurements vs Chapter-21 quality bars.
- Cost log for the alpha week, with comparison to budget.
- Beta plan one-pager stating cohort, new tests, quality-bar adjustments, and risks.
- Mid-week check-in call notes (one per call).
24.6.2 Grading rubric (50 points)
| Component | Points | Distinction-level criteria |
|---|---|---|
| Alpha cohort quality | 5 | 5+ users from primary segment; covered all major user roles (economic buyer + primary user + end user); no cousin-syndrome recruitment |
| Onboarding discipline | 10 | Every onboarding via §24.1.3 ritual; recordings made (with consent); think-aloud observed; written observations captured |
| Production observation | 5 | Session replay reviewed; errors triaged within 24 hours; metrics monitored daily |
| Feedback depth | 5 | 30+ feedback items captured; categories used; user voice captured verbatim where possible |
| Triage rigour | 10 | Items classified into Bug-blocking / Bug-degrading / Feature-must / Feature-could; Wednesday triage held; explicit deferral rationale |
| Alpha report quality | 10 | All 6 sections of §24.4.7 populated; learnings tied to evidence; gate evaluation honest |
| Cost discipline | 5 | Spending tracked; budget alerts active; cost trajectory consistent with Chapter-21 unit economics |
Pass: 30. Credit: 36. Distinction: 42. High Distinction: 47.
Bonus: a team that made a substantive workflow change in-week based on alpha learnings (rather than carrying all changes to Week 7) earns +3 points. The point of the alpha is to learn and to act on the learning; teams that act earn credit.
The team-comprehension penalty applies; additionally, every team member must have run at least one onboarding and one mid-week check-in call.
24.6.3 Things to do before Monday of Week 7
By Sunday evening of Week 6, in addition to the deliverable submission:
- Recruit the beta cohort: target 15–25 users (3–5x the alpha size), with the alpha users carrying over and new users recruited from the broader Week-2 cohort plus warm referrals from alpha users.
- Schedule beta onboarding for Monday/Tuesday of Week 7. The onboarding ritual scales: 5–10 minutes per user, with self-serve onboarding paths for the simpler cases.
- Implement the alpha report’s Section 6 changes (architectural pivots, MoSCoW updates) over the weekend if material.
- Read Chapter 6 (Finance and banking) or another sector-relevant analytical chapter, and §25.1–§25.3 of Chapter 25 (Beta and the data flywheel) before Monday of Week 7. The data-flywheel reading is the conceptual frame for what beta scales beyond what alpha could.
References for this chapter
Lean and customer-development methodology
- Ries, E. (2011). The Lean Startup. Crown Business.
- Blank, S. and Dorf, B. (2012). The Startup Owner’s Manual. K&S Ranch.
Usability research and user observation
- Ericsson, K. A. and Simon, H. A. (1984). Protocol Analysis: Verbal Reports as Data. MIT Press.
- van Someren, M. W., Barnard, Y. F., and Sandberg, J. A. C. (1994). The Think Aloud Method: A Practical Guide to Modelling Cognitive Processes. Academic Press.
- Krug, S. (2014). Don’t Make Me Think, Revisited: A Common Sense Approach to Web Usability. (3rd ed.) New Riders.
- Portigal, S. (2013). Interviewing Users: How to Uncover Compelling Insights. Rosenfeld Media.
Alpha, beta, and progressive-disclosure release patterns
- Anthropic (2024). Claude release notes. docs.anthropic.com.
- OpenAI (2022). Introducing ChatGPT (research preview). 30 November 2022. openai.com/blog.
- Slack Technologies (Butterfield, S., interviews and Slack engineering blog 2014–2016).
- Stripe (Collison, P., interviews 2011–2015 covering manual onboarding pattern).
Cases referenced in §24.3
- Iansiti, M. and Lakhani, K. R. (2020). Competing in the Age of AI. Harvard Business Review Press.
- Klarna AB (2024, 2025). Press releases on AI customer service deployment and reversal.
- Anysphere (Cursor) public engineering and founder communications 2023–2026.
Net Promoter Score and customer satisfaction measurement
- Reichheld, F. F. (2003). The one number you need to grow. Harvard Business Review 81(12): 46–54.
- Kohavi, R., Tang, D., and Xu, Y. (2020). Trustworthy Online Controlled Experiments. Cambridge University Press. (For NPS validity discussion.)
Session replay and observability tooling
- PostHog (2024–2026). PostHog documentation.
- Sentry (2024–2026). Sentry documentation.
- LangSmith / Langfuse / Helicone (2024–2026). LLM-specific observability tooling documentation.
Further reading
For the foundational treatment of usability research, Krug’s Don’t Make Me Think is the most-readable reference; the Ericsson-Simon volume is the academic methodology. For the customer-discovery interview style adapted for alpha, Portigal’s Interviewing Users is the practitioner reference.
For alpha-launch literature specifically, the public-record case material is sparse — most of what exists is in founder interviews and engineering blog posts. The Stripe and Slack early-user patterns (referenced in §24.3) are documented through Patrick Collison’s and Stewart Butterfield’s interviews; the Notion early-user story is told in Ivan Zhao’s First Round Review profile (2018). For AI-specific alpha patterns, the Anthropic and OpenAI release-notes archives are the primary source.
For NPS validity and limitations, Kohavi-Tang-Xu Trustworthy Online Controlled Experiments covers the standard critiques. NPS is widely used and widely criticised; treat it as one signal among many, not as a primary metric. For B2B contexts, customer-success metrics (renewal rate, expansion rate, time-to-value) are usually more informative than NPS once you have enough customers to compute them.