Chapter 3 — The AI factory

Iansiti and Lakhani argue that the firm of the AI age is organised around an “AI factory” — a scalable decision engine that transforms data into predictions, pattern recognition, and process automation. This chapter develops the framework with the technical and economic depth that a graduate-level course requires. We treat each of the factory’s four components as engineering disciplines in their own right, formalise the virtuous cycle as a model of increasing returns, and connect the framework to the broader platform-economics and antitrust literature.

Chapter overview

This chapter is the conceptual core of Part I. Sections §3.4–§3.7 take up the four components of the AI factory in turn, with technical depth on each. §3.9 formalises the virtuous cycle. Sections §3.10–§3.12 develop three reference cases (Netflix, Ant Group, Amazon) at chapter-length depth. §3.15 connects the framework to the platform-economics literature. §3.16 develops the antitrust implications and the Khan (2017) critique. The chapter concludes with operating-model implications and a substantial exercise set.

The runtime metaphor revisited

AI is the runtime that is going to shape all of what we do… The most exciting thing is the bedrock of capabilities that have been built. So now you put what is essentially a virtual assistant on top of any application that has those capabilities, and you have a copilot.

— Satya Nadella, quoted in Iansiti and Lakhani (2020), Ch. 1

An AI factory is the operational pattern that lets a firm replace human-mediated decision processes with software-mediated ones at scale. It is the answer to the question Iansiti and Lakhani (2020) pose at the start of Competing in the Age of AI: how does Ant Group serve 700 million customers with 10,000 employees, while traditional banks of equivalent size employ 200,000?

The answer is not that Ant Group’s bankers are 20× more productive. It is that Ant Group’s operational critical path is run by software, with humans designing, supervising, and improving the system rather than executing inside it. Iansiti and Lakhani’s central observation, which deserves to be memorised at thesis-defence depth, is that this structural shift produces increasing returns to scale, scope, and learning, in contrast to the diminishing returns that characterise traditional firms.

The factory as critical-path inversion

A useful way to read the framework is as an inversion of where intelligence lives in the firm. In a traditional bank, the loan officer’s judgement is the critical path; supporting systems (credit reports, appraisal databases, regulatory checks) feed into the human decision. In Ant Group’s factory, the algorithm’s judgement is the critical path; supporting systems (human review queues, exception handling, regulatory escalation paths) feed into the algorithmic decision. The intelligence lives in the system, not in the operator.

This inversion is what makes the AI factory’s economics so different. A bank that grows from 1M to 10M customers must grow its loan-officer staff roughly proportionally, plus a coordination overhead. Ant Group’s factory faces almost no marginal cost as customer volume grows — the same algorithm processes 1M and 10M loan decisions, with the same compute infrastructure scaled horizontally.

The four components

Iansiti and Lakhani (2020) identify four components common to every operating AI factory. Each is necessary; collectively they are the prerequisite for digital scale, scope, and learning.

flowchart LR
    A["1. Data pipeline<br/>ingestion · cleaning · features"] --> B["2. Algorithm development<br/>training · validation · selection"]
    B --> C["3. Experimentation platform<br/>A/B tests · canary releases"]
    C --> D["4. Software infrastructure<br/>cloud · MLOps · monitoring"]
    D --> A

    style A fill:#e8f3fa,stroke:#006DAE,stroke-width:2px
    style B fill:#fdf3e7,stroke:#d97706,stroke-width:2px
    style C fill:#e9f5ec,stroke:#059669,stroke-width:2px
    style D fill:#f3eef7,stroke:#7c3aed,stroke-width:2px
Figure 3.1: The four components of the AI factory.

Data pipeline — the fuel

The data pipeline is where most AI projects succeed or fail. It is also where most of the cost lives.

The pipeline as managed flow

A data pipeline is more than data engineering. It is a managed flow with seven distinct stages, each requiring its own engineering investment:

  1. Ingestion: connecting to source systems (transactional databases, log streams, third-party APIs, IoT sensors). Modern stacks use change-data-capture (Debezium, Fivetran, Airbyte) for transactional sources and event-streaming infrastructure (Kafka, Pulsar, Kinesis) for log streams.
  2. Schema enforcement and validation: ensuring incoming data conforms to expected types, ranges, and relationships. Modern tools (Great Expectations, Soda, Pydantic) make schema validation declarative.
  3. Deduplication and integration: identifying records that refer to the same real-world entity across sources. The canonical methods are deterministic matching (where stable identifiers exist) and probabilistic matching (Fellegi-Sunter, modern ML-based entity resolution).
  4. Labelling: producing target values for supervised learning. The taxonomy is manual labelling (expensive, high-quality), weak supervision (Snorkel-style heuristic labelling functions), self-supervision (next-token prediction, masked modelling — the foundation-model paradigm), and active learning (iteratively asking the model what to label next).
  5. Feature engineering: transforming raw data into features. The feature-store pattern (Feast, Tecton, Databricks Feature Store) versions and serves features consistently across training and inference, eliminating the train-serve skew that has caused many ML deployments to fail in production.
  6. Privacy and access control: PII redaction, differential privacy budgets, role-based access controls, and audit logging are pipeline-layer concerns that downstream teams cannot fix.
  7. Provenance tracking: recording the lineage of every dataset back to its sources. Essential for debugging, regulatory response, and reproducibility.
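
Stage 2 can be made concrete in a few lines. The sketch below hand-rolls the declarative per-field checks that tools such as Great Expectations or Pydantic automate; the schema and field names are invented for illustration.

```python
from datetime import datetime

def _is_iso_datetime(v):
    """True if v parses as an ISO-8601 datetime string."""
    try:
        datetime.fromisoformat(v)
        return True
    except (TypeError, ValueError):
        return False

# Declarative per-field checks, mimicking what Great Expectations or
# Pydantic automate. The schema and field names are hypothetical.
SCHEMA = {
    "txn_id":      lambda v: isinstance(v, str) and v != "",
    "amount":      lambda v: isinstance(v, (int, float)) and v > 0,
    "currency":    lambda v: isinstance(v, str) and len(v) == 3 and v.isupper(),
    "occurred_at": _is_iso_datetime,
}

def validate_batch(rows):
    """Split a batch into valid records and rejects annotated with failing fields."""
    valid, rejects = [], []
    for row in rows:
        bad = [k for k, check in SCHEMA.items() if k not in row or not check(row[k])]
        (rejects if bad else valid).append((row, bad))
    return valid, rejects
```

The point of the pattern is that rejects carry machine-readable reasons, so data-quality failures surface at the pipeline layer rather than as silent model degradation downstream.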

In modern terminology, a well-engineered data pipeline culminates in data products — versioned, governed, owned datasets that downstream teams consume through stable interfaces. A data product has a name, a schema, an owner, an SLA, and consumer documentation, exactly like a software product.

The 70–80% effort fraction

The widely cited statistic that 70–80% of AI project effort goes into data work is not exaggerated. McKinsey’s 2023 work on AI deployment cost shows the same pattern: data wrangling and feature engineering dwarf model training in time, money, and complexity.

The implication is that the firm’s binding constraint is almost never modelling capability; it is data quality and data-pipeline maturity. Where firms underinvest in this layer, model performance is unstable and reproducibility is impossible.

The data-mesh / warehouse / lake debate

Four architectural patterns compete for the data-platform layer:

  • Data warehouse (Snowflake, BigQuery, Redshift): structured, schema-on-write, SQL-centric. Strengths: query performance, consistency, mature tooling. Weaknesses: rigid schema, expensive for unstructured data.
  • Data lake (S3 + open file formats like Parquet, Delta, Iceberg): cheap storage, schema-on-read, supports any data type. Weaknesses: query performance, governance, the “data swamp” failure mode.
  • Lakehouse (Databricks, Snowflake’s open-tables strategy, Microsoft Fabric): combines lake economics with warehouse query semantics. The modern default for new builds.
  • Data mesh (Dehghani’s architectural proposal): organisational rather than technological. Decentralises data ownership to domain teams who publish data products with consistent governance.

The McKinsey reading, as of 2025, is that the lakehouse has won the storage argument and the data mesh has won the organisational argument.

RAG infrastructure as a 2024+ data-pipeline concern

The 2024–2026 development is the integration of retrieval-augmented generation (RAG) infrastructure into the data layer. The data engineering required to make RAG work is often misunderstood as “just put it in a vector database”; in practice, it requires:

  • Chunking: splitting documents into retrievable units of the right granularity.
  • Embedding: producing vector representations (BGE, text-embedding-3, voyage-2).
  • Indexing: building the vector index (FAISS, Pinecone, pgvector, Weaviate, ChromaDB).
  • Permission preservation: ensuring retrieval respects per-user access controls.
  • Freshness management: re-indexing as documents change.
  • Quality monitoring: detecting when retrieval fails — the most common failure mode in production RAG.

A graduate student asked to specify a RAG system should specify all six layers, not just the vector-database choice.
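
The chunking layer, for instance, can start very simply. A minimal sketch, assuming fixed-size character windows with overlap; production systems usually chunk on sentence or heading boundaries instead:

```python
def chunk_text(text, chunk_size=400, overlap=50):
    """Split text into overlapping fixed-size character windows.

    Overlap reduces the chance that a relevant passage is split across
    a chunk boundary and therefore never retrieved whole.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

The chunk size and overlap are tuning parameters: too-small chunks lose context, too-large chunks dilute the embedding and waste the context window.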

Algorithm development — the machines

Algorithm development is the part of the factory most visible to outsiders. The Iansiti–Lakhani treatment emphasises — correctly, and increasingly so — that this is the part that has commoditised fastest in the post-2022 period.

The commoditisation argument

AutoML platforms (DataRobot, H2O.ai, Google AutoML), pre-trained foundation models (OpenAI, Anthropic, Google, DeepSeek, Mistral), and open-source modelling frameworks (PyTorch, JAX, TensorFlow, Hugging Face Transformers) have collapsed the distance between “state-of-the-art research” and “production model”. A team of two engineers using Hugging Face fine-tuning today can deploy what would have required ten ML PhDs five years ago.

The DeepSeek-R1 release in January 2025 (DeepSeek-AI, 2025) demonstrated this commoditisation at the frontier: open-weight reasoning capability matched closed-weight frontier models within months, at a fraction of the training cost. The implication is that algorithmic capability is rarely the moat. The moat is the surrounding factory.

The modelling lifecycle

The standard ML modelling lifecycle has six stages, each with associated tooling:

  1. Problem framing: turning a business question into an ML formulation. Is this a regression, classification, ranking, retrieval, generation, or RL problem? What is the loss function? What is the evaluation metric, and is it aligned with the business outcome?
  2. Data preparation: producing the training, validation, and test splits, with appropriate handling of leakage, distribution shift, and class imbalance.
  3. Model selection: choosing the model class (linear, tree-based, deep, foundation-model fine-tune, frozen-foundation-model with prompting). Modern best practice: try a strong baseline first; only invest in deep models when the baseline plateaus.
  4. Training and hyperparameter optimisation: typically Bayesian optimisation (Optuna, Vizier) over a defined search space.
  5. Evaluation: holdout test set, cross-validation, hold-out-by-time for time-series, hold-out-by-entity for non-iid data. Calibration is increasingly important — a well-calibrated model whose 80% confident predictions are right 80% of the time is more useful for downstream decisions than a higher-accuracy uncalibrated model.
  6. Validation against business outcome: closing the loop with the experimentation platform.
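
The calibration point in stage 5 is easy to check directly. A minimal sketch in pure Python, using synthetic predictions so the example is self-contained; a perfectly calibrated "model" is simulated by drawing each label with exactly its predicted probability:

```python
import random

def calibration_table(probs_and_labels, n_bins=5):
    """Bucket predictions by confidence and compare predicted vs observed rates.

    A well-calibrated model's ~0.8-confidence bucket should be right
    roughly 80% of the time.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in probs_and_labels:
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    table = []
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)   # average predicted probability
            obs = sum(y for _, y in b) / len(b)      # observed positive rate
            table.append((round(mean_p, 2), round(obs, 2), len(b)))
    return table

# Synthetic, perfectly calibrated predictions: label drawn with probability p.
random.seed(0)
data = [(p, int(random.random() < p)) for p in (random.random() for _ in range(20000))]
table = calibration_table(data)
```

In each bucket the predicted and observed rates should agree closely; a systematic gap is the miscalibration that makes downstream decision thresholds unreliable.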

Buy, build, or fine-tune

Modern teams face three architectural choices:

  • Buy/use: call a hosted foundation-model API (OpenAI, Anthropic, Google). Cheapest to start, expensive at scale, vendor-locked.
  • Self-host an open-weight model: deploy Llama, Qwen, DeepSeek, Mistral on owned infrastructure. Higher upfront cost, lower marginal cost at scale, sovereign.
  • Fine-tune: take a foundation model and adapt it on domain-specific data. The LoRA / QLoRA techniques have made fine-tuning radically more efficient.

The right choice depends on volume (high-volume use cases justify self-hosting), sensitivity (regulated data may not be allowed to leave a sovereign perimeter), and iteration speed (research-mode firms benefit from API access to the frontier).
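
The volume argument can be made concrete with back-of-envelope arithmetic. All prices below are placeholders, not quotes from any vendor:

```python
def monthly_api_cost(tokens, usd_per_million):
    """Pay-per-token API: cost scales linearly with volume."""
    return tokens / 1e6 * usd_per_million

def monthly_selfhost_cost(tokens, fixed_usd, marginal_usd_per_million):
    """Self-hosting: fixed GPU/ops cost plus a much smaller marginal cost."""
    return fixed_usd + tokens / 1e6 * marginal_usd_per_million

def crossover_tokens(api_usd_per_million, fixed_usd, marginal_usd_per_million):
    """Monthly token volume above which self-hosting becomes cheaper."""
    return fixed_usd / (api_usd_per_million - marginal_usd_per_million) * 1e6

# Placeholder numbers: $10/M tokens via API vs $20k/month fixed + $2/M self-hosted.
x = crossover_tokens(10.0, 20_000.0, 2.0)   # break-even tokens per month
```

With these illustrative numbers the break-even sits at 2.5 billion tokens per month; below it the API is cheaper, above it self-hosting wins, which is why the volume question comes first.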

Experimentation platform — the valves

The experimentation platform is where the AI factory differs most starkly from classical analytics. Amazon’s Weblab runs over 30,000 simultaneous experiments at any time. Netflix runs hundreds of concurrent A/B tests. Booking.com reportedly runs more than 1,000 simultaneous experiments. These are not occasional A/B tests; they are continuous discovery infrastructures.

A/B testing as the firm’s epistemic instrument

The modern A/B test is a randomised controlled trial. Users are assigned at random to a treatment or control arm; the treatment’s effect on a primary outcome metric is estimated; statistical inference is conducted under a pre-registered hypothesis. The primary technical machinery:

  • Random assignment: usually a hash of user-ID to a 0–99 bucket, with bucket ranges deterministically assigned to arms. Hash-based assignment ensures consistency across sessions.
  • Sample-ratio mismatch detection: a chi-squared test that the observed arm sizes match expected sizes, which catches assignment bugs.
  • Significance testing: typically a two-sided \(t\)-test or its non-parametric equivalent. With multiple metrics, a Bonferroni or Benjamini–Hochberg correction.
  • Sequential testing or fixed-horizon testing: sequential testing (mSPRT, Always-Valid Inference) lets analysts peek at results without inflating false-positive rates; fixed-horizon testing requires committing to a sample size in advance.
  • Variance reduction: CUPED (Controlled Pre-Experiment Data) and stratified estimation can reduce required sample sizes by 30–50% on metrics with strong pre-period predictors.
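
The first two items above can be sketched in a few lines. The experiment name and bucket split below are invented for illustration:

```python
import hashlib

def assign_bucket(user_id, experiment, n_buckets=100):
    """Deterministic hash-based assignment: the same user always lands in
    the same bucket for a given experiment, so arm membership is stable
    across sessions and devices."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % n_buckets

def srm_chi_squared(n_treatment, n_control, expected_ratio=0.5):
    """Sample-ratio-mismatch check: chi-squared statistic for observed vs
    expected arm sizes (1 degree of freedom; > 3.841 flags p < 0.05)."""
    total = n_treatment + n_control
    exp_t = total * expected_ratio
    exp_c = total - exp_t
    return (n_treatment - exp_t) ** 2 / exp_t + (n_control - exp_c) ** 2 / exp_c
```

With a 50/50 split, buckets 0–49 might map to treatment and 50–99 to control; an observed split of 10,300 vs 9,700 gives a chi-squared statistic of 18, well above the 3.841 threshold, and should block the experiment's readout until the assignment bug is found.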

Multi-armed bandits and continuous optimisation

For high-velocity decisions where the cost of exploration is bounded, multi-armed bandits replace discrete A/B tests. The Thompson sampling algorithm — sample from each arm’s posterior reward distribution and play the arm with the highest sampled value — converges to the optimal arm faster than \(\varepsilon\)-greedy or upper-confidence-bound (UCB) variants under most realistic conditions.

The trade-off: bandits optimise short-term outcomes at the expense of long-term inferential cleanliness. A firm running pure bandits cannot easily answer “what was the causal effect of this feature?” — only “which arm performed best?” Most mature firms use bandits for tactical optimisation and reserve A/B tests for strategic decisions where causal inference matters.
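
Thompson sampling itself is only a few lines for Bernoulli rewards. The conversion rates below are made up:

```python
import random

def thompson_bandit(true_rates, n_rounds=5000, seed=0):
    """Beta-Bernoulli Thompson sampling: keep a Beta(wins+1, losses+1)
    posterior per arm, sample once from each posterior, and play the arm
    with the highest sample. Exploration falls away automatically as the
    posteriors concentrate."""
    rng = random.Random(seed)
    k = len(true_rates)
    wins, losses, pulls = [0] * k, [0] * k, [0] * k
    for _ in range(n_rounds):
        samples = [rng.betavariate(wins[a] + 1, losses[a] + 1) for a in range(k)]
        arm = max(range(k), key=samples.__getitem__)
        pulls[arm] += 1
        if rng.random() < true_rates[arm]:   # simulate a Bernoulli reward
            wins[arm] += 1
        else:
            losses[arm] += 1
    return pulls

# Hypothetical conversion rates; the best arm (0.12) should dominate the pulls.
pulls = thompson_bandit([0.02, 0.05, 0.12])
```

Note what the output is: a pull count per arm, not a causal effect estimate, which is exactly the inferential trade-off described above.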

The interference problem

The standard A/B test framework assumes the stable unit treatment value assumption (SUTVA): the outcome for unit \(i\) depends only on \(i\)’s treatment, not on others’. This assumption fails in marketplaces (Airbnb, Uber, eBay, Amazon Marketplace) where treatments interact through shared inventory, prices, or attention.

Modern marketplace experimentation uses:

  • Switchback designs: treating an entire market for a time window, then control for the next window, alternating.
  • Cluster randomisation: randomising entire cities or markets rather than users.
  • Matched-pair designs: pairing similar units and randomising one to treatment, the other to control.

A graduate student designing experiments in a marketplace context must ask, before the experiment is fielded: what is the interference structure of this experiment, and how will I estimate the average treatment effect under that structure?
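
A switchback schedule generator is a useful starting point; the salt and window length below are arbitrary illustration values:

```python
import hashlib
from datetime import datetime, timedelta

def switchback_schedule(start, n_windows, window_hours=4, salt="exp-042"):
    """Assign whole time windows (for the whole market) to treatment or
    control, so units inside a window do not interfere across arms.

    Hashing the window index, rather than strictly alternating, avoids
    confounding the arms with time-of-day patterns while keeping the
    schedule deterministic and reproducible.
    """
    schedule = []
    for i in range(n_windows):
        window_start = start + timedelta(hours=i * window_hours)
        h = hashlib.sha256(f"{salt}:{i}".encode()).hexdigest()
        arm = "treatment" if int(h, 16) % 2 == 0 else "control"
        schedule.append((window_start, arm))
    return schedule

sched = switchback_schedule(datetime(2025, 1, 1), n_windows=12)
```

The analysis then compares outcomes across treated and control windows, typically with adjustments for serial correlation between adjacent windows.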

Why experimentation is the AI factory’s binding capability

Most firms can buy data infrastructure, models, and cloud compute. Experimentation discipline cannot be bought; it is built through sustained investment in tooling, statistical literacy, and decision-making culture. The discipline is what distinguishes a firm that can learn from data from a firm that can merely process data. Without rigorous experimentation, predictions become opinions, and the virtuous cycle in §3.9 cannot turn.

Software infrastructure — the pipes

The fourth component is the operational substrate. MLOps — the discipline of deploying, monitoring, and updating ML models in production — has matured rapidly since 2020.

The MLOps stack

A mature MLOps stack has eight layers:

  1. Experiment tracking (MLflow, Weights & Biases, Comet): logging hyperparameters, metrics, artefacts, and model versions for reproducibility.
  2. Model registry (MLflow, SageMaker Model Registry, Vertex AI Model Registry): versioned, stage-gated repository of trained models with provenance metadata.
  3. Feature store (Feast, Tecton, Databricks Feature Store): consistent feature serving across training and inference; resolves the train-serve skew problem.
  4. Orchestration (Airflow, Dagster, Prefect, Kubeflow Pipelines): scheduling and DAG-based orchestration of training and batch-inference jobs.
  5. Serving (KServe, NVIDIA Triton, BentoML, SageMaker Endpoints, Vertex AI Predictions): high-throughput, low-latency model inference.
  6. Monitoring (Arize, Fiddler, Evidently): drift detection, performance degradation, data quality issues, latency.
  7. CI/CD for ML: automated retraining, validation, and deployment pipelines triggered by data or code changes.
  8. Governance: model cards, data sheets, fairness audits, bias monitoring — the regulatory and ethical layer that has become operationally central since the EU AI Act took force in August 2024.
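
The model-registry layer (item 2) can be illustrated with a toy in-memory version of the pattern that MLflow and the cloud registries implement. The API below is invented for illustration, not any real registry's, and the artifact URIs and data versions are hypothetical:

```python
class ModelRegistry:
    """Toy stage-gated registry: versioned models with provenance,
    promoted through staging -> production -> archived."""

    STAGES = ("staging", "production", "archived")

    def __init__(self):
        self._models = {}   # name -> list of version records

    def register(self, name, artifact_uri, metrics, data_version):
        """Record a new model version, starting in 'staging'."""
        versions = self._models.setdefault(name, [])
        versions.append({
            "version": len(versions) + 1,
            "artifact_uri": artifact_uri,
            "metrics": metrics,
            "data_version": data_version,   # provenance back to the data pipeline
            "stage": "staging",
        })
        return versions[-1]["version"]

    def promote(self, name, version, stage):
        """Move a version to a new stage; only one production version at a time."""
        if stage not in self.STAGES:
            raise ValueError(f"unknown stage: {stage}")
        for v in self._models[name]:
            if stage == "production" and v["stage"] == "production":
                v["stage"] = "archived"     # demote the incumbent
        self._models[name][version - 1]["stage"] = stage

    def production_version(self, name):
        """Return the currently serving version record, if any."""
        return next((v for v in self._models[name]
                     if v["stage"] == "production"), None)
```

The design choice worth noticing is that promotion is an explicit, audited transition rather than a deployment side effect, which is what makes rollback and governance tractable.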

Cloud, multi-cloud, and sovereign

The pure-public-cloud thesis of the 2010s has softened. The 2024–2026 reality is hybrid:

  • Public-cloud-native for new builds and experimental workloads.
  • On-premise GPU clusters for high-throughput inference where hyperscaler GPU prices are uneconomic, and for training large foundation models.
  • Sovereign cloud (regional, regulated) for sensitive workloads that cannot leave a jurisdictional perimeter.

The DBS figure of 99% workload migration to cloud is a useful directional benchmark — most large firms today are at 30–60%. Firms in regulated jurisdictions (Singapore, EU, increasingly Malaysia and Indonesia) are converging on a multi-cloud + on-premise pattern rather than a single-cloud commitment.

Agent infrastructure

The 2024–2026 evolution adds agent infrastructure — orchestration layers, tool registries, and observability surfaces specifically for agentic systems. The architectural pattern adds three components on top of the classical four-component factory:

  • Agent orchestration layer: managing agent lifecycles, allocating tasks, mediating between agents, recovering from agent-level failure.
  • Tool registry and access governance: controlled exposure of corporate APIs, data sources, and actions to agents, with provenance and audit trails.
  • Observability and intervention: end-to-end tracing of agent behaviour with human-in-the-loop intervention surfaces.

We develop these in Chapter 13.

The factory’s outputs

The factory’s output is consistently three things: predictions (what will happen, what does this contain, what should we recommend), pattern recognition (anomalies, segments, novel events), and process automation (acting on those predictions without a human in the loop). These map onto Davenport and Ronanki’s (2018) three buckets, discussed in Chapter 17.

The virtuous cycle, formalised

Once an AI factory is operating, it produces a self-reinforcing loop that is the central source of competitive advantage in the age of AI.

flowchart LR
    U[Users interact] -->|generate| D[Data collected]
    D -->|train| F[AI factory]
    F -->|produce| P[Better predictions]
    P -->|improve| UX[Improved UX]
    UX -->|attract| MU[More users]
    MU -->|generate more| D
Figure 3.2: The AI factory’s virtuous cycle.

A simple model

Let \(u_t\) denote user volume in period \(t\), \(q_t\) denote service quality (a one-dimensional summary of recommendation accuracy, fraud-detection rate, etc.), and \(d_t\) denote cumulative data. Suppose:

\[ d_{t+1} = d_t + u_t, \qquad q_{t+1} = q(d_{t+1}), \qquad u_{t+1} = u(q_{t+1}, P_t), \]

where \(q(\cdot)\) is increasing in data and \(u(\cdot)\) is increasing in quality (and decreasing in price \(P_t\)). Under mild monotonicity assumptions, \(u_t\) is non-decreasing in \(t\) for any starting condition with positive \(u_0\), and grows whenever quality improvements exceed competitive responses. The system converges to a steady state only if \(q(\cdot)\) exhibits diminishing returns to data — empirically, the Kaplan et al. (2020) scaling laws (loss falling as a power law in data) imply that a bounded quality summary rises roughly logarithmically with data, so a steady state is reached only at very large data scale.
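
The dynamics are easy to simulate. The sketch below uses a logarithmic quality function; the constants \(\alpha\) and \(\beta\) are illustrative, not calibrated to any firm:

```python
import math

def simulate_cycle(u0=1000.0, periods=30, alpha=0.2, beta=800.0):
    """Iterate d_{t+1} = d_t + u_t, q_{t+1} = alpha*log(1 + d_{t+1}),
    u_{t+1} = beta*q_{t+1}, holding price fixed.

    The logarithmic quality function encodes diminishing returns to data:
    user growth is fast early and flattens as data accumulates.
    """
    d, u = 0.0, u0
    users = [u]
    for _ in range(periods):
        d += u                       # this period's users generate data
        q = alpha * math.log1p(d)    # quality improves with cumulative data
        u = beta * q                 # better quality attracts more users
        users.append(u)
    return users

users = simulate_cycle()
```

With these constants, user volume rises every period while the per-period increments eventually shrink, matching the claim that the cycle is self-reinforcing but approaches a steady state only slowly.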

Why this generates competitive advantage

The virtuous cycle is the formal mechanism behind Iansiti and Lakhani’s claim that AI factories produce increasing returns. Three properties of the cycle make it operationally durable:

  1. First-mover entrenchment. A firm that crosses the threshold first accumulates a data lead that competitors cannot easily close, because the cycle generates data faster than competitors can buy it.
  2. Cross-product spillovers. The same factory infrastructure that learns from product A can learn from product B, so each new product line generates positive externalities for existing ones.
  3. Endogenous quality. Quality improves automatically over time without proportional investment, because the cycle generates the data needed for improvement as a byproduct of operations.

Once a firm has crossed the threshold where this loop is self-reinforcing, it has constructed a barrier to entry that is fundamentally different from the barriers studied in the pre-digital strategy literature (Bain, 1956; Porter, 1985). It is closer to a data-and-learning moat: latecomers face a permanently improving competitor with no obvious way to catch up except by recombining digital capabilities from elsewhere.

Reference deployment 1 — Netflix

Netflix’s recommendation system is the longest-running commercial AI factory. Its current architecture combines collaborative filtering, content-based filtering, deep neural networks, and continuous A/B testing across virtually every UI element.

The recommendation system

Gomez-Uribe and Hunt (2015) document that approximately 80% of viewing originates from algorithmic recommendation rather than search. The architecture is layered:

  • Personalised video ranker orders the videos within each row of the user’s home page.
  • Top-N video ranker selects the rows shown to the user.
  • Continue watching identifies sessions to resume.
  • Trending now balances global trend signal with personal taste signal.
  • Video-video similarity powers the “more like this” surface.
  • Page generation assembles the rows for the user’s home page.

Each of these is a separate ML model. They share a feature store (viewing histories, session traces, dwell times, completion rates, device telemetry, content metadata) but are trained, deployed, and evaluated independently.

The experimentation platform

The factory’s experimentation platform is the most distinctive component. Netflix runs hundreds of simultaneous A/B tests at any time. The most-discussed example is artwork personalisation: Netflix personalises not only which titles to recommend but which artwork represents each title for each user. A user whose viewing history is romance-heavy might see a romantic-couple thumbnail for Good Will Hunting; a user whose history is comedy-heavy might see a Robin Williams comedy thumbnail for the same film.

The economic significance: each A/B test captures a small per-test improvement in completion rate, but the cumulative effect over thousands of tests per year produces durably superior engagement metrics. The platform is, in effect, the firm’s primary mechanism of innovation.

Content investment economics

Netflix’s recommendation system also shapes content investment. The company knows, with high confidence, what categories under-supply demand at the country-week level; it can commission content to fill specific gaps; and it can predict, before commissioning, the expected number of viewing hours for a candidate title. This data-driven content investment — combined with the recommendation system that delivers the content — is a closed-loop AI factory that has no precise analogue in traditional studios.

Why the factory matters more than any one model

The factory is more durable than any specific model. When Netflix moved from matrix-factorisation-based collaborative filtering to deep-learning-based personalisation in the late 2010s, the experimentation platform absorbed the change without disruption: the new model was tested against the old, found superior, and rolled out. The factory’s architecture made this transition routine. Firms that lack the factory have no equivalent transition mechanism — they must commit to a model architecture for years at a time and discover its limitations through customer attrition.

Reference deployment 2 — Ant Group

Ant Group’s 3-1-0 lending model — three minutes to apply, one second to approve, zero human intervention — exemplifies the factory pattern in financial services. The case is canonical because it illustrates the operational economics of a software-mediated runtime in a regulated industry.

The factory’s data pipeline

The factory ingests transaction data from Alipay’s payment graph, social and merchant data from Alibaba’s commerce platform, and behavioural signals from the app itself. Specific data sources:

  • Transaction history: every Alipay transaction, with merchant, amount, location, time. The graph is dense — a typical urban Chinese consumer has thousands of Alipay transactions per year.
  • Commerce behaviour: Tmall and Taobao purchase history, returns, ratings, vendor relationships.
  • Social and trust signals: connections in the Alipay social graph, payments to friends and family, group purchases.
  • Device and location telemetry: device fingerprint, geolocation, network type, login patterns.
  • Self-reported KYC data: identity verification, employment, salary range.

The integrated graph is the input to Ant’s credit-scoring algorithm, which produces a continuous score (Sesame Credit / Zhima Credit) for ~700M users.

The algorithms

Ant’s risk algorithms span:

  • Application scoring: probability of default given application data, before approval.
  • Behavioural scoring: ongoing risk assessment as account behaviour evolves.
  • Fraud detection: graph-neural-network models on the payment graph, looking for ring fraud, account takeover, and identity fraud.
  • Collection optimisation: predicting which delinquent accounts will self-cure and which require active collection.

Each model is trained on labelled outcomes (default, fraud, recovery) generated by the factory’s prior decisions, creating a feedback loop where each year’s data is partly the product of last year’s decisions.

The experimentation platform

Ant’s experimentation platform tests pricing, marketing messages, product features, and risk-tier thresholds simultaneously. The scale is comparable to Amazon’s Weblab — thousands of concurrent experiments at any time.

The headcount differential, revisited

Iansiti and Lakhani’s (2020) headline statistic — 10,000 employees serving 700M users — is a property of the factory, not of the bankers. ICBC, the world’s largest traditional bank by assets, employs roughly 425,000 people to serve a comparable customer base. The 40× employee differential is the AI factory’s signature.

The composition of Ant Group’s 10,000 employees is itself instructive. Approximately:

Function                                        Headcount estimate
Engineering and data science                    ~5,000
Risk, compliance, and legal                     ~1,500
Product and design                              ~800
Operations (human review, exception handling)   ~1,500
General and administrative                      ~1,200

Compare ICBC’s composition, which is dominated by branch staff (tellers, branch managers, regional managers) rather than engineers. The two firms have similar customer counts but fundamentally different organisational anatomies.

The regulatory failure mode

Ant Group also illustrates the regulatory failure mode of frictionless impact (Iansiti–Lakhani Rule 4 in Chapter 5). The Chinese regulator’s 2020 intervention — first the suspended IPO in November 2020, then forced restructuring in 2021–2023 — was the single largest regulatory action against a digital firm in modern financial history. The proximate causes were Ant’s micro-lending business growing faster than the regulator could supervise it, and Jack Ma’s October 2020 Bund Summit speech criticising regulators publicly. The structural cause was the factory’s frictionless impact: when an algorithmic credit decision can extend credit to 100 million new borrowers within months, regulators face an information and capacity asymmetry they cannot easily close.

The lesson is sobering: a firm that has built an AI factory has built operational power that may not be politically sustainable. We develop this point in Chapter 5, §5.5 (Iansiti–Lakhani Rule 5 on concentration and inequality).

Reference deployment 3 — Amazon

Amazon’s deployment is broader but architecturally similar. Iansiti and Lakhani open Competing in the Age of AI with Amazon’s digital operating model.

The four-component decomposition

  • Data pipeline: browsing histories, purchase behaviour, return patterns, search queries, ad impressions, third-party seller metadata, AWS billing telemetry, Prime Video viewing logs.
  • Algorithms: collaborative filtering, content-based ranking, dynamic pricing, demand forecasting, supply-chain optimisation, ad bidding, fraud detection, robotic-warehouse path planning.
  • Experimentation platform: Weblab — over 30,000 simultaneous experiments at any given time.
  • Infrastructure: AWS, which is itself sold to other firms as their AI factory infrastructure.

Cross-product spillovers

The Iansiti–Lakhani argument is that Amazon’s expansion across categories — books → general merchandise → groceries → cloud → advertising → media → pharmacy → healthcare — is enabled precisely because the AI factory transfers across categories. The capabilities are horizontal: the same pricing, demand forecasting, and recommendation infrastructure that runs Amazon Retail also runs AWS, Prime Video, and Alexa.

This is a particularly clean illustration of Arthur’s (1989) increasing returns to scope. Each new product line benefits from the existing factory, and contributes data and use cases that improve the factory for all other product lines. The marginal economics of expansion are dominated by the spillovers, not by the standalone economics of each new line.

The advertising business

Amazon’s advertising business is an under-discussed but particularly profitable expression of the factory. By 2025, Amazon’s ad revenue exceeded $50B annually, with operating margins higher than Google’s or Meta’s because the ad targeting uses purchase-intent signals rather than inferred interest. The same factory that runs product recommendation runs ad ranking; the marginal cost of adding ads to the existing recommendation surfaces is tiny.

The factory at small scale: the LISH counterexample

One implication of Iansiti and Lakhani’s (2020) argument is that any firm that builds an AI factory can capture digital scale, even without internet-era origins. The Laboratory for Innovation Science at Harvard (LISH) documented dozens of mid-market firms that built useful AI factories at low cost — typically combining cloud-hosted MLOps, open-source modelling tooling, and small in-house data teams. The lesson is that the architectural pattern matters more than the scale of the inputs; a 200-person logistics firm can and should run the same four-component factory that Ant Group does.

The 2024–2026 evolution makes this even more accessible. Hosted foundation models (Anthropic, OpenAI, Google), serverless inference (AWS Bedrock, Azure AI), and open-source modelling stacks (Hugging Face, LangChain, LlamaIndex) mean the factory’s algorithm and infrastructure components can be assembled in weeks rather than years. The bottleneck is now the data pipeline, the experimentation discipline, and the organisational complement.

A practical implication for SME advisers: the cost-effective starting point is not a custom foundation model, nor a hand-built MLOps stack, but a SaaS-mediated factory (data pipeline tools like Fivetran/Airbyte; modelling via foundation-model APIs; experimentation via vendor platforms like Statsig or LaunchDarkly) with in-house investment focused on the data products and the experimentation discipline.

Why the factory is the moat, not the model

A crucial implication of the framework — and one borne out by the 2024–2026 commoditisation evidence — is that the AI factory’s competitive advantage is not the algorithm itself.

Layer                        Commoditising?   Source of advantage?
Foundation model             Yes              No
Cloud compute                Yes              No
MLOps tools                  Yes              No
Data pipeline                Partly           Partly — proprietary feedback loops
Experimentation discipline   No               Yes
Workflow integration         No               Yes
Organisational complement    No               Yes

This insight reframes the post-DeepSeek strategic landscape. The right reading of the DeepSeek-R1 release of January 2025 is not that Nvidia or OpenAI are doomed; it is that the locus of advantage moves up the stack from algorithms to operational architecture — exactly where Iansiti and Lakhani placed it five years earlier.

Connection to platform economics

The AI factory framework intersects with the platform-economics literature in important ways. Three connections deserve attention.

Two-sided markets

Rochet and Tirole (2003, 2006) formalise platforms with two distinct user groups — buyers and sellers, riders and drivers, content creators and consumers — where the value of joining the platform on one side depends on participation on the other. The classic two-sided pricing problem is how to set \(p_b\) (price to buyers) and \(p_s\) (price to sellers) when the buyers’ utility depends on the number of sellers, and vice versa. The solution often involves subsidising one side (sometimes pricing it below zero) to seed the other.

Most modern AI factories run on top of two-sided platforms: Amazon Marketplace, Netflix’s content-distribution platform, Ant Group’s payment platform. The AI factory is the intelligence layer on the platform — it makes the attention-allocation, pricing, and access decisions across both sides simultaneously. Where Rochet and Tirole (2003) model the two-sided pricing problem statically, the AI factory makes it dynamic, learning, and personalised.
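A toy numeric version of the static problem conveys the flavour (a sketch only; the linear demand system and all parameter values are hypothetical, chosen so that one side is worth subsidising):

```python
import itertools

def participation(p_b, p_s, a_b=2.0, a_s=10.0, g_b=0.1, g_s=0.9, iters=200):
    """Fixed point of the cross-side participation equations
    n_b = a_b - p_b + g_b * n_s  and  n_s = a_s - p_s + g_s * n_b,
    clamped at zero (nobody joins at a hopeless price)."""
    n_b = n_s = 0.0
    for _ in range(iters):  # contraction (g_b * g_s < 1), so this converges
        n_b = max(0.0, a_b - p_b + g_b * n_s)
        n_s = max(0.0, a_s - p_s + g_s * n_b)
    return n_b, n_s

def profit(p_b, p_s):
    n_b, n_s = participation(p_b, p_s)
    return p_b * n_b + p_s * n_s

# Grid search over both prices, allowing a negative price (a subsidy).
grid = [x / 2 for x in range(-10, 21)]  # -5.0 .. 10.0 in steps of 0.5
best = max(itertools.product(grid, grid), key=lambda ps: profit(*ps))
print(f"optimal (p_b, p_s) = {best}, profit = {profit(*best):.2f}")
```

With these parameters the search lands on \(p_b = -2.0\), \(p_s = 7.0\): buyers are paid to participate, because sellers value their presence (\(g_s = 0.9\)) far more than buyers value sellers (\(g_b = 0.1\)). That is the subsidy logic in miniature.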

Strategies for two-sided markets

Eisenmann, Parker, and Van Alstyne (2006) identify four common strategies for two-sided platforms:

  1. Subsidising one side: the canonical approach for cold-starting (e.g., Uber’s early subsidies to drivers).
  2. Envelopment: extending an established platform into adjacent two-sided markets (e.g., Microsoft bundling Windows Media Player into Windows, or Amazon entering advertising via its retail traffic).
  3. Differential pricing: charging different prices on each side based on platform-elasticity differences.
  4. Multihoming control: making it easier or harder for users to use multiple platforms simultaneously (Apple’s tight integration; Visa/Mastercard’s open compatibility).

AI factories interact with each of these strategies. They make subsidies more efficient (target only the high-LTV new users), envelopment more credible (the factory’s capabilities transfer to the new domain), and differential pricing more surgical.
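The first of these claims, that the factory makes subsidies more efficient, can be illustrated with a small simulation (all numbers are hypothetical: a cohort of new users with noisily predicted lifetime values, a fixed subsidy budget, and an assumed retention lift that saturates as the per-user subsidy grows):

```python
import random

random.seed(0)

# Hypothetical cohort of 10,000 new users: true lifetime value (LTV) drawn
# from a heavy-tailed distribution, observed only through a noisy model.
users = [{"ltv": random.lognormvariate(3, 1)} for _ in range(10_000)]
for u in users:
    u["predicted_ltv"] = u["ltv"] * random.uniform(0.7, 1.3)  # imperfect prediction

BUDGET = 50_000  # total subsidy spend, split equally within the chosen cohort

def net_value(cohort, per_user_subsidy):
    # Assumed retention lift: grows with the subsidy, saturating at 0.5.
    lift = min(0.5, per_user_subsidy / 100)
    return lift * sum(u["ltv"] for u in cohort) - per_user_subsidy * len(cohort)

# Blanket subsidy: the budget spread thinly over everyone.
blanket = net_value(users, BUDGET / len(users))

# Targeted subsidy: the same budget concentrated on the predicted top decile.
top_decile = sorted(users, key=lambda u: u["predicted_ltv"], reverse=True)[:1_000]
targeted = net_value(top_decile, BUDGET / len(top_decile))

print(f"blanket net value:  {blanket:,.0f}")
print(f"targeted net value: {targeted:,.0f}")
```

Spread thinly, the subsidy is too small to move anyone and the spend is lost; concentrated on the predicted-high-LTV decile, the same budget turns positive. The factory's contribution is the `predicted_ltv` column — without a prediction model there is nothing to target on.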

Network effects and data network effects

A standard network effect obtains when each user’s value rises with the number of other users. A data network effect obtains when each user’s experience improves as more data accumulates, separately from the number of users. Most AI factories generate data network effects: Netflix’s recommendations are better with more viewing data, regardless of how many users contributed it; Ant Group’s risk models are better with more loan-outcome data.

The data network effect is, in practice, often stronger than the conventional network effect, because data accumulates with usage even when user numbers are stable. A firm with stable user numbers but heavy per-user usage can outpace a firm with growing user numbers but light per-user usage.
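This claim is easy to simulate. A sketch, borrowing the \(q(d) = \log(1+d)\) quality curve from the exercise set (the growth rates and per-user event counts are hypothetical):

```python
import math

def quality_path(initial_users, user_growth, events_per_user, periods=36):
    """Model quality q(d) = log(1 + d) as cumulative data d accrues each month."""
    users, data, path = float(initial_users), 0.0, []
    for _ in range(periods):
        data += users * events_per_user  # data accrues with usage, not headcount
        path.append(math.log1p(data))
        users *= 1 + user_growth
    return path

# Firm A: stable user base, heavy per-user usage.
q_a = quality_path(initial_users=1_000_000, user_growth=0.00, events_per_user=50)
# Firm B: user base growing 5% a month, but light per-user usage.
q_b = quality_path(initial_users=1_000_000, user_growth=0.05, events_per_user=5)

crossover = next((t for t, (a, b) in enumerate(zip(q_a, q_b)) if b > a), None)
print(f"quality after 36 months: A = {q_a[-1]:.2f}, B = {q_b[-1]:.2f}")
print(f"month B overtakes A: {crossover}")
```

Over the three-year horizon the stable-but-heavy-usage firm stays ahead throughout; the growing-but-light-usage firm would need years of further compounding before its cumulative data caught up.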

Antitrust implications

The AI factory framework has uncomfortable antitrust implications that the original Iansiti–Lakhani treatment understates.

Khan’s Amazon’s Antitrust Paradox

Khan (2017) argues that the consumer-welfare standard of contemporary antitrust law — which evaluates firm conduct primarily by whether it raises consumer prices — is structurally unable to address platform-monopoly concerns. Platforms like Amazon can systematically lower prices to consumers (the welfare-positive surface) while simultaneously extracting rents from suppliers, third-party sellers, and labour (the welfare-negative depths).

The AI factory framework, read with Khan (2017) in mind, is partly a description of how this dual extraction is mechanically possible: the factory’s data collection extends across both the consumer and supplier sides of the platform, and the factory’s optimisation simultaneously prices to maximise the joint surplus the platform extracts from both sides. Conventional antitrust analysis, focused on consumer prices alone, sees only half the picture.

Wu’s curse of bigness

Wu (2018) argues that platform concentration has reached levels at which the political economy of bigness — not just narrow consumer-welfare effects — should reactivate the structural-antitrust tradition of Brandeis. The AI factory’s increasing returns make this more pressing: the concentration is endogenous to the technology, not a contingent result of market conduct.

The graduate-level implication is that we should read Iansiti and Lakhani not as celebrating the AI factory but as describing a structural condition that public policy will eventually have to address. The five-rule framework in Chapter 5 (and especially Rule 5 on concentration and inequality) acknowledges this; the regulatory response we develop in Chapter 14 is, in part, the public-policy reaction to the factory’s economics.

Operating model implications

The later chapters of Iansiti and Lakhani (2020) draw out the operating-model implications of the factory. Three are worth memorising:

  1. The firm shrinks in headcount per unit of output but grows in capability per employee. Ant Group’s 10,000 staff are a different mix from ICBC’s 425,000: more engineers, more data scientists, fewer transaction-processing clerks.
  2. The boundary of the firm moves. Activities previously performed inside the firm (kept as competitive advantage) are now exposed via APIs to ecosystem partners, and activities previously contracted out are pulled inside (the “30–70 in-source shift” covered in Chapter 4).
  3. The CEO’s role changes. In an AI-native firm, the CEO’s central job is the design and tuning of the factory itself — the architecture, the experimentation discipline, the talent mix — rather than the management of human-mediated operations.

The risks the framework does not foreground

Iansiti and Lakhani are clear-eyed about the risks, but the framework can read as triumphalist. Four counterpoints are worth holding alongside it:

  • Regulatory and political risk. Ant Group’s restructuring is the canonical reminder. Frictionless impact at scale invites political reaction.
  • Concentration and inequality. The framework predicts concentration — and Iansiti and Lakhani themselves acknowledge this in their Rule 5 (Chapter 5). Whether concentration is socially optimal is a question the framework does not answer.
  • Quality and reliability ceilings. Klarna’s reversal (Chapter 8) shows that operational scale alone does not guarantee quality outcomes at the customer interface.
  • Resilience and systemic risk. When operational critical paths run on software, software failures become operational catastrophes. The 2024 CrowdStrike outage, which took down 8.5 million Windows machines globally, is the canonical illustration; losses to Fortune 500 firms alone were estimated at $5.4B.

Exercises 3.1

  1. Decomposing a factory. Pick a firm in your country and decompose its operations into the four AI factory components. (a) Score each component on a 0–5 scale of maturity. (b) Identify the binding constraint. (c) Estimate the cost and timeline to bring the binding constraint to a 4 or 5.

  2. The virtuous-cycle model. Using the simple model in §3.9, (a) derive the steady-state user volume \(u^*\) assuming \(q(d) = \log(1+d)\), \(u(q, P) = \alpha q - \beta P\), and constant price \(\bar{P}\). (b) Compute the comparative statics with respect to \(\bar{P}\). (c) Discuss what the steady-state implies about the durability of competitive advantage.

  3. The data-network effect. Identify (a) one industry where the data-network effect is structurally weak (i.e., extra data adds little value) and (b) one where it is structurally strong. What technical and organisational features distinguish them?

  4. RAG specification. Specify a RAG system for a regional bank’s internal-knowledge use case (employees retrieving policy and product information). Address all six layers identified in §3.4: chunking, embedding, indexing, permission preservation, freshness, quality monitoring.

  5. Experimentation maturity audit. Construct a 10-question audit that distinguishes a firm with mature experimentation discipline from one without. Test your audit against three firms: a hyperscaler (Amazon, Google, Meta), a retail bank in your country, and a regional retailer.

  6. Interference design. Design an A/B test for a feature change in a two-sided marketplace where treatment effects might leak across users. (a) Identify the interference structure. (b) Choose an appropriate experimental design. (c) Specify the estimator and its statistical properties.

  7. The Ant Group restructuring. Read the regulatory documents around Ant Group’s 2020–2023 restructuring. (a) What specific aspects of Ant’s factory triggered regulatory concern? (b) Could the factory have been redesigned to address the concerns without the structural restructuring? (c) Generalise to what regulators in other jurisdictions should watch for.

  8. The Netflix moat. Gomez-Uribe and Hunt (2015) report that 80% of Netflix viewing originates from algorithmic recommendation. This figure is now over a decade old. (a) Estimate the current figure (2026) and identify how you would measure it. (b) What does the change in the figure (if any) tell you about the durability of the recommendation moat?

  9. The platform economics connection. Apply Rochet and Tirole (2003)’s two-sided pricing framework to a platform you know well (e.g., Grab, Gojek, Carousell, Shopee, Uber). (a) Identify the two sides. (b) Identify the cross-side network effects. (c) Speculate on optimal pricing given the elasticity differences.

  10. The Khan critique. Khan (2017) argues that consumer-welfare-standard antitrust cannot address platform-monopoly concerns. (a) Defend Khan’s argument using the AI factory framework. (b) Identify the strongest counter-argument. (c) What would a “platform-aware” antitrust standard look like operationally?

  11. The LISH counterexample. Construct an end-to-end AI factory specification for a 200-person SME in your country (manufacturing, logistics, retail — your choice). Specify the full SaaS stack, in-house headcount, year-1 data products, year-1 experimentation programme, and total budget.

  12. The fifth component. The classical factory has four components. The 2024–2026 agentic era arguably requires a fifth (orchestration / observability / tool registry). Construct the case for treating these as a fifth distinct component versus an extension of the existing four.

Further reading

Iansiti and Lakhani (2020) is the indispensable text; read Chapters 1–5 carefully. For the platform-economics literature, Rochet and Tirole (2003) and Rochet and Tirole (2006) are foundational; Eisenmann, Parker, and Van Alstyne (2006) is the readable HBR application. For increasing returns, Arthur (1989) is essential. For the antitrust critique, Khan (2017) and Wu (2018) are the most-cited contemporary works. For technical depth on MLOps, Designing Machine Learning Systems by Chip Huyen (O’Reilly, 2022) is the standard reference; for experimentation, Kohavi, Tang, and Xu’s Trustworthy Online Controlled Experiments (Cambridge, 2020) is the practitioner bible. For Netflix specifically, the Gomez-Uribe and Hunt (2015) ACM TMIS paper is the public-record reference; the company’s Tech Blog has been an indispensable source on the recommendation system’s evolution.

References for this chapter

  • Khan, L. M. (2017). Amazon’s antitrust paradox. Yale Law Journal 126(3): 710–805.
  • Iansiti, M. and Lakhani, K. R. (2020). Competing in the Age of AI: Strategy and Leadership When Algorithms and Networks Run the World. Harvard Business Review Press.
  • DeepSeek-AI (2025). DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv:2501.12948.
  • Davenport, T. H. and Ronanki, R. (2018). Artificial intelligence for the real world. Harvard Business Review 96(1): 108–116.
  • Kaplan, J., McCandlish, S., Henighan, T., et al. (2020). Scaling laws for neural language models. arXiv:2001.08361.
  • Bain, J. S. (1956). Barriers to New Competition. Harvard University Press.
  • Porter, M. E. (1985). Competitive Advantage: Creating and Sustaining Superior Performance. Free Press.
  • Gomez-Uribe, C. A. and Hunt, N. (2015). The Netflix recommender system: Algorithms, business value, and innovation. ACM Transactions on Management Information Systems 6(4): 1–19.
  • Arthur, W. B. (1989). Competing technologies, increasing returns, and lock-in by historical events. Economic Journal 99(394): 116–131.
  • Rochet, J.-C. and Tirole, J. (2003). Platform competition in two-sided markets. Journal of the European Economic Association 1(4): 990–1029.
  • Rochet, J.-C. and Tirole, J. (2006). Two-sided markets: A progress report. RAND Journal of Economics 37(3): 645–667.
  • Eisenmann, T., Parker, G., and Van Alstyne, M. W. (2006). Strategies for two-sided markets. Harvard Business Review 84(10): 92–101.
  • Wu, T. (2018). The Curse of Bigness: Antitrust in the New Gilded Age. Columbia Global Reports.