PAPER 24 · Large-scale training

PaLM: Scaling Language Modeling with Pathways

Chowdhery et al. 2022 Paper

A masterclass in large-scale language-model training across thousands of accelerators.

Core concept

PaLM is a case study in what it takes to train a 540-billion-parameter dense language model across 6,144 TPU v4 chips.

Why it mattered

It shows that building a frontier model is as much a systems operation as an algorithmic one.

Visual shortcut · Frontier training machine
[Figure: several TPU pods feeding a single giant dense model; frontier scale as an infrastructure story.]

PaLM is a reminder that frontier models are infrastructure projects as much as model definitions.

How it works
Assemble massive data and compute.
Distribute training across accelerators.
Keep the run stable.
Evaluate broad emergent capabilities.
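
The "distribute training" step can be illustrated with a toy data-parallel update. This is a minimal sketch, not PaLM's actual setup: the 1-D linear model, the four workers, and the function names `shard_gradient` and `data_parallel_step` are all assumptions made for illustration. It shows only the core idea that each worker computes a gradient on its own data shard and the results are averaged before a shared weight update.

```python
from typing import List, Tuple

Example = Tuple[float, float]  # (x, y) pair for a toy 1-D linear model y ≈ w * x


def shard_gradient(w: float, shard: List[Example]) -> float:
    """Mean gradient of 0.5 * (w*x - y)^2 over one worker's shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)


def data_parallel_step(w: float, shards: List[List[Example]], lr: float) -> float:
    """One synchronous step: local gradients, then a mean across workers."""
    # In a real system each gradient is computed on a separate accelerator.
    local_grads = [shard_gradient(w, s) for s in shards]
    # Stand-in for an all-reduce (mean) over the workers.
    global_grad = sum(local_grads) / len(local_grads)
    return w - lr * global_grad


# Hypothetical data: the true weight is 3.0.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
# Four hypothetical workers, each with an equal-sized shard.
shards = [data[i::4] for i in range(4)]

w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards, lr=0.01)

print(round(w, 6))
```

Because the shards are equal-sized, averaging the per-shard gradients gives the same result as one full-batch gradient, which is why the workers can stay in lockstep on a single set of weights.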

The quick digest

PaLM scales a dense decoder-only Transformer to 540 billion parameters, trained with Google’s Pathways system across TPU v4 pods. The paper reports broad gains across language, reasoning, code, and multilingual tasks, but the deeper story is the training operation behind it.
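
To get a feel for the scale, here is a rough back-of-the-envelope estimate of the training compute. It assumes the common rule of thumb of about 6 FLOPs per parameter per token for dense Transformers, which is an approximation and not the paper's own accounting. The parameter and token counts (540 billion and 780 billion) are the figures reported for the largest PaLM model; the function name `approx_training_flops` is hypothetical.

```python
def approx_training_flops(n_params: float, n_tokens: float) -> float:
    """Rule-of-thumb estimate: ~6 FLOPs per parameter per training token.

    This ignores attention-specific terms and any recomputation, so treat
    the result as an order-of-magnitude figure only.
    """
    return 6.0 * n_params * n_tokens


N_PARAMS = 540e9   # parameters in the largest PaLM model
N_TOKENS = 780e9   # training tokens reported for that model

flops = approx_training_flops(N_PARAMS, N_TOKENS)
print(f"{flops:.3e} FLOPs")
```

Under these assumptions the estimate lands on the order of 10^24 FLOPs, which helps explain why the run had to be spread across thousands of chips.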

At this scale, the model is no longer just a neural net. It is data pipelines, distributed systems, accelerator scheduling, stability engineering, checkpointing, monitoring, and evaluation. The architecture matters, but execution discipline matters just as much.
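
Checkpointing and stability engineering can be made concrete with a small simulation. The paper describes handling loss spikes by restarting from a checkpoint roughly 100 steps before the spike and skipping a few hundred data batches. The sketch below imitates that pattern under heavy simplifications: there are no real weights, the "checkpoint" is just a step number and a data cursor, and the spike is tied to one batch index purely to keep the example deterministic. The names `train_with_rollback` and `toy_loss`, and all thresholds, are hypothetical.

```python
from typing import Callable, List, Tuple


def train_with_rollback(
    loss_for_batch: Callable[[int], float],
    num_steps: int,
    checkpoint_every: int = 100,   # hypothetical checkpoint interval
    spike_factor: float = 2.0,     # hypothetical spike threshold
    skip_batches: int = 200,       # batches to skip after a rollback
    window: int = 20,              # steps used for the running baseline
) -> Tuple[List[float], List[int]]:
    """Toy loop: checkpoint periodically, roll back and skip data on a spike."""
    history: List[float] = []      # accepted per-step losses
    rollbacks: List[int] = []      # steps at which a spike was detected
    cursor = 0                     # index of the next data batch
    ckpt_step, ckpt_cursor = 0, 0  # a real checkpoint also holds weights

    while len(history) < num_steps:
        step = len(history)
        if step % checkpoint_every == 0:
            ckpt_step, ckpt_cursor = step, cursor

        loss = loss_for_batch(cursor)
        recent = history[-window:]
        if recent and loss > spike_factor * (sum(recent) / len(recent)):
            rollbacks.append(step)
            del history[ckpt_step:]               # restore to the checkpoint
            cursor = ckpt_cursor + skip_batches   # jump past the bad region
            continue

        history.append(loss)
        cursor += 1

    return history, rollbacks


def toy_loss(batch_id: int) -> float:
    """Hypothetical loss: flat at 2.0 except for one spiking batch."""
    return 10.0 if batch_id == 250 else 2.0


history, rollbacks = train_with_rollback(toy_loss, num_steps=500)
print(len(history), rollbacks)
```

In a real run the restored state would include model weights and optimizer state, and spike detection would be tuned to the loss curve rather than a fixed factor.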

Read PaLM to understand the industrial side of frontier AI: making thousands of chips behave like one training machine is itself a major part of the research.

What to remember

One-liner: Frontier models are infrastructure projects.
Why it matters: Training at scale is operations plus research.
Builder instinct: The system around the model is part of the model story.

Read it like this

Build instinct

Write a one-page training ops checklist: data, compute, monitoring, checkpointing, evals, and failure recovery.
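
One way to start the checklist is as a small data structure that flags areas you have not covered yet. This is only a sketch: the `OpsChecklist` class and the example items are hypothetical, and the six areas are taken from the exercise above.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The six areas named in the exercise.
AREAS = ["data", "compute", "monitoring", "checkpointing", "evals", "failure recovery"]


@dataclass
class OpsChecklist:
    """Hypothetical container mapping each area to its checklist items."""
    items: Dict[str, List[str]] = field(default_factory=dict)

    def add(self, area: str, item: str) -> None:
        if area not in AREAS:
            raise ValueError(f"unknown area: {area}")
        self.items.setdefault(area, []).append(item)

    def uncovered_areas(self) -> List[str]:
        """Areas that still have no items, in the canonical order."""
        return [a for a in AREAS if not self.items.get(a)]


checklist = OpsChecklist()
# Example items, invented for illustration.
checklist.add("data", "Dataset version and filtering rules are recorded")
checklist.add("checkpointing", "Checkpoint interval and retention are defined")
checklist.add("failure recovery", "Restart procedure has been rehearsed")

print(checklist.uncovered_areas())
```

Filling in every area until `uncovered_areas()` returns an empty list gives you the one-page document the exercise asks for.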
