PAPER 24 · Large-scale training

PaLM: Scaling Language Modeling with Pathways

Chowdhery et al. 2022 Paper

A masterclass in large-scale language-model training across thousands of accelerators.

Core concept

PaLM is a case study in what it takes to train a 540-billion-parameter dense language model across thousands of accelerators (6,144 TPU v4 chips for the largest run).

Why it mattered

It shows that building a frontier model is as much a systems and operations effort as an algorithmic one.

Visual shortcut · Frontier training machine

PaLM is a reminder that frontier models are infrastructure projects as much as model definitions.

How it works

Assemble massive data and compute.

Distribute training across accelerators.

Keep the run stable.

Evaluate broad emergent capabilities.
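
The steps above can be made concrete with a toy loop. The paper describes handling loss spikes by restarting from an earlier checkpoint and skipping some data batches; the sketch below imitates that pattern on a one-parameter model. Everything in it (the scalar model, `train`, `ckpt_every`, `spike_factor`, the moving-average baseline) is an illustrative assumption, not the authors' code, and it skips only the single offending batch where a real run would skip a window of batches.

```python
def loss_and_grad(weight, batch):
    """Toy objective: squared error between a scalar weight and the batch value."""
    err = weight - batch
    return err * err, 2.0 * err


def train(batches, lr=0.1, ckpt_every=5, spike_factor=10.0):
    """Run SGD with periodic checkpoints and a roll-back-and-skip spike guard."""
    weight = 0.0
    ckpt_weight = weight      # last saved "checkpoint" (hypothetical, in memory)
    baseline = None           # moving average of recent losses
    skipped = []              # indices of batches dropped after a spike
    for i, batch in enumerate(batches):
        loss, grad = loss_and_grad(weight, batch)
        if baseline is not None and loss > spike_factor * baseline:
            # Loss spike: restore the last checkpoint and skip this batch.
            weight = ckpt_weight
            skipped.append(i)
            continue
        weight -= lr * grad
        baseline = loss if baseline is None else 0.9 * baseline + 0.1 * loss
        if (i + 1) % ckpt_every == 0:
            ckpt_weight = weight  # save a checkpoint every ckpt_every batches
    return weight, skipped


# Twenty clean batches with one corrupted value injected at index 12.
batches = [1.0] * 20
batches[12] = 100.0
final_weight, skipped = train(batches)
print(skipped, round(final_weight, 3))
```

With the guard on, the corrupted batch is the only one skipped and the weight ends near the clean target of 1.0. Passing `spike_factor=float("inf")` disables the guard and lets the bad batch drag the weight far off, which is the failure the checkpoint-and-skip routine is meant to prevent.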

The quick digest

PaLM scales a decoder-only Transformer across Google’s Pathways infrastructure. The paper reports broad gains across language, reasoning, code, and multilingual tasks, but the deeper story is the training operation behind it.

At this scale, the model is no longer just a neural net. It is data pipelines, distributed systems, accelerator scheduling, stability engineering, checkpointing, monitoring, and evaluation. The architecture matters, but execution discipline matters just as much.
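
To put "this scale" in rough numbers, here is a back-of-the-envelope estimate. It uses the common rule of thumb of about 6 FLOPs per parameter per training token (which ignores attention cost), together with the figures reported for the largest PaLM model: 540 billion parameters, 780 billion tokens, 6,144 TPU v4 chips. The per-chip peak and the utilization value are assumptions chosen for illustration, and the function names are hypothetical; treat the result as an order-of-magnitude sketch, not the paper's accounting.

```python
def training_flops(params, tokens):
    """Rule-of-thumb training cost: ~6 FLOPs per parameter per token."""
    return 6.0 * params * tokens


def wall_clock_days(total_flops, chips, peak_flops_per_chip, utilization):
    """Days needed if every chip sustains peak * utilization for the whole run."""
    sustained = chips * peak_flops_per_chip * utilization
    return total_flops / sustained / 86_400  # 86,400 seconds per day


PARAMS = 540e9               # parameters in the largest PaLM model
TOKENS = 780e9               # training tokens
CHIPS = 6144                 # TPU v4 chips in the largest configuration
ASSUMED_PEAK = 275e12        # assumed peak FLOP/s per chip
ASSUMED_UTILIZATION = 0.4    # assumed fraction of peak actually sustained

total = training_flops(PARAMS, TOKENS)
days = wall_clock_days(total, CHIPS, ASSUMED_PEAK, ASSUMED_UTILIZATION)
print(f"{total:.3e} FLOPs, about {days:.0f} days under these assumptions")
```

The point of the arithmetic is the sensitivity: at this size, a few percentage points of utilization lost to stalls, restarts, or slow input pipelines translate into days of wall-clock time on thousands of chips.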

Read PaLM to understand the industrial side of frontier AI: making thousands of chips behave like one training machine is itself a major part of the research.

What to remember

One-liner

Frontier models are infrastructure projects.

Why it matters

Training at scale is operations plus research.

Builder instinct

The system around the model is part of the model story.

Read it like this

First pass: Look at scale and infrastructure.
Second pass: Skim the capability results by domain.
Then build taste: Compare with later open models that made parts of this capability runnable on local hardware.

Build instinct

Write a one-page training ops checklist: data, compute, monitoring, checkpointing, evals, and failure recovery.
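
One way to start the checklist is as a small data structure that makes gaps visible. This is a minimal sketch: the six areas come from the exercise above, while the example items and the `missing_areas` helper are hypothetical placeholders to be replaced with your own.

```python
# The six areas named in the exercise, in the order given.
AREAS = ["data", "compute", "monitoring", "checkpointing", "evals", "failure recovery"]


def missing_areas(checklist):
    """Return the areas that have no checklist items yet, in canonical order."""
    return [area for area in AREAS if not checklist.get(area)]


# A partially filled checklist; the items are illustrative examples only.
checklist = {
    "data": ["dataset versions pinned", "deduplication and filtering documented"],
    "checkpointing": ["save interval chosen", "restore tested end to end"],
}

print(missing_areas(checklist))
```

Keeping the areas fixed and the items free-form forces every area to be addressed explicitly, even if the answer for a small project is a single line.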
