PAPER 26 · Training practice

The Smol Training Playbook

Hugging Face 2025 Playbook / blog

A practical handbook for efficiently training smaller language models.

Core concept

Training small models well is mostly about sharp data, sharp evals, and a sharply defined job.

Why it mattered

It is the practical handbook for builders who cannot or should not train frontier-scale models.

Visual shortcut · Small model craft loop
1. Curate data
2. Pick tokenizer + recipe
3. Evaluate early and often
Craft over brute force.

Small-model training is a craft loop: narrow task, clean data, honest eval, repeat.

How it works
Pick a clear workflow.
Build or curate the dataset.
Train a compact model.
Measure against a real eval before scaling.
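
One way to keep the eval in the last step "real" is to fix the held-out set before any training starts, so eval examples never leak into the training data. The sketch below is a minimal illustration under assumptions that are not from the playbook: each example is a dict with a stable "id" field, and the names split_bucket and split_dataset are hypothetical helpers.

```python
import hashlib

def split_bucket(example_id: str, eval_fraction: float = 0.1) -> str:
    """Assign an example to 'train' or 'eval' deterministically from its id."""
    digest = hashlib.sha256(example_id.encode("utf-8")).hexdigest()
    # Map the first 8 hex digits to a number in [0, 1).
    position = int(digest[:8], 16) / 0x100000000
    return "eval" if position < eval_fraction else "train"

def split_dataset(examples, eval_fraction=0.1):
    """Split examples (dicts with an assumed 'id' key) into train and held-out lists."""
    train, held_out = [], []
    for example in examples:
        bucket = split_bucket(example["id"], eval_fraction)
        (held_out if bucket == "eval" else train).append(example)
    return train, held_out
```

Because the assignment depends only on the id, adding new examples later never moves an old example from one side to the other, which keeps scores comparable across iterations.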

The quick digest

The Smol Training Playbook is less a single scientific claim and more a craft manual. It says small models can be useful if you are disciplined about data selection, deduplication, tokenizer choices, training mix, evaluation, and iteration.
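
As a small illustration of the deduplication step, the sketch below drops exact duplicates after normalizing whitespace and case. It is a simplified stand-in, not the playbook's own pipeline: real pipelines often add near-duplicate detection as well, and the function names normalize and dedupe are hypothetical.

```python
import hashlib
import re

def normalize(text: str) -> str:
    """Collapse whitespace runs and lowercase, so trivial variants compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()

def dedupe(texts):
    """Keep the first occurrence of each normalized text, preserving input order."""
    seen = set()
    kept = []
    for text in texts:
        key = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            kept.append(text)
    return kept
```

Hashing the normalized text rather than storing it keeps memory use predictable on larger corpora; the original, un-normalized text is what gets kept.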

The key difference from giant models is margin for error. A huge model can sometimes absorb messy data and broad objectives. A small model needs a clearer job and cleaner examples because it has less capacity to waste.

For local AI, this is the everyday playbook: define the workflow, build the eval, curate the data, train small, measure honestly, and only scale when the bottleneck is real.

What to remember

One-liner: Small models need clear jobs.
Why it matters: Clean data and evals matter more than heroics.
Builder instinct: Local training is a craft loop.

Read it like this

Build instinct

Pick one local workflow, define evals, create a small dataset, and run a tiny training loop with a before/after score.
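
A before/after score can be as simple as running the same held-out set through the model before and after training. The sketch below assumes an exact-match metric and treats a model as any callable from prompt string to answer string; both are illustrative assumptions, and the names exact_match_score and before_after are hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple

EvalSet = Sequence[Tuple[str, str]]  # (prompt, expected answer) pairs

def exact_match_score(predict: Callable[[str], str], eval_set: EvalSet) -> float:
    """Fraction of prompts whose prediction equals the expected answer."""
    if not eval_set:
        raise ValueError("eval_set must not be empty")
    hits = sum(
        1 for prompt, expected in eval_set
        if predict(prompt).strip() == expected.strip()
    )
    return hits / len(eval_set)

def before_after(
    baseline: Callable[[str], str],
    candidate: Callable[[str], str],
    eval_set: EvalSet,
) -> Dict[str, float]:
    """Score both models on the same eval set and report the change."""
    before = exact_match_score(baseline, eval_set)
    after = exact_match_score(candidate, eval_set)
    return {"before": before, "after": after, "delta": after - before}
```

In practice, baseline and candidate would wrap the model before and after the tiny training run; exact match suits short, closed-form answers, and a different metric may fit other workflows better.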
