PAPER 04 · In-context learning

Language Models are Few-Shot Learners

Brown et al., 2020 · Paper

The GPT-3 paper that made in-context learning impossible to ignore.

Core concept

GPT-3 showed that a sufficiently large language model (175 billion parameters in the paper) can pick up a new task from a few examples placed directly in the prompt, with no gradient updates.

Why it mattered

It turned prompting into a product interface: a task could be specified with instructions and a few examples instead of a fine-tuning run.

Visual shortcut · Prompt as tiny curriculum

GPT-3 made the prompt feel like a temporary training set: a few examples in context can define the job.

How it works

Describe the task in text.

Show a few input/output examples.

Let the model infer the pattern from context.

Generate the next answer without weight updates.
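The four steps above can be sketched as a small prompt builder. This is a minimal illustration, not code from the paper: the function name `build_few_shot_prompt`, the `Input:`/`Output:` labels, and the translation demonstrations are assumptions chosen for the example. The resulting string would be sent to whatever text-completion model you use; nothing here updates any weights.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    task_description: str,
    examples: List[Tuple[str, str]],
    query: str,
) -> str:
    """Assemble a prompt: task description, demonstrations, then the open query.

    Hypothetical helper for illustration; the label format is an assumption.
    """
    parts = [task_description.strip(), ""]
    for source, target in examples:
        parts.append(f"Input: {source}")
        parts.append(f"Output: {target}")
        parts.append("")
    # Leave the last "Output:" empty so the model's continuation is the answer.
    parts.append(f"Input: {query}")
    parts.append("Output:")
    return "\n".join(parts)


prompt = build_few_shot_prompt(
    "Translate English to French.",
    [("cheese", "fromage"), ("sea otter", "loutre de mer")],
    "peppermint",
)
print(prompt)
```

Passing an empty `examples` list gives a zero-shot prompt, and a single pair gives a one-shot prompt, so the same helper covers all three settings.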

The quick digest

The old assumption was: if you want a model to do a task, fine-tune it on task data. GPT-3 challenged that. Give the model instructions and a few examples inside the context window, and it often figures out the pattern without changing its weights.

That is the deep idea behind few-shot learning: the prompt becomes a temporary training set. The model has absorbed so many patterns during pretraining that it can infer what game you are playing from a handful of demonstrations.

The paper is also a scale argument. It compares eight model sizes, from 125 million to 175 billion parameters, and finds that few-shot performance generally improves with size, though not smoothly on every task. As the model gets larger, new capabilities become easier to elicit with prompts. This is where “prompt engineering” starts to look less like wording tricks and more like programming a probabilistic system with examples.

What to remember

One-liner

A prompt can teach a temporary task.

Why it matters

Examples in context can replace some training data.

Builder instinct

This is where prompting becomes a real interface.

Read it like this

First pass: Read the few-shot examples before the scaling charts.
Second pass: Notice the difference between zero-shot, one-shot, and few-shot behavior.
Then build taste: Ask which tasks still need tools, retrieval, or fine-tuning.

Build instinct

Create a 10-example prompt for a narrow extraction task, then measure how much each example changes accuracy.
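One way to run that measurement is a leave-one-out ablation: score the full set of demonstrations, then drop each one in turn and see how much accuracy falls. The sketch below is an assumption-laden illustration, not a method from the paper. `toy_model` is a hypothetical stand-in for a real model call (it copies the label of the demonstration that shares the most words with the query), the currency-extraction data is made up, and it uses three demonstrations rather than ten to stay short.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]                      # (input text, expected output)
Model = Callable[[List[Example], str], str]    # (demonstrations, query) -> answer


def accuracy(model: Model, demos: List[Example], eval_set: List[Example]) -> float:
    """Fraction of eval items answered exactly right given these demonstrations."""
    correct = sum(1 for text, gold in eval_set if model(demos, text) == gold)
    return correct / len(eval_set)


def leave_one_out(
    model: Model, demos: List[Example], eval_set: List[Example]
) -> List[Tuple[int, float]]:
    """For each demonstration index, return the accuracy lost when it is removed."""
    baseline = accuracy(model, demos, eval_set)
    deltas = []
    for i in range(len(demos)):
        reduced = demos[:i] + demos[i + 1:]
        deltas.append((i, baseline - accuracy(model, reduced, eval_set)))
    return deltas


def toy_model(demos: List[Example], query: str) -> str:
    """Hypothetical stand-in for an LLM: copy the label of the closest demonstration."""
    if not demos:
        return ""
    query_words = set(query.lower().split())
    best = max(demos, key=lambda d: len(query_words & set(d[0].lower().split())))
    return best[1]


# Made-up data for a narrow task: extract the currency code mentioned in the text.
demos = [
    ("Invoice total 40 dollars", "USD"),
    ("Paid 30 euros today", "EUR"),
    ("Refund of 12 dollars", "USD"),
]
eval_set = [
    ("She owes 5 euros", "EUR"),
    ("Price is 9 dollars", "USD"),
]

deltas = leave_one_out(toy_model, demos, eval_set)
for index, drop in deltas:
    print(f"demo {index}: accuracy drop when removed = {drop:.2f}")
```

With a real model, `toy_model` would be replaced by a function that formats the demonstrations into a prompt and parses the completion. A larger held-out evaluation set makes the per-example differences less noisy.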
