Language Models are Few-Shot Learners
The GPT-3 paper that made in-context learning impossible to ignore.
GPT-3 showed that a big enough language model can learn a new task from examples placed directly in the prompt.
It also turned prompting into a product interface.
GPT-3 made the prompt feel like a temporary training set: a few examples in context can define the job.
The quick digest
The old assumption: if you want a model to do a task, fine-tune it on task data. GPT-3 challenged that. Put instructions and a few examples inside the context window, and the model often picks up the pattern with no gradient updates, so its weights never change.
That is the deep idea behind few-shot learning: the prompt becomes a temporary training set. The model has absorbed so many patterns during pretraining that it can infer what game you are playing from a handful of demonstrations.
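To make the "temporary training set" idea concrete, here is a minimal sketch of how a k-shot prompt might be assembled. The `build_prompt` helper, the `Input:`/`Output:` layout, and the translation pairs are illustrative assumptions, not the paper's exact format; no model is called here.

```python
from typing import List, Tuple


def build_prompt(instruction: str, demos: List[Tuple[str, str]], query: str, k: int) -> str:
    """Assemble a k-shot prompt: instruction, k demonstrations, then the open query.

    Hypothetical helper; the Input/Output layout is an assumption for illustration.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    lines = [instruction, ""]
    # Each demonstration is a solved example the model can pattern-match against.
    for source, target in demos[:k]:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    # The query is left unanswered; the model's continuation is the prediction.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


instruction = "Translate English to French."
demos = [("cheese", "fromage"), ("house", "maison"), ("dog", "chien")]

zero_shot = build_prompt(instruction, demos, "cat", k=0)  # instruction only
one_shot = build_prompt(instruction, demos, "cat", k=1)   # one demonstration
few_shot = build_prompt(instruction, demos, "cat", k=3)   # several demonstrations

print(few_shot)
```

The only thing that changes between the three settings is the text of the prompt. The model's weights are identical in every case.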
The paper is also a scale argument. It trains a family of models from 125 million to 175 billion parameters and compares them on the same prompts. Not every behavior improves smoothly, but as the model gets larger, new capabilities become easier to elicit with prompts. This is where “prompt engineering” starts to look less like wording tricks and more like programming a probabilistic system with examples.
What to remember
Read it like this
- First pass: Read the few-shot examples before the scaling charts.
- Second pass: Notice the difference between zero-shot (instruction only), one-shot (one demonstration), and few-shot (several demonstrations) behavior.
- Then build taste: Ask which tasks still need tools, retrieval, or fine-tuning.
Create a 10-example prompt for a narrow extraction task, then measure how much each example changes accuracy.
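One way to run that measurement is a leave-one-out ablation: score the full prompt on a held-out set, then drop each example in turn and re-score. The sketch below is an assumption-heavy illustration. `toy_model` is a hypothetical stand-in that simply copies the output of the most word-similar demonstration, so the script runs without any API; swap in a real model call to do the exercise properly. It uses three demonstrations instead of ten to keep the output readable.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]


def format_prompt(demos: List[Example], query: str) -> str:
    """Join solved demonstrations and the open query with blank lines."""
    parts = [f"Input: {x}\nOutput: {y}" for x, y in demos]
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call.

    Copies the output of the demonstration sharing the most words with the query.
    """
    *demo_blocks, query_block = prompt.split("\n\n")
    query_words = set(query_block.splitlines()[0][len("Input: "):].split())
    best_output, best_overlap = "", 0
    for block in demo_blocks:
        input_line, output_line = block.splitlines()
        overlap = len(query_words & set(input_line[len("Input: "):].split()))
        if overlap > best_overlap:
            best_output, best_overlap = output_line[len("Output: "):], overlap
    return best_output


def accuracy(model: Callable[[str], str], demos: List[Example], eval_set: List[Example]) -> float:
    """Exact-match accuracy of the model on held-out (query, answer) pairs."""
    correct = sum(model(format_prompt(demos, q)).strip() == a for q, a in eval_set)
    return correct / len(eval_set)


def leave_one_out(model: Callable[[str], str], demos: List[Example],
                  eval_set: List[Example]) -> List[Tuple[int, float]]:
    """For each demonstration, report the accuracy lost when it is removed."""
    base = accuracy(model, demos, eval_set)
    deltas = []
    for i in range(len(demos)):
        reduced = demos[:i] + demos[i + 1:]
        deltas.append((i, base - accuracy(model, reduced, eval_set)))
    return deltas


# Narrow extraction task (made-up data): pull the currency code out of a sentence.
demos = [("Paid 20 USD to Bob", "USD"), ("Sent 5 EUR to Dana", "EUR"), ("Paid 9 USD to Eve", "USD")]
eval_set = [("Paid 35 USD to Ann", "USD"), ("Sent 12 EUR to Carl", "EUR")]

for index, drop in leave_one_out(toy_model, demos, eval_set):
    print(f"example {index}: accuracy drop {drop:.2f}")
```

With this toy stand-in, only the single EUR demonstration costs accuracy when removed, because the two USD demonstrations are redundant with each other. A real model will be noisier, so a larger held-out set and repeated runs with shuffled example order are worth the extra calls.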