Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
The paper showed that reasoning behavior can be elicited by prompting with examples that include intermediate reasoning steps.
Models often reason better when the prompt shows worked steps instead of only final answers.
It revealed that prompt format can unlock latent reasoning behavior at inference time.
Chain-of-thought prompting appears to work because worked examples cue the model to decompose a problem instead of guessing the answer.
The quick digest
This paper is about a deceptively simple change: when giving examples to a large language model, include the intermediate reasoning, not just the answer. For arithmetic, commonsense, and symbolic reasoning tasks, that can materially improve performance, with the gains reported mainly for sufficiently large models.
The nontechnical intuition: the model has seen many explanations and solution paths during training. If your prompt demonstrates that the task should be solved by decomposing it, the model is more likely to follow that pattern instead of jumping to an answer.
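To make the difference concrete, here is a minimal sketch of how the two prompt styles might be assembled. The `Exemplar` class, the `build_prompt` helper, and the example questions are hypothetical names and data written for this digest, not taken from the paper; only the overall "Q: / A:" few-shot layout with reasoning placed before the final answer follows the idea described above.

```python
from dataclasses import dataclass


@dataclass
class Exemplar:
    question: str
    reasoning: str  # intermediate steps, written out in natural language
    answer: str


def build_prompt(exemplars, new_question, with_reasoning):
    """Format few-shot exemplars followed by the question to be answered."""
    blocks = []
    for ex in exemplars:
        if with_reasoning:
            # Chain-of-thought style: show the steps, then the answer.
            completion = f"{ex.reasoning} The answer is {ex.answer}."
        else:
            # Standard few-shot style: answer only.
            completion = f"The answer is {ex.answer}."
        blocks.append(f"Q: {ex.question}\nA: {completion}")
    # Leave the final answer slot open for the model to complete.
    blocks.append(f"Q: {new_question}\nA:")
    return "\n\n".join(blocks)


# Illustrative exemplar written for this sketch (an assumption, not paper data).
EXEMPLARS = [
    Exemplar(
        question="A shelf holds 4 boxes with 6 pencils each. 5 pencils are removed. How many pencils remain?",
        reasoning="4 boxes of 6 pencils is 4 * 6 = 24 pencils. Removing 5 leaves 24 - 5 = 19.",
        answer="19",
    ),
]

NEW_QUESTION = "A bus has 3 rows of 8 seats. 7 seats are taken. How many seats are free?"

standard_prompt = build_prompt(EXEMPLARS, NEW_QUESTION, with_reasoning=False)
cot_prompt = build_prompt(EXEMPLARS, NEW_QUESTION, with_reasoning=True)

print(cot_prompt)
```

The only difference between the two prompts is the exemplar completion. The new question and the open "A:" slot are identical, which is what makes the comparison a test of prompt format alone.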
This idea became the root of a major line of reasoning work. Later systems hide the scratchpad, train reasoning behavior directly, or spend more inference compute, but the seed is here: the shape of the answer can change the quality of thinking.
What to remember
Read it like this
- First pass: Study the example prompts.
- Second pass: Compare tasks where chain-of-thought (CoT) prompting helps with simple tasks where it does not.
- Then build taste: Connect this to later reasoning models and hidden scratchpads.
Run the same math and planning tasks with three prompt styles (direct answer, short reasoning, and structured step-by-step) and compare the accuracy of each.
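One way to set up that comparison is sketched below. Everything here is an assumption made for illustration: the three templates in `PROMPT_STYLES`, the "last number in the completion" scoring rule in `extract_answer`, the two toy tasks, and `stub_model`, which is a placeholder that always returns the same string so the harness runs offline. Swap `stub_model` for a call to whatever model you are testing; the stub's scores are meaningless.

```python
import re

# Hypothetical templates for the three prompt styles being compared.
PROMPT_STYLES = {
    "direct": "Q: {question}\nA: The answer is",
    "short_reasoning": (
        "Q: {question}\n"
        "A: Briefly explain the key step, then end with 'The answer is <number>.'\n"
    ),
    "step_by_step": (
        "Q: {question}\n"
        "A: Work through this in numbered steps, then end with 'The answer is <number>.'\n"
        "Step 1:"
    ),
}


def extract_answer(completion):
    """Return the last number in the completion as a string, or None."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return numbers[-1] if numbers else None


def evaluate(model, tasks):
    """Return accuracy per prompt style; `model` maps a prompt to a completion."""
    scores = {}
    for style, template in PROMPT_STYLES.items():
        correct = 0
        for question, gold in tasks:
            completion = model(template.format(question=question))
            if extract_answer(completion) == gold:
                correct += 1
        scores[style] = correct / len(tasks)
    return scores


def stub_model(prompt):
    """Placeholder so the harness runs offline; replace with a real model call."""
    return " 17."


# Toy tasks written for this sketch: (question, gold answer as a string).
TASKS = [
    ("A train has 3 cars with 5 seats each and 2 extra seats. How many seats?", "17"),
    ("Tom has 20 apples, gives away 8, then buys 4. How many apples?", "16"),
]

scores = evaluate(stub_model, TASKS)
for style, accuracy in scores.items():
    print(f"{style}: {accuracy:.2f}")
```

Holding the tasks and the scoring rule fixed while varying only the template keeps the comparison about prompt format. With a real model, a larger and more varied task set is needed before reading anything into the differences.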