PAPER 10 · Grounding

Retrieval-Augmented Generation

Lewis et al., 2020

The foundational RAG paper: combine parametric knowledge stored in a model's weights with documents retrieved from an external collection.

Core concept

RAG lets a model look things up before answering instead of relying only on what it memorized during training.

Why it mattered

It laid the foundation for practical AI over private data: current, auditable knowledge can live outside the model.

Visual shortcut · Model plus library
[Diagram: question → retriever (drawing on external memory) → grounded answer]

RAG gives the model a library card: retrieve the right context first, then answer with that context in view.

How it works
Receive a question.
Search an external collection.
Put relevant passages into the prompt.
Generate an answer grounded in those passages.
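The four steps above can be sketched as a toy Python pipeline. This is an illustration under loud assumptions: a word-overlap scorer stands in for the dense retriever used in the paper, `generate` is a stub where a real language model call would go, and `DOCS`, `retrieve`, `build_prompt`, and `generate` are hypothetical names, not any library's API.

```python
import re
from collections import Counter

# Hypothetical toy "external collection"; a real system would use a document store or vector index.
DOCS = {
    "policy-1": "Refunds are available within 30 days of purchase with a receipt.",
    "policy-2": "Support hours are Monday to Friday, 9am to 5pm.",
    "policy-3": "Gift cards cannot be exchanged for cash.",
}

def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; a crude stand-in for an embedding model.
    return re.findall(r"[a-z0-9]+", text.lower())

def retrieve(question: str, docs: dict[str, str], k: int = 2) -> list[tuple[str, str]]:
    # Step 2: score every document by word overlap with the question, keep the top k.
    q = Counter(tokenize(question))
    scored = []
    for doc_id, text in docs.items():
        overlap = sum((q & Counter(tokenize(text))).values())  # multiset intersection size
        scored.append((overlap, doc_id, text))
    scored.sort(key=lambda t: (-t[0], t[1]))  # highest overlap first, ties broken by id
    return [(doc_id, text) for score, doc_id, text in scored[:k] if score > 0]

def build_prompt(question: str, passages: list[tuple[str, str]]) -> str:
    # Step 3: put the retrieved passages into the prompt, tagged with ids so answers can cite them.
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"

def generate(prompt: str) -> str:
    # Step 4 placeholder: a real system would send the prompt to a language model here.
    return "(model output would go here)"

question = "Are refunds available after 30 days?"  # Step 1: receive a question
passages = retrieve(question, DOCS)
prompt = build_prompt(question, passages)
answer = generate(prompt)
```

Swapping the overlap scorer for embeddings changes only `retrieve`; the prompt-building and generation steps stay the same, which is part of why the pattern is easy to adopt.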

The quick digest

Imagine two kinds of memory. One is baked into the model’s weights: broad language ability, facts seen during training, general patterns. The other is an external library: documents, notes, policies, contracts, tickets, databases. RAG connects the two.

The system first retrieves relevant passages, then gives those passages to the generator as context. The model still does the language work, but the facts can come from a source you control and update. That is a large part of why RAG became a default pattern in enterprise settings.
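For readers going to the paper itself: its version of "give the passages to the generator" is probabilistic rather than a single stuffed prompt. As we read it, a DPR-style bi-encoder scores each passage z by the inner product of a document embedding d(z) and a query embedding q(x), each of the top-k passages is paired with the input x, and a BART generator's outputs are marginalized over those passages. RAG-Sequence treats one retrieved passage as the source for the whole answer; RAG-Token can draw on a different passage for each output token. A compact summary of that formulation:

```latex
p_\eta(z \mid x) \propto \exp\big(\mathbf{d}(z)^\top \mathbf{q}(x)\big)

p_{\text{RAG-Sequence}}(y \mid x) \approx \sum_{z \in \text{top-}k(p_\eta(\cdot \mid x))} p_\eta(z \mid x) \prod_{i=1}^{N} p_\theta(y_i \mid x, z, y_{1:i-1})

p_{\text{RAG-Token}}(y \mid x) \approx \prod_{i=1}^{N} \sum_{z \in \text{top-}k(p_\eta(\cdot \mid x))} p_\eta(z \mid x)\, p_\theta(y_i \mid x, z, y_{1:i-1})
```

The only difference between the two is where the sum over passages sits: outside the product over tokens (one passage per answer) or inside it (a fresh passage mixture per token).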

The paper’s enduring lesson is not “add a vector database.” It is that knowledge systems can be hybrid. Some things belong in the model; some things should be fetched at answer time because they change, need citations, or are too private to train into weights.

What to remember

One-liner
The model gets a library card.
Why it matters
Private and changing knowledge belongs outside the weights.
Builder instinct
Retrieval quality is usually the real bottleneck.
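One way to act on that instinct is to measure retrieval on its own before judging answers. The sketch below computes recall@k over a small labeled set. It assumes that, for each test question, you know the id of the passage that actually contains the answer; the `gold` and `results` data and the `recall_at_k` helper are hypothetical.

```python
def recall_at_k(results: dict[str, list[str]], gold: dict[str, str], k: int) -> float:
    """Fraction of questions whose gold passage id appears in the top-k retrieved ids."""
    if not gold:
        return 0.0
    hits = sum(1 for qid, gold_id in gold.items() if gold_id in results.get(qid, [])[:k])
    return hits / len(gold)

# Hypothetical labels: question id -> id of the passage that contains the answer.
gold = {"q1": "policy-1", "q2": "policy-2", "q3": "policy-3"}

# Hypothetical retriever output: question id -> ranked passage ids.
results = {
    "q1": ["policy-1", "policy-2"],
    "q2": ["policy-3", "policy-2"],
    "q3": ["policy-1", "policy-2"],
}

for k in (1, 2):
    print(f"recall@{k} = {recall_at_k(results, gold, k):.2f}")
```

If recall@k is low, the generator is answering without the evidence it needs, so that is usually the first number worth improving.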

Read it like this

Build instinct

Build a small document Q&A system and log misses by failure type: retrieval miss, bad chunk, bad synthesis, no answer.
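A minimal sketch of that miss log, assuming each test case records the expected answer, the id of the passage that should have been retrieved, the retrieved chunks, and the model's answer. The `Case` and `classify` names, the rule order, and the exact-substring checks are illustrative assumptions; real triage usually needs a human or a stronger grader.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Case:
    expected: str               # reference answer text
    gold_id: str                # id of the passage that should be retrieved
    retrieved: dict[str, str]   # retrieved chunk id -> chunk text
    answer: str                 # what the model produced

def classify(case: Case) -> str:
    """Assign one failure type (or 'ok') using simple, hypothetical rules."""
    expected = case.expected.lower()
    if not case.answer.strip():
        return "no answer"
    if expected in case.answer.lower():
        return "ok"
    if case.gold_id not in case.retrieved:
        return "retrieval miss"      # the right passage never reached the model
    if expected not in case.retrieved[case.gold_id].lower():
        return "bad chunk"           # right source, but the chunk lost the answer
    return "bad synthesis"           # answer was in context, model still got it wrong

cases = [
    Case("30 days", "policy-1", {"policy-1": "Refunds within 30 days."}, "You have 30 days."),
    Case("30 days", "policy-1", {"policy-2": "Support hours are 9 to 5."}, "Two weeks."),
    Case("30 days", "policy-1", {"policy-1": "Refunds are available within"}, "Unclear."),
    Case("30 days", "policy-1", {"policy-1": "Refunds within 30 days."}, "60 days."),
    Case("30 days", "policy-1", {"policy-1": "Refunds within 30 days."}, ""),
]

tally = Counter(classify(c) for c in cases)
print(tally)
```

Keeping the tally per failure type, rather than a single accuracy number, shows whether to work on the retriever, the chunking, or the prompt and model.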
