PAPER 07 · Open-weight era

LLaMA: Open and Efficient Foundation Language Models

Touvron et al., 2023

The paper that launched the open-weight era and standardized many practical architecture defaults.

Core concept

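LLaMA showed that carefully trained open-weight models at practical sizes could match far larger closed models: the paper reports LLaMA-13B outperforming GPT-3 (175B) on most benchmarks, and LLaMA-65B competitive with Chinchilla-70B and PaLM-540B.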

Why it mattered

It kicked off the modern local/open LLM ecosystem: quantization, fine-tunes, local inference, and community model work.

Visual shortcut · Open weights unlock builders

LLaMA matters because strong foundation models became artifacts people could run, tune, quantize, and remix.

How it works

Train efficient decoder-only Transformers on a large token budget.

Release useful sizes: 7B, 13B, 33B, and 65B parameters.

Let researchers and builders run them locally.

Watch an ecosystem of fine-tunes and tools form around them.
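The architecture defaults LLaMA adopted (pre-normalization with RMSNorm, a SwiGLU feed-forward layer, and rotary position embeddings) are small enough to sketch in plain Python on toy vectors. This is a minimal illustration, not Meta's implementation: the function names, the toy dimensions, and the `eps` and `base` values are assumptions, attention itself is omitted, and real models run these ops on batched tensors with learned weights.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide by the root-mean-square of x, then apply a per-channel gain.
    Unlike LayerNorm there is no mean subtraction and no bias."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for v, g in zip(x, weight)]

def silu(v):
    """SiLU / Swish: v * sigmoid(v), written to avoid overflow for large |v|."""
    if v >= 0:
        return v / (1.0 + math.exp(-v))
    e = math.exp(v)
    return v * e / (1.0 + e)

def matvec(m, x):
    """Multiply matrix m (a list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: down(silu(gate(x)) * up(x)), elementwise product in the middle."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    return matvec(w_down, [g * u for g, u in zip(gate, up)])

def rope(x, position, base=10000.0):
    """Rotary embedding: rotate each consecutive pair (x[i], x[i+1]) by position * base**(-i/d)."""
    d = len(x)
    assert d % 2 == 0, "RoPE needs an even dimension"
    out = list(x)
    for i in range(0, d, 2):
        angle = position * base ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out

def pre_norm_ffn_block(x, norm_weight, w_gate, w_up, w_down):
    """Pre-normalization: normalize the sublayer *input*, then add the residual connection."""
    h = swiglu_ffn(rms_norm(x, norm_weight), w_gate, w_up, w_down)
    return [a + b for a, b in zip(x, h)]
```

In the paper the SwiGLU hidden size is 2/3·4d instead of the usual 4d, so the three SwiGLU matrices hold about the same parameter count as a standard two-matrix 4d feed-forward layer. Inside attention, `rope` would be applied to queries and keys; because each pair is rotated, the query-key dot product depends only on the relative offset between positions.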

The quick digest

LLaMA is not famous because it invented one flashy new layer. It is famous because Meta trained a family of efficient decoder-only models very well, at sizes researchers and builders could actually use.

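The recipe matters: a large token budget (1T to 1.4T tokens of publicly available data), efficient architecture defaults (pre-normalization with RMSNorm, SwiGLU feed-forward layers, rotary position embeddings), and model sizes that fit real hardware far better than giant closed models. Once the weights escaped into the world, builders could run, inspect, quantize, fine-tune, and remix them.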
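The token-budget and hardware-fit points above can be made concrete with back-of-envelope arithmetic. This sketch uses the paper's nominal sizes and token counts (1.0T tokens for 7B and 13B, 1.4T for 33B and 65B). Two assumptions are not from the paper: the common rule of thumb of about 6 training FLOPs per parameter per token, and a weights-only memory estimate that ignores activations, KV cache, and quantization metadata such as scales.

```python
# Nominal LLaMA sizes (parameters) and training tokens, as reported in the paper.
MODELS = {
    "7B": (7e9, 1.0e12),
    "13B": (13e9, 1.0e12),
    "33B": (33e9, 1.4e12),
    "65B": (65e9, 1.4e12),
}

# Bytes per weight at common precisions (assumption: weights only, no overhead).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "4-bit": 0.5}

def tokens_per_param(params, tokens):
    """How many training tokens each parameter 'saw' on average."""
    return tokens / params

def train_flops(params, tokens):
    """Rule-of-thumb training compute: roughly 6 FLOPs per parameter per token."""
    return 6 * params * tokens

def weight_gb(params, precision):
    """Approximate weight storage in decimal gigabytes."""
    return params * BYTES_PER_PARAM[precision] / 1e9

if __name__ == "__main__":
    for name, (n, d) in MODELS.items():
        mem = ", ".join(f"{p}: {weight_gb(n, p):.1f} GB" for p in BYTES_PER_PARAM)
        print(f"{name}: {tokens_per_param(n, d):.0f} tokens/param, "
              f"~{train_flops(n, d):.1e} training FLOPs, {mem}")
```

For the 7B model this gives about 143 tokens per parameter, roughly seven times the ~20 tokens per parameter implied by the Hoffmann et al. recommendation the paper cites (a 10B model on 200B tokens). Its fp16 weights take about 14 GB, versus about 3.5 GB at 4-bit, which is why the smaller sizes became the local-inference workhorses.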

The paper marks a cultural shift. Foundation models stopped being only remote API products and became local artifacts. For LocalsOnly-style work, this is one of the key moments when “run your own model” became practical and socially contagious.

What to remember

One-liner

Open weights changed who could build.

Why it matters

Training discipline beat flashy novelty.

Builder instinct

Local AI became culturally real after LLaMA.

Read it like this

First pass: Start with the training data and token budget.
Second pass: Then inspect architecture choices and model sizes.
Then build taste: Read benchmark tables as evidence of efficiency, not as eternal rankings.

Build instinct

Run a small LLaMA-family model locally, then compare latency and quality before and after quantization.
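Before measuring a real model, a toy round-trip shows what the quality side of quantization trades away. This is a minimal sketch under stated assumptions: it uses simple symmetric absmax quantization on one hypothetical row of random weights with a planted outlier, not the exact formats any local-inference tool ships, and it says nothing about latency, which still needs a real runtime. The group size of 32 is an illustrative choice.

```python
import random

def quantize_absmax(weights, bits):
    """Symmetric absmax quantization: map [-max|w|, max|w|] onto signed integer levels."""
    qmax = 2 ** (bits - 1) - 1                              # 127 for 8-bit, 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0      # guard an all-zero input
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map integer levels back to approximate float weights."""
    return [v * scale for v in q]

def quantize_grouped(weights, bits, group_size):
    """Quantize in fixed-size groups, each with its own scale, and return the round-tripped floats."""
    out = []
    for start in range(0, len(weights), group_size):
        q, s = quantize_absmax(weights[start:start + group_size], bits)
        out.extend(dequantize(q, s))
    return out

def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical weight row: mostly small values plus one large outlier.
    w = [rng.gauss(0.0, 0.02) for _ in range(256)]
    w[17] = 0.5
    for bits in (8, 4):
        per_row = quantize_grouped(w, bits, group_size=len(w))
        per_group = quantize_grouped(w, bits, group_size=32)
        print(f"{bits}-bit  per-row error: {mean_abs_error(w, per_row):.5f}  "
              f"per-32-group error: {mean_abs_error(w, per_group):.5f}")
```

Two effects show up: error grows as the bit width drops, and a single outlier inflates the shared scale so that most small weights round to zero. Giving each small group its own scale confines that damage to one group, at the cost of storing extra scales.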
