PAPER 07 · Open-weight era

LLaMA: Open and Efficient Foundation Language Models

Touvron et al., 2023

The paper that kicked open the open-weight era and popularized a set of practical architecture defaults: RMSNorm pre-normalization, SwiGLU activations, and rotary positional embeddings (RoPE).

Core concept

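LLaMA showed that carefully trained open-weight models at practical sizes could be extremely capable: the paper reports that LLaMA-13B outperforms GPT-3 (175B) on most benchmarks and that LLaMA-65B is competitive with Chinchilla-70B and PaLM-540B.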

Why it mattered

It seeded the modern local/open LLM ecosystem: quantization, fine-tunes, local inference, and community model work.

Visual shortcut · Open weights unlock builders (diagram panels: closed frontier, open weights, efficient training, builder ecosystem)

LLaMA matters because strong foundation models became artifacts people could run, tune, quantize, and remix.

How it works
Train efficient decoder models well.
Release useful sizes.
Let researchers and builders run them locally.
Watch an ecosystem of fine-tunes and tools form around them.
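On the architecture side, "efficient decoder models" means a standard decoder-only transformer with three changes the paper borrows from earlier models: pre-normalization with RMSNorm (GPT-3's pre-norm idea), the SwiGLU feed-forward activation (PaLM), and rotary positional embeddings (GPTNeo). The pure-Python sketch below shows each piece on toy vectors. It is an illustration, not Meta's code: the function names and dimensions are hypothetical, attention is omitted, and RoPE is shown on a single vector the way it would be applied to a query or key.

```python
import math
import random

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide by the root-mean-square of x, then scale per channel.
    Unlike LayerNorm there is no mean subtraction and no bias."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]

def silu(z):
    """SiLU (Swish): z * sigmoid(z), written to avoid overflow for large |z|."""
    if z >= 0:
        return z / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return z * ez / (1.0 + ez)

def matvec(m, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: down( silu(gate(x)) * up(x) ), elementwise product in the middle."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    return matvec(w_down, [g * u for g, u in zip(gate, up)])

def ffn_sublayer(x, norm_weight, w_gate, w_up, w_down):
    """Pre-normalization residual: normalize first, apply the FFN, then add the input back."""
    y = swiglu_ffn(rms_norm(x, norm_weight), w_gate, w_up, w_down)
    return [a + b for a, b in zip(x, y)]

def rope(x, position, base=10000.0):
    """Rotary embedding (adjacent-pair convention): rotate each pair (x[i], x[i+1])
    by angle position * base**(-i/d), so lower dimensions rotate faster."""
    d = len(x)
    out = list(x)
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out

# Toy usage: model width 4, hypothetical FFN hidden width 6, random weights.
rng = random.Random(0)
d, hidden = 4, 6

def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

x = [0.5, -1.0, 2.0, 0.0]
h = ffn_sublayer(x, [1.0] * d, rand_matrix(hidden, d), rand_matrix(hidden, d), rand_matrix(d, hidden))
q = rope(x, position=3)  # in a real model this would be a query/key vector inside attention
print(len(h), len(q))  # 4 4
```

None of these pieces is exotic on its own; the paper's point is that a well-chosen combination, trained on enough data, goes a long way.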

The quick digest

LLaMA is not famous because it invented one flashy new layer. It is famous because Meta trained a family of efficient decoder-only models (7B, 13B, 33B, and 65B parameters) very well, at sizes researchers and builders could actually use.

The recipe matters: lots of tokens (1T for the 7B and 13B models, 1.4T for the 33B and 65B models, all from publicly available data), efficient architecture defaults, and model sizes that fit real hardware better than giant closed models. Once the weights escaped into the world, builders could run, inspect, quantize, fine-tune, and remix them.
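A quick back-of-envelope calculation shows why these sizes fit real hardware and how far LLaMA pushed the token budget. This sketch assumes nominal parameter counts (7, 13, 33, 65 billion) and the token budgets reported in the paper, and it counts weights only: no KV cache, no activations, and no per-block scale overhead that real 4-bit formats carry. For comparison, the commonly cited Chinchilla rule of thumb is roughly 20 training tokens per parameter.

```python
# Nominal LLaMA sizes (billions of parameters) and training tokens (trillions).
LLAMA_MODELS = {
    "7B": (7, 1.0),
    "13B": (13, 1.0),
    "33B": (33, 1.4),
    "65B": (65, 1.4),
}

# Bytes per parameter for the weights alone; real 4-bit formats add a little
# overhead for per-block scales, which this simplification ignores.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "4-bit": 0.5}

def weight_gb(params_billion: float, precision: str) -> float:
    """Approximate weight memory in GB (1e9 bytes): parameters x bytes per parameter."""
    return params_billion * BYTES_PER_PARAM[precision]

def tokens_per_param(tokens_trillion: float, params_billion: float) -> float:
    """Training tokens seen per model parameter."""
    return (tokens_trillion * 1e12) / (params_billion * 1e9)

for name, (params_b, tokens_t) in LLAMA_MODELS.items():
    mem = ", ".join(f"{p} {weight_gb(params_b, p):.1f} GB" for p in BYTES_PER_PARAM)
    ratio = tokens_per_param(tokens_t, params_b)
    print(f"LLaMA-{name}: {mem}; ~{ratio:.0f} tokens/param")
```

The 7B row works out to 14 GB of weights in fp16 and 3.5 GB at 4 bits, within reach of a single consumer GPU or a laptop's RAM, and about 143 training tokens per parameter, far past the Chinchilla ratio. The paper's argument is that this extra training is worth it because a smaller, longer-trained model is cheaper to run at inference time.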

The paper marks a cultural shift. Foundation models stopped being only remote API products and became local artifacts. For LocalsOnly-style work, this is one of the key moments where “run your own model” became practical and socially contagious.

What to remember

One-liner: Open weights changed who could build.
Why it matters: Training discipline beat flashy novelty.
Builder instinct: Local AI became culturally real after LLaMA.

Read it like this

Build instinct

Run a small LLaMA-family model locally, then compare latency and quality before and after quantization.
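To build intuition for why quality can drop after quantization, here is a toy sketch of blockwise symmetric absmax quantization: each block of 32 weights shares one scale, and values are rounded to signed integers. It is a simplified illustration in the spirit of the block formats local runtimes use, not the exact scheme of any specific tool; the function names, block size, and Gaussian stand-in weights are assumptions. A real before/after comparison should use model-level measures (perplexity, side-by-side outputs), and latency should come from timing actual generation in your runtime.

```python
import random

def quantize_block(block, bits):
    """Symmetric absmax quantization: map values to signed ints in [-qmax, qmax]."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit, 127 for 8-bit
    absmax = max(abs(v) for v in block)
    scale = absmax / qmax if absmax > 0 else 1.0  # avoid dividing by zero on all-zero blocks
    q = [max(-qmax, min(qmax, round(v / scale))) for v in block]
    return q, scale

def dequantize_block(q, scale):
    """Recover approximate floats from the integers and the shared block scale."""
    return [v * scale for v in q]

def roundtrip(weights, bits, block_size=32):
    """Quantize then dequantize a flat weight list, one block at a time."""
    out = []
    for start in range(0, len(weights), block_size):
        q, scale = quantize_block(weights[start:start + block_size], bits)
        out.extend(dequantize_block(q, scale))
    return out

def rmse(a, b):
    """Root-mean-square reconstruction error."""
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5

# Stand-in "weights": small Gaussian values, roughly the scale of trained LLM weights.
rng = random.Random(0)
weights = [rng.gauss(0.0, 0.02) for _ in range(4096)]
for bits in (8, 4):
    approx = roundtrip(weights, bits)
    print(f"{bits}-bit RMSE: {rmse(weights, approx):.2e}")
```

Going from 8 to 4 bits shrinks the integer range from 127 steps to 7 on each side, so rounding error per weight grows; whether that error matters is exactly what the model-level comparison tells you.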
