PAPER 12 · Preference alignment

Direct Preference Optimization

Rafailov et al., 2023

A simpler, more stable alternative to PPO-style RLHF that trains the model directly on preference pairs with a classification-style loss.

Core concept

DPO lets you tune a model from preferred/rejected answer pairs without running a complicated reinforcement learning loop.

Why it mattered

It made preference tuning simpler and more accessible for smaller teams.

Visual shortcut · Preference tuning without the big RL machine

DPO compresses preference tuning into a direct lesson: make the preferred answer more likely than the rejected one.

How it works

Collect chosen/rejected response pairs.

Compare model probabilities for both answers.

Push the model toward the chosen one.

Keep it anchored to a reference model.
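The four steps above can be sketched as a single per-pair loss. This is a minimal illustration in plain Python, not the authors' code: it assumes you have already computed the summed log-probability of each full response under both the trainable policy and the frozen reference model. The function and argument names are hypothetical, and beta=0.1 is only an illustrative default.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; controls how tightly the policy is anchored
) -> float:
    """DPO loss for one (chosen, rejected) pair of sequence log-probabilities."""
    # Compare model probabilities for both answers, relative to the reference.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # Positive margin: the policy favors the chosen answer more than the reference does.
    margin = beta * (chosen_logratio - rejected_logratio)
    # Minimizing this pushes the margin up, i.e. toward the chosen answer.
    return -log_sigmoid(margin)


# Policy identical to the reference: margin is 0, loss is log(2).
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# Policy has shifted toward the chosen answer: loss drops below log(2).
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)
print(baseline, improved)
```

The reference log-probabilities enter only as a baseline: the loss rewards movement relative to the reference model, which is how the anchoring in the last step shows up in the arithmetic.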

The quick digest

RLHF is powerful but operationally messy: train a reward model, run reinforcement learning, keep the model from drifting, debug instability. DPO asks whether you can get much of the same preference-shaping effect directly from pairs of good and bad answers.

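The paper's answer is yes: on its sentiment-control, summarization, and single-turn dialogue experiments, DPO matches or beats PPO-based RLHF. Given a prompt, a preferred response, and a rejected response, DPO adjusts the model so the preferred response becomes more likely relative to the rejected one while staying anchored to a reference model.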
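Written out, the objective from the paper looks like this. Here x is the prompt, y_w the preferred response, y_l the rejected response, π_θ the model being tuned, π_ref the frozen reference model, σ the logistic function, and β a temperature that sets how strongly the model is held near the reference.

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
    \left[ \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right) \right]
```

The loss falls as the preferred response's log-ratio grows relative to the rejected one's, so only the gap between the two matters, and both terms are measured against the reference model.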

For builders, the practical upshot is significant: if you can collect clean preference pairs, you can steer style, helpfulness, refusal patterns, and domain behavior without building a full RLHF pipeline.

What to remember

One-liner

Preference pairs can directly steer a model.

Why it matters

Simpler alignment loops made tuning more accessible.

Builder instinct

The dataset becomes the steering wheel.

Read it like this

First pass: Read the objective intuitively before the derivation.
Second pass: Compare it with the InstructGPT RLHF pipeline.
Then build taste: Look for where the reference model prevents drift.

Build instinct

Create chosen/rejected pairs for one writing style and run a small DPO fine-tune or mock ranking eval.
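One way to start on the mock ranking eval is sketched below. Everything here is hypothetical scaffolding: the PreferencePair container, the ranking_accuracy helper, and the toy brevity_score function are illustrative names, and the scorer simply stands in for a real model score. In an actual DPO check you would swap in something like the policy-minus-reference log-probability of each response.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # response in the target style
    rejected: str  # response that misses the target style


def ranking_accuracy(
    pairs: List[PreferencePair],
    score_fn: Callable[[str, str], float],
) -> float:
    """Fraction of pairs where score_fn rates the chosen response strictly higher."""
    if not pairs:
        raise ValueError("need at least one preference pair")
    wins = sum(
        score_fn(p.prompt, p.chosen) > score_fn(p.prompt, p.rejected)
        for p in pairs
    )
    return wins / len(pairs)


def brevity_score(prompt: str, response: str) -> float:
    """Toy stand-in for a model score: fewer words scores higher."""
    return -float(len(response.split()))


# Two hand-written pairs for a "concise answers" writing style.
pairs = [
    PreferencePair(
        prompt="Explain DPO.",
        chosen="DPO tunes a model directly on preference pairs.",
        rejected="Well, basically, DPO is a method that kind of tunes "
                 "a model using pairs of preferences.",
    ),
    PreferencePair(
        prompt="What anchors the policy?",
        chosen="A frozen reference model.",
        rejected="There is this other frozen copy of the model that "
                 "acts as a reference.",
    ),
]

print(ranking_accuracy(pairs, brevity_score))
```

Running the same accuracy check before and after a small fine-tune gives a cheap signal of whether the pairs are actually moving the model in the intended direction.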
