WEEK 03 OF 12 · FINE-TUNING DEEP DIVE

You can run any model. This week you make one yours.

Week 1 was what the hardware is. Week 2 was serving it to other people. Week 3 is the first week the model itself changes shape under your hands. By Sunday you'll have taken a base open model and fine-tuned it on your own data with QLoRA — a real estate analyst that writes in one consistent voice and reasons about housing the way you taught it, trained start to finish on the box on your desk.

The arc: Wk 2 · Inference at scale → Wk 3 · Fine-tuning → Wk 4 · Quantization & FP4

Why this week matters

Fine-tuning is the difference between renting a model's general intelligence and owning a model's behavior. A prompt is a costume you put on every request. A fine-tune is a haircut — it changes the model so it shows up the way you want by default, with no instructions, no examples, and a fraction of the tokens. On a DGX Spark, a real LoRA fine-tune of Llama 3.1 8B runs at ~53,000 tokens/sec and finishes in a coffee break. That capability used to cost a cluster.

The week, at a glance

Same promise as every week: plain-English mental model first, then the one habit, number, or experiment that makes it real on your own DGX. Six days of building intuition, then one capstone where you ship a model you trained yourself.

DAY 01 · MON

Prompt, RAG, or Fine-Tune?

The decision that comes before any training. What fine-tuning actually changes, what it can't fix, and the three questions that tell you which tool the job needs.

Lesson live

DAY 02 · TUE

LoRA — The Low-Rank Trick

Why you don't retrain 8 billion weights to teach a model one new behavior. Freeze the giant, train two tiny matrices, and get 99% of the result for 1% of the cost.

Lesson live

DAY 03 · WED ⭐

QLoRA — Fine-Tuning on One Desk

The 2023 breakthrough that put training on consumer hardware. Squeeze the frozen model to 4 bits, train the adapter in full precision, lose almost nothing. The trick that makes today possible.

Lesson live

DAY 04 · THU

Data Is the Real Product

The model is a commodity; your dataset is the moat. Formatting, chat templates, the 100-example rule, and why most failed fine-tunes are data problems wearing a training-code mask.

Lesson live

DAY 05 · FRI

SFT vs DPO — Teaching Taste

Supervised fine-tuning teaches the model what to say. Preference tuning teaches it which of two answers is better. How DPO replaced the scary RLHF pipeline with a clean classification loss.

Lesson live

DAY 06 · SAT

Did It Actually Work?

The part everyone skips. Overfitting, catastrophic forgetting, and the eval discipline that separates "the loss went down" from "the model got better." How to know before you ship.

Lesson live

DAY 07 · SUN ⭐

Capstone — Real Estate Analyst v2

Take the Week 1 analyst from prompted to trained. QLoRA fine-tune Llama 3.1 8B on a real estate dataset, give it a house voice, merge the adapter, and serve it on your DGX. A model that's yours.

Lesson live

What you'll be able to answer by Sunday night

When fine-tuning is the right move and when a prompt or RAG would have done the same job for free.
What LoRA actually does to a model, explained simply enough to say it out loud at dinner.
Why QLoRA's 4-bit trick lets a $4,700 box do what needed a $400,000 cluster two years ago.
Why your dataset matters ten times more than your hyperparameters — and what a good training example looks like.
The difference between supervised fine-tuning and DPO, and when you'd reach for each.
How to tell whether your fine-tune got smarter or just memorized — before you let anyone use it.

What you need before Day 1

Carryover from Weeks 1–2

DGX Spark on your network, Ollama installed, comfortable SSH-ing in and running Python.
The serving intuition from Week 2 — Day 7's capstone will serve the model you train.

New this week

~100 GB free disk for the base model, your dataset, and checkpoints.
Unsloth installed (we'll do it Day 3) — it gives the Spark a ~2.5× training speed-up with custom Triton kernels.
One small dataset you care about. We'll supply a real estate one, but the lesson lands harder with data that's yours.

The big idea

Most people will only ever use models other companies trained. After this week you're on the other side of that line: you can take a base model and bend its behavior to a job, on hardware you own, with data nobody else has. That's the whole game.

← Replay Wk 2 Back to home