Fine-tuning is a hammer. First decide if you have a nail.
Before you touch a single training script, make the decision most people skip: should this even be a fine-tune? Many failed fine-tunes were a prompt change or a search index in disguise. Today's lesson is the map that keeps you from training a model to solve a problem training can't solve.
Fine-tuning changes how a model behaves — its tone, format, instincts, and default reasoning. It does not reliably teach the model new facts, and it can't give it information that didn't exist at training time. Pick the wrong tool and you'll spend a week teaching a model something a one-line prompt already did.
Training is the expensive, slow, hard-to-undo option. The discipline of asking "could a prompt or a document lookup do this?" first is what separates people who ship from people who burn a weekend fine-tuning their way to a worse model.
The three tools, in plain English
You have three ways to make a model do what you want, and they stack from cheapest to most expensive.
Prompting is giving instructions and examples in the message itself. It costs nothing to try beyond the tokens you send, it takes effect instantly, and it's fully reversible. You're not changing the model at all; you're briefing it for one request. If a really good prompt with two or three examples gets you 90% of the way, you are very likely done.
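To make "instructions and examples in the message itself" concrete, here is a minimal sketch of a few-shot prompt builder. The function name `build_few_shot_prompt`, the `Input:`/`Output:` labels, and the sample listings are all illustrative assumptions, not a required format.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Join an instruction, worked examples, and the new input into one prompt string."""
    parts = [instruction.strip(), ""]
    for source, target in examples:
        parts.append(f"Input: {source}")
        parts.append(f"Output: {target}")
        parts.append("")  # blank line between examples
    # The final block is left open so the model completes the output.
    parts.append(f"Input: {query}")
    parts.append("Output:")
    return "\n".join(parts)


# Hypothetical examples, invented purely for illustration.
EXAMPLES = [
    ("3 bed, 2 bath, 1,400 sq ft", "Mid-size family home."),
    ("Studio, 450 sq ft, downtown", "Compact city unit."),
]

prompt = build_few_shot_prompt(
    "Summarize each listing in one short sentence.",
    EXAMPLES,
    "4 bed, 3 bath, 2,600 sq ft",
)
print(prompt)
```

Nothing about the model changes here. Delete the string and the behavior is gone, which is exactly why this is the cheapest thing to try first.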
RAG (retrieval-augmented generation) is letting the model look things up. You keep your knowledge in a searchable store, fetch the relevant pieces at question time, and paste them into the prompt. This is how you give a model facts it never saw — your company's docs, this week's listings, a private database. The model didn't learn anything; it got handed the right page.
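The fetch-then-paste loop can be sketched in a few lines. This is a toy under stated assumptions: the "searchable store" is a plain Python list, relevance is simple word overlap rather than the embedding search a real system would likely use, and the documents and the names `retrieve` and `build_rag_prompt` are made up for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, top_k=2):
    """Return up to top_k documents sharing the most words with the question."""
    question_words = tokenize(question)
    scored = [(len(question_words & tokenize(doc)), doc) for doc in documents]
    scored = [pair for pair in scored if pair[0] > 0]  # drop documents with no overlap
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort: ties keep store order
    return [doc for _, doc in scored[:top_k]]


def build_rag_prompt(question, documents, top_k=2):
    """Paste the retrieved documents into the prompt ahead of the question."""
    context = retrieve(question, documents, top_k)
    context_block = "\n".join(f"- {doc}" for doc in context) or "- (no matching documents)"
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical knowledge store, invented for illustration.
DOCS = [
    "Listing 12 Oak Street: 3 bedrooms, asking price 410000.",
    "Listing 88 Pine Avenue: 2 bedrooms, asking price 295000.",
    "Office hours: the leasing desk is open weekdays 9 to 5.",
]

question = "What is the asking price for Oak Street?"
print(build_rag_prompt(question, DOCS, top_k=1))
```

The model's weights never change. Update the list and the next answer changes with it, which is the property fine-tuning can't give you.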
Fine-tuning is changing the model's weights so a behavior becomes baked in. No instructions, no pasted examples: the behavior just shows up the way you trained it. This is how you change style, format, and instinct: a consistent voice, reliable JSON, a domain's way of reasoning.
The question that decides it
Ask yourself: is the problem that the model doesn't know something, or that it doesn't act the way I want?
If it doesn't know something — yesterday's prices, your private files, a fact from after its training cutoff — no amount of fine-tuning fixes that reliably. That's a RAG problem. Stuffing facts into a fine-tune is how you get a model that confidently makes things up, because you taught it the shape of the answer without a dependable way to store the content.
If it does know the material but acts wrong — too verbose, wrong format, generic voice, ignores your house style — that's exactly what fine-tuning is for. You're not adding knowledge, you're correcting behavior.
The honest test for "do I need to fine-tune?"
- Did a strong prompt with 2–3 examples already get you most of the way? If yes, stop. Ship the prompt.
- Is the gap about facts the model can't have? That's RAG, not fine-tuning.
- Do you need the behavior to be consistent across thousands of calls without re-pasting instructions every time? Now fine-tuning earns its cost.
- Do you have at least ~100 good examples of the behavior you want? If not, you have a data project before you have a training project.
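The checklist above can be read as a short decision procedure. The sketch below encodes it in order, under a few assumptions: the function name `recommend_tool` and its return strings are invented here, the 100-example threshold is the rough figure from the list rather than a hard rule, and the fallback for "behavior gap, but no need for consistency at scale" is a guess that you would keep iterating on the prompt.

```python
def recommend_tool(prompt_gets_most_of_the_way,
                   gap_is_missing_facts,
                   needs_consistency_at_scale,
                   good_example_count,
                   min_examples=100):
    """Walk the checklist in order and return the first tool that applies."""
    if prompt_gets_most_of_the_way:
        return "prompt"  # stop and ship the prompt
    if gap_is_missing_facts:
        return "rag"  # facts the model can't have need retrieval
    if not needs_consistency_at_scale:
        return "prompt"  # assumption: not worth training yet, keep iterating
    if good_example_count < min_examples:
        return "collect data first"  # a data project before a training project
    return "fine-tune"


# Hypothetical inputs: a behavior gap, needed on every call, with enough examples.
print(recommend_tool(False, False, True, good_example_count=250))
```

The order of the checks carries the argument: fine-tuning is only reachable after the cheaper options have been ruled out.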
Where this week is going
We're going to fine-tune for real — because by Day 7 you'll want the real estate analyst to write in one consistent voice and reason about housing without a 600-word system prompt babysitting it every time. That's a behavior problem, which makes it a fine-tuning problem. But notice we earned that conclusion. We didn't start with "let's fine-tune" and look for a reason.
The non-technical version
Prompting is leaving a sticky note for a smart temp. RAG is handing that temp the right file folder before they start. Fine-tuning is hiring and training a permanent employee who already knows how your office does things. You don't put someone through a training program to answer one question — but you also don't leave a sticky note for something you'll need done the same way ten thousand times.
Take the job you think needs a fine-tune and write it at the top of a page. Underneath, force a real attempt at a prompt-only version first. If the prompt gets you 90%, you just saved a week. If it plateaus at "close but never consistent," you've found a genuine fine-tuning job — and you'll know exactly which behavior you're training.
# Before you train, answer these
The job: ____________________
Is the gap KNOWLEDGE or BEHAVIOR? ____________________
Did a 3-example prompt get me 90%? yes / no
Could RAG supply the missing facts? yes / no
Do I have ~100 good examples? yes / no
# If it's behavior, consistent at scale, and you have the data:
# now you're allowed to fine-tune.
Fine-tuning changes who the model is. RAG changes what it just read. Most "we need to fine-tune" is really "we need a better prompt or a search box."