DAY 07 · CAPSTONE PROJECT

Build an AI Real Estate Analyst on your desk.

Today is the project. We'll grab a public dataset of ~2 million US real estate listings, fine-tune Llama 3.1 8B on it using LoRA, deploy the result with Ollama, and end the day with a private AI that drafts listings, prices homes, and reasons about US housing markets — running entirely on your DGX Spark. Plan for ~3-4 hours, most of which is the GPU doing its thing while you read.

~2M listings in dataset · ~10k Q&A pairs we'll train on · ~2 hr LoRA training time · ~$0.40 total electricity cost

What we're building, exactly

A LoRA fine-tune of Llama 3.1 8B Instruct, taught on synthesized Q&A pairs derived from the public USA Real Estate dataset on Kaggle (≈2.2M Realtor.com listings). The result will draft property listings, estimate typical asking prices, and answer questions about US housing markets.

It will not be a perfect Zestimate replacement. The point is to feel the full loop — public data → curated training set → real fine-tune → deployed model — on your hardware. By the end, you'll know how every "AI specialist for X" startup is actually built.

Why this exercise matters

Every "AI for [vertical]" company you'll see in the next two years is doing roughly what we're doing today, with more polish: take a public or proprietary dataset, fine-tune an open model on it, ship it as the "expert AI." The moat for most vertical-AI products is the data + the fine-tune, not novel model research. CUDA is what makes this feasible on a budget.

Architecture, in one diagram

Kaggle dataset (2.2M listings, CSV) → Q&A builder (Python script) → Unsloth + LoRA fine-tune (CUDA + FlashAttention, ~2 hr GPU run) → Ollama deployment (realestate, 4-bit quantized) → Open WebUI chat
Data → curate → fine-tune → deploy → chat. The same five steps as every vertical-AI startup, miniaturized to a Sunday.

Lab

STEP 1 · ~10 MIN

Get the dataset from Kaggle

Sign in at kaggle.com (free), open your account → Settings → API → Create New Token. Save the resulting kaggle.json to your DGX as below.

$ ssh you@dgx-spark.local
$ mkdir -p ~/.kaggle && mv kaggle.json ~/.kaggle/ && chmod 600 ~/.kaggle/kaggle.json
$ pip install kaggle
$ cd ~/cuda-week && mkdir realestate && cd realestate
$ kaggle datasets download ahmedshahriarsakib/usa-real-estate-dataset
$ unzip usa-real-estate-dataset.zip

You'll have a CSV named realtor-data.zip.csv with columns like price, bed, bath, acre_lot, city, state, zip_code, house_size, status. About 2.2M rows.
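Before writing the builder script, a quick pandas check confirms the file loaded and the columns match. A minimal sketch, run from the realestate directory (the filename peek.py is just a suggestion):

# peek.py — sanity-check the downloaded CSV
import pandas as pd

df = pd.read_csv("realtor-data.zip.csv")
print(df.shape)                      # roughly 2.2M rows, ~10 columns
print(df.columns.tolist())           # price, bed, bath, acre_lot, city, state, ...
print(df.isna().mean().round(3))     # share of missing values per column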

STEP 2 · ~30 MIN

Build a Q&A training set

The model needs instruction-style data: a question/instruction and a desired answer. We'll synthesize ~10k Q&A pairs from the listings — a mix of "describe this property," "estimate price for X," and "what's the median Y in city Z."

~/cuda-week/realestate/build_dataset.py
import pandas as pd, json, random
from pathlib import Path

df = pd.read_csv("realtor-data.zip.csv")
# drop rows missing any field the templates use (acre_lot included, or listings would read "nan acres")
df = df.dropna(subset=["price","bed","bath","acre_lot","city","state","house_size"])
df = df[(df.price.between(50_000, 5_000_000)) &
        (df.house_size.between(300, 10_000))]
df = df.sample(n=50_000, random_state=42).reset_index(drop=True)

def describe(r):
    return (f"Draft a real estate listing for a {int(r.bed)}-bed, {int(r.bath)}-bath, "
            f"{int(r.house_size)} sqft home in {r.city}, {r.state}, "
            f"on {r.acre_lot} acres, listed at ${int(r.price):,}.")

def describe_answer(r):
    # Templated; the model will learn to generalize the cadence.
    return (
      f"Welcome home. This {int(r.bed)}BR/{int(r.bath)}BA residence in "
      f"{r.city}, {r.state} offers {int(r.house_size):,} square feet of "
      f"living space on {r.acre_lot} acres. At ${int(r.price):,}, it lands "
      f"in the ${int(r.price/r.house_size):,}/sqft range — "
      f"{'a value' if r.price/r.house_size < 250 else 'priced for the market'} "
      f"for the area. Inquire today."
    )

# Estimate price questions, computed from peer-group medians
medians = df.groupby(["state", "bed"])["price"].median().reset_index()

def estimate_q(r):
    return (f"What's a typical asking price for a {int(r.bed)}-bedroom home "
            f"in {r.state}?")

def estimate_a(r):
    m = medians[(medians.state==r.state) & (medians.bed==r.bed)].price.iloc[0]
    return (f"Based on Realtor.com listing data, the typical asking price for a "
            f"{int(r.bed)}-bedroom home in {r.state} is around ${int(m):,}, "
            f"with significant variation by metro and condition.")

OUT = Path("train.jsonl"); records = []
for _, r in df.iterrows():
    # Two examples per row: listing draft + price Q&A
    records.append({"instruction": describe(r), "output": describe_answer(r)})
    records.append({"instruction": estimate_q(r),   "output": estimate_a(r)})

random.shuffle(records)
records = records[:10_000]   # keep training quick

with OUT.open("w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")
print(f"wrote {len(records)} examples → {OUT}")
$ python3 build_dataset.py
wrote 10000 examples → train.jsonl

Look at the first 5 lines with head -5 train.jsonl | jq . and confirm they look reasonable.
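If jq isn't installed, a few lines of Python do the same spot-check against the train.jsonl written above (a minimal sketch; the filename check_jsonl.py is just a suggestion):

# check_jsonl.py — confirm every record parses and has both fields
import json

n = bad = 0
with open("train.jsonl") as f:
    for line in f:
        n += 1
        rec = json.loads(line)
        if not rec.get("instruction") or not rec.get("output"):
            bad += 1
print(f"{n} records, {bad} with missing fields")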

STEP 3 · ~10 MIN

Set up the LoRA fine-tune with Unsloth

Unsloth is a CUDA-only library that makes LoRA fine-tuning roughly 2× faster and about 50% more memory-efficient than vanilla Hugging Face. Apple Silicon can't run it, and neither can non-NVIDIA GPUs. Your DGX can.

~/cuda-week/realestate/train.py
from unsloth import FastLanguageModel
import torch, json
from datasets import Dataset
from trl import SFTTrainer, SFTConfig

MODEL = "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit"
MAX_LEN = 2048

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=MODEL, max_seq_length=MAX_LEN,
    dtype=None, load_in_4bit=True)

# Add LoRA adapters — only ~1% of weights become trainable
model = FastLanguageModel.get_peft_model(
    model, r=16, lora_alpha=32, lora_dropout=0,
    bias="none", use_gradient_checkpointing="unsloth",
    target_modules=["q_proj","k_proj","v_proj","o_proj",
                    "gate_proj","up_proj","down_proj"])

# Format training data as Llama 3.1 chat turns
def to_chat(ex):
    msgs = [{"role":"user", "content": ex["instruction"]},
            {"role":"assistant", "content": ex["output"]}]
    text = tokenizer.apply_chat_template(msgs, tokenize=False)
    return {"text": text}

records = [json.loads(l) for l in open("train.jsonl")]
ds = Dataset.from_list(records).map(to_chat)

trainer = SFTTrainer(
    model=model, tokenizer=tokenizer, train_dataset=ds,
    dataset_text_field="text",
    max_seq_length=MAX_LEN,
    args=SFTConfig(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        num_train_epochs=2,
        learning_rate=2e-4,
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=10,
        output_dir="lora-realestate",
        report_to="none",
    ),
)

trainer.train()
model.save_pretrained("lora-realestate")
tokenizer.save_pretrained("lora-realestate")
print("done")
STEP 4 · ~2 HR

Run the fine-tune (and watch nvtop earn its keep)

$ nvtop             # in another terminal
$ python3 train.py
{'loss': 1.85, 'grad_norm': 0.41, 'learning_rate': 0.00014, 'epoch': 0.16}
{'loss': 1.62, 'grad_norm': 0.34, 'learning_rate': 0.00018, 'epoch': 0.32}
{'loss': 1.48, 'grad_norm': 0.29, 'learning_rate': 0.00019, 'epoch': 0.50}
...
{'loss': 1.04, 'grad_norm': 0.21, 'learning_rate': 0.00006, 'epoch': 1.85}
{'train_runtime': 7124.7, 'train_loss': 1.31}
done

Loss should drop steadily from ~1.8 to ~1.0. You'll see GPU util pinned in the high 90s, ~30 GB of memory used, the chip running 70-75 °C. This is what your $4.7k actually buys you — a 2-hour run that reproducibly produces a custom model.

Take a screenshot of nvtop at the 30-minute mark. The full graph of GPU util holding steady at 99% for two hours is the most photogenic image of CUDA you'll ever capture.
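If you'd rather log those numbers than screenshot them, a small NVML polling script works too. A minimal sketch, assuming pip install nvidia-ml-py (the filename gpu_log.py is just a suggestion; reported figures may differ slightly on unified-memory systems):

# gpu_log.py — sample GPU utilization, memory, and temperature every 30 s
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

while True:
    util = pynvml.nvmlDeviceGetUtilizationRates(handle).gpu     # percent
    mem  = pynvml.nvmlDeviceGetMemoryInfo(handle).used / 2**30  # GiB
    temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
    print(f"{time.strftime('%H:%M:%S')}  util={util}%  mem={mem:.1f} GiB  temp={temp} °C")
    time.sleep(30)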

STEP 5 · ~10 MIN

Convert & register with Ollama

The trained LoRA adapter needs to be merged into the base model and then converted to GGUF format so Ollama can serve it.

$ python3 - <<'PY'
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained("lora-realestate", load_in_4bit=True)
model.save_pretrained_gguf("realestate-gguf", tokenizer, quantization_method="q4_k_m")
PY

Now create an Ollama Modelfile pointing at the new weights:

~/cuda-week/realestate/Modelfile
FROM ./realestate-gguf/unsloth.Q4_K_M.gguf

PARAMETER temperature 0.4
PARAMETER top_p 0.9

SYSTEM """You are an expert AI Real Estate Analyst trained on US Realtor.com
listing data. Provide grounded, specific answers about US housing markets,
listings, and pricing. When unsure, say so."""
$ ollama create realestate -f Modelfile
parsing modelfile
creating layer
success
$ ollama list
NAME              SIZE      MODIFIED
realestate:latest 4.9 GB    just now
llama3.1:8b       4.7 GB    a week ago
STEP 6 · ~30 MIN

Demo & honest evaluation

Test the same prompts against base Llama 3.1 8B and your fine-tune. The differences should be visible immediately.

Prompt: "Draft a listing for a 4-bed, 3-bath, 2,800 sqft home in Austin, TX, on 0.25 acres, listed at $725,000."
Base Llama 3.1 8B: A generic, slightly clumsy paragraph that mentions amenities at random and gets the price-per-sqft math wrong.
Your fine-tune: A tighter listing with the right cadence — header, key specs in correct units, a price-per-sqft observation that matches Austin medians, a closing call to action. Trained on the cadence of 50,000 real listings.
Prompt: "What's a typical price for a 3-bedroom home in Florida?"
Base Llama 3.1 8B: Vague, mentions "varies widely" and gives a number range pulled from stale training-data memory.
Your fine-tune: States a specific median, ranges by metro (Miami vs. Jacksonville), and acknowledges its data source. Computed on real Realtor.com numbers from your dataset.
Prompt: "I'm looking at homes around $500k in Phoenix. What size house should I expect?"
Base Llama 3.1 8B: Hedges, gives a wide range with no source.
Your fine-tune: Estimates ~1,800-2,400 sqft, references typical price-per-sqft in Phoenix, mentions condition matters. Trained on the underlying distribution.
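To make the comparison systematic rather than eyeballed, you can send the same prompt to both models through Ollama's local HTTP API. A minimal sketch, assuming pip install requests and Ollama serving on its default port 11434 (the filename compare.py is just a suggestion):

# compare.py — run one prompt against the base model and the fine-tune
import requests

PROMPT = ("Draft a listing for a 4-bed, 3-bath, 2,800 sqft home in Austin, TX, "
          "on 0.25 acres, listed at $725,000.")

def ask(model, prompt):
    r = requests.post("http://localhost:11434/api/generate",
                      json={"model": model, "prompt": prompt, "stream": False},
                      timeout=300)
    r.raise_for_status()
    return r.json()["response"]

for model in ["llama3.1:8b", "realestate"]:
    print(f"\n=== {model} ===\n{ask(model, PROMPT)}")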

Run at least 5 of your own prompts. Note where the fine-tune obviously wins, and where it is confidently wrong. Confidently wrong is the failure mode you have to learn to spot: the model has memorized cadence without facts. We'll revisit this in Wk 5 when we wire the model to a database.

What you actually accomplished today

Walk through the chain: a public Kaggle dataset became a curated 10k-example training set, a 2-hour LoRA run turned that set into a specialized model, and Ollama now serves the result as a quantized, private chat endpoint on your desk.

The same pipeline scales to any vertical: legal documents, financial filings, medical research, supply-chain data. The data is what you bring; the silicon and the libraries are what NVIDIA brought.

The takeaway

Two years ago this pipeline cost five figures and ran on rented H100s. The combination of (a) better quantization (you trained and shipped in 4-bit today), (b) better libraries (Unsloth turned a week-long job into 2 hours), and (c) cheap CUDA hardware (your DGX Spark) made it a Sunday project. That compounding is what makes this curriculum a live wire and not a museum tour.

What's next — Week 2 preview

Now that you've fine-tuned, the natural next move is serve it well. Week 2 covers production inference: vLLM, TensorRT-LLM, continuous batching, throughput optimization. You'll go from "works in Ollama" to "could host this for 100 paying users."

Today's reflection

Look back at what was on your screen seven days ago. Pause. You went from "what is CUDA" to "I fine-tuned a real model on a real public dataset" in a week. That is not a normal week.