Build an AI Real Estate Analyst on your desk.
Today is the project. We'll grab a public dataset of ~2 million US real estate listings, fine-tune Llama 3.1 8B on it using LoRA, deploy the result with Ollama, and end the day with a private AI that drafts listings, prices homes, and reasons about US housing markets — running entirely on your DGX Spark. Plan for ~3-4 hours, most of which is the GPU doing its thing while you read.
What we're building, exactly
A LoRA fine-tune of Llama 3.1 8B Instruct, taught on synthesized Q&A pairs derived from the public USA Real Estate dataset on Kaggle (≈2.2M Realtor.com listings). The result will:
- Draft realistic listings from a few features ("3BR / 2BA / 1800 sqft / Phoenix, AZ / $450k").
- Reason about US market patterns (price-per-sqft by metro, typical lot sizes by region).
- Answer Q&A about the data the way a junior real-estate analyst would.
It will not be a perfect Zestimate replacement. The point is to feel the full loop — public data → curated training set → real fine-tune → deployed model — on your hardware. By the end, you'll know how every "AI specialist for X" startup is actually built.
Every "AI for [vertical]" company you'll see in the next two years is doing roughly what we're doing today, with more polish: take a public or proprietary dataset, fine-tune an open model on it, ship it as the "expert AI." The moat for most vertical-AI products is the data + the fine-tune, not novel model research. CUDA is what makes this feasible on a budget.
Architecture, in one diagram
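From raw CSV to a served model, the whole flow is four hops (every step below is a lab section on this page):

```
Kaggle CSV (~2.2M listings)
        │  build_dataset.py: filter, sample, template
        ▼
train.jsonl (10,000 instruction/output pairs)
        │  train.py: Unsloth + LoRA on Llama 3.1 8B (4-bit)
        ▼
lora-realestate/ (LoRA adapter)
        │  merge + export to GGUF (q4_k_m) + Modelfile
        ▼
ollama run realestate (served locally)
```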
Lab
Get the dataset from Kaggle
Sign in at kaggle.com (free), open your account → Settings → API → Create New Token. Save the resulting kaggle.json to your DGX as below.
```
$ ssh you@dgx-spark.local
$ mkdir -p ~/.kaggle && mv kaggle.json ~/.kaggle/ && chmod 600 ~/.kaggle/kaggle.json
$ pip install kaggle
$ cd ~/cuda-week && mkdir realestate && cd realestate
$ kaggle datasets download ahmedshahriarsakib/usa-real-estate-dataset
$ unzip usa-real-estate-dataset.zip
```
You'll have a CSV named realtor-data.zip.csv with columns like price, bed, bath, acre_lot, city, state, zip_code, house_size, status. About 2.2M rows.
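Before building anything on top of it, a thirty-second sanity check is worth doing (column names are from the dataset as documented above; the exact row count may differ slightly, since the dataset gets periodic updates):

```python
import pandas as pd

df = pd.read_csv("realtor-data.zip.csv")
print(df.shape)             # expect on the order of 2.2M rows
print(df.columns.tolist())  # price, bed, bath, acre_lot, city, state, zip_code, house_size, status, ...
print(df[["price", "bed", "bath", "city", "state", "house_size"]].describe())
```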
Build a Q&A training set
The model needs instruction-style data: a question/instruction and a desired answer. We'll synthesize ~10k Q&A pairs from the listings — a mix of "describe this property," "estimate price for X," and "what's the median Y in city Z."
~/cuda-week/realestate/build_dataset.py
```python
import pandas as pd, json, random
from pathlib import Path

df = pd.read_csv("realtor-data.zip.csv")
# acre_lot is interpolated into every listing below, so drop rows where it's missing too
df = df.dropna(subset=["price", "bed", "bath", "acre_lot", "city", "state", "house_size"])
df = df[(df.price.between(50_000, 5_000_000)) & (df.house_size.between(300, 10_000))]
df = df.sample(n=50_000, random_state=42).reset_index(drop=True)

def describe(r):
    return (f"Draft a real estate listing for a {int(r.bed)}-bed, {int(r.bath)}-bath, "
            f"{int(r.house_size)} sqft home in {r.city}, {r.state}, "
            f"on {r.acre_lot} acres, listed at ${int(r.price):,}.")

def describe_answer(r):
    # Templated; the model will learn to generalize the cadence.
    return (
        f"Welcome home. This {int(r.bed)}BR/{int(r.bath)}BA residence in "
        f"{r.city}, {r.state} offers {int(r.house_size):,} square feet of "
        f"living space on {r.acre_lot} acres. At ${int(r.price):,}, it lands "
        f"in the ${int(r.price / r.house_size):,}/sqft range — "
        f"{'a value' if r.price / r.house_size < 250 else 'priced for the market'} "
        f"for the area. Inquire today."
    )

# Estimate-price questions, answered from peer-group medians
medians = df.groupby(["state", "bed"])["price"].median().reset_index()

def estimate_q(r):
    return (f"What's a typical asking price for a {int(r.bed)}-bedroom home "
            f"in {r.state}?")

def estimate_a(r):
    m = medians[(medians.state == r.state) & (medians.bed == r.bed)].price.iloc[0]
    return (f"Based on Realtor.com listing data, the typical asking price for a "
            f"{int(r.bed)}-bedroom home in {r.state} is around ${int(m):,}, "
            f"with significant variation by metro and condition.")

OUT = Path("train.jsonl")
records = []
for _, r in df.iterrows():
    # Two examples per row: listing draft + price Q&A
    records.append({"instruction": describe(r), "output": describe_answer(r)})
    records.append({"instruction": estimate_q(r), "output": estimate_a(r)})

random.shuffle(records)
records = records[:10_000]  # keep training quick
with OUT.open("w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")
print(f"wrote {len(records)} examples → {OUT}")
```
```
$ python3 build_dataset.py
wrote 10000 examples → train.jsonl
```
Look at the first 5 lines with head -5 train.jsonl | jq and confirm they look reasonable.
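Each line is one self-contained JSON record. Structurally, a price-estimate example looks like this (the dollar figure will be whatever median your sample produced, not the $X placeholder shown here):

```json
{"instruction": "What's a typical asking price for a 3-bedroom home in Arizona?",
 "output": "Based on Realtor.com listing data, the typical asking price for a 3-bedroom home in Arizona is around $X, with significant variation by metro and condition."}
```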
Set up the LoRA fine-tune with Unsloth
Unsloth is a CUDA-only library that makes LoRA fine-tuning roughly 2× faster and about 50% more memory-efficient than vanilla Hugging Face + PEFT. Apple Silicon cannot run it; neither can most non-NVIDIA GPUs. Your DGX can.
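Assuming the CUDA-enabled PyTorch environment from earlier in the week is already in place, installation is one line:

```
$ pip install unsloth
```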
~/cuda-week/realestate/train.py
```python
from unsloth import FastLanguageModel
import torch, json
from datasets import Dataset
from trl import SFTTrainer, SFTConfig

MODEL = "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit"
MAX_LEN = 2048

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=MODEL, max_seq_length=MAX_LEN, dtype=None, load_in_4bit=True)

# Add LoRA adapters — only ~1% of weights become trainable
model = FastLanguageModel.get_peft_model(
    model, r=16, lora_alpha=32, lora_dropout=0, bias="none",
    use_gradient_checkpointing="unsloth",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"])

# Format training data as Llama 3.1 chat turns
def to_chat(ex):
    msgs = [{"role": "user", "content": ex["instruction"]},
            {"role": "assistant", "content": ex["output"]}]
    return {"text": tokenizer.apply_chat_template(msgs, tokenize=False)}

records = [json.loads(l) for l in open("train.jsonl")]
ds = Dataset.from_list(records).map(to_chat)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=ds,
    dataset_text_field="text",
    max_seq_length=MAX_LEN,
    args=SFTConfig(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        num_train_epochs=2,
        learning_rate=2e-4,
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=10,
        output_dir="lora-realestate",
        report_to="none",
    ),
)
trainer.train()
model.save_pretrained("lora-realestate")
tokenizer.save_pretrained("lora-realestate")
print("done")
```
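One sanity check worth doing in your head before hitting run: the effective batch size is 2 per device × 4 gradient-accumulation steps = 8 examples per optimizer step, so 2 epochs over 10,000 examples works out to 10,000 × 2 ÷ 8 = 2,500 optimizer steps, and with logging_steps=10 you'll see a loss line every 10 of them.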
Run the fine-tune (and watch nvtop earn its keep)
```
$ nvtop          # in another terminal
$ python3 train.py
{'loss': 1.85, 'grad_norm': 0.41, 'learning_rate': 0.00014, 'epoch': 0.16}
{'loss': 1.62, 'grad_norm': 0.34, 'learning_rate': 0.00018, 'epoch': 0.32}
{'loss': 1.48, 'grad_norm': 0.29, 'learning_rate': 0.00019, 'epoch': 0.50}
...
{'loss': 1.04, 'grad_norm': 0.21, 'learning_rate': 0.00006, 'epoch': 1.85}
{'train_runtime': 7124.7, 'train_loss': 1.31}
done
```
Loss should drop steadily from ~1.8 to ~1.0. You'll see GPU util pinned in the high 90s, ~30 GB of memory used, the chip running 70-75 °C. This is what your $4.7k actually buys you — a 2-hour run that reproducibly produces a custom model.
Take a screenshot of nvtop at the 30-minute mark. The full graph of GPU util holding steady at 99% for two hours is the most photogenic image of CUDA you'll ever capture.
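If you'd rather have numbers alongside the screenshot, nvidia-smi can sample utilization, memory, and temperature on a loop for the length of the run (a minimal sketch; the 60-second interval is arbitrary):

```
$ nvidia-smi --query-gpu=timestamp,utilization.gpu,memory.used,temperature.gpu \
      --format=csv -l 60 | tee gpu_log.csv
```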
Convert & register with Ollama
The trained LoRA adapter needs to be merged into the base model and then converted to GGUF format so Ollama can serve it.
```
$ python3 - <<'PY'
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained("lora-realestate", load_in_4bit=True)
model.save_pretrained_gguf("realestate-gguf", tokenizer, quantization_method="q4_k_m")
PY
```
Now create an Ollama Modelfile pointing at the new weights:
~/cuda-week/realestate/Modelfile
```
FROM ./realestate-gguf/unsloth.Q4_K_M.gguf
PARAMETER temperature 0.4
PARAMETER top_p 0.9
SYSTEM """You are an expert AI Real Estate Analyst trained on US Realtor.com listing data.
Provide grounded, specific answers about US housing markets, listings, and pricing.
When unsure, say so."""
```
```
$ ollama create realestate -f Modelfile
parsing modelfile
creating layer
success
$ ollama list
NAME                 SIZE      MODIFIED
realestate:latest    4.9 GB    just now
llama3.1:8b          4.7 GB    a week ago
```
Demo & honest evaluation
Test the same prompts against base Llama 3.1 8B and your fine-tune. The differences should be visible immediately.
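A minimal side-by-side harness, assuming Ollama's default local API on port 11434 and the two model names from ollama list above (the prompts are just examples; substitute your own):

```python
import json, urllib.request

PROMPTS = [
    "Draft a real estate listing for a 3-bed, 2-bath, 1800 sqft home in Phoenix, AZ, listed at $450,000.",
    "What's a typical asking price for a 4-bedroom home in Texas?",
]

def ask(model: str, prompt: str) -> str:
    # POST to Ollama's local generate endpoint; stream=False returns a single JSON object
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({"model": model, "prompt": prompt, "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

for prompt in PROMPTS:
    print("=" * 72)
    print("PROMPT:", prompt)
    for model in ("llama3.1:8b", "realestate"):
        print(f"\n--- {model} ---")
        print(ask(model, prompt))
```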
Run at least 5 of your own prompts. Note where the fine-tune obviously wins, and where it is confidently wrong. Confidently wrong is the failure mode you have to learn to spot: the model has memorized cadence without facts. We'll revisit this in Week 5, when we wire the model to a database.
What you actually accomplished today
Walk through the chain:
- Pulled 2.2M real-world records from a public source.
- Curated 10,000 instruction-tuning examples in a few dozen lines of Python.
- Used CUDA-only tools (Unsloth, bitsandbytes, FlashAttention) to fine-tune a real 8B-parameter model in ~2 hours on a desk computer.
- Converted the result into Ollama-servable GGUF format and now serve it as realestate alongside any other model.
- Did the entire thing for ~$0.40 of electricity.
The same pipeline scales to any vertical: legal documents, financial filings, medical research, supply-chain data. The data is what you bring; the silicon and the libraries are what NVIDIA brought.
Two years ago this pipeline cost five figures and ran on rented H100s. The combination of (a) better quantization (today's run was 4-bit end to end), (b) better libraries (Unsloth turned a week-long job into 2 hours), and (c) cheap CUDA hardware (your DGX Spark) made it a Sunday project. That compounding is what makes this curriculum a live wire and not a museum tour.
What's next — Week 2 preview
Now that you've fine-tuned, the natural next move is serve it well. Week 2 covers production inference: vLLM, TensorRT-LLM, continuous batching, throughput optimization. You'll go from "works in Ollama" to "could host this for 100 paying users."
Today's reflection
Look back at what was on your screen seven days ago. Pause. You went from "what is CUDA" to "I fine-tuned a real model on a real public dataset" in a week. That is not a normal week.