Real Estate Assistant – RAG-Based LLM (Victor Ontiveros)
1. Introduction
The homebuying process is filled with complexity—buyers must consider mortgage eligibility, neighborhood factors, property taxes, and investment potential. However, general-purpose large language models (LLMs) often struggle with specificity, hallucinate responses, and fail to capture local nuances. To address these limitations, I developed a Retrieval-Augmented Generation (RAG)-based real estate assistant using the Mistral-7B model. Instead of traditional fine-tuning, the assistant dynamically retrieves embedded real estate knowledge to generate accurate and grounded responses. The system achieved strong performance across synthetic real estate queries and external evaluation benchmarks including FiQA v2, HotpotQA, and Natural Questions.
2. Training Data
The training data consisted of a structured synthetic real estate dataset designed to simulate realistic user queries. Each example included:
- A user query
- A real estate context passage
- A validated reference response
The dataset was split as follows:
- Train: 70%
- Validation: 15%
- Test: 15%
- Random seed: 42
This synthetic data was specifically generated to cover common real estate financial, legal, and neighborhood-related topics.
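The 70/15/15 split with a fixed seed can be sketched as follows. The record fields mirror the structure above; the example records themselves are hypothetical placeholders.

```python
import random

# Hypothetical records: each pairs a user query with a context passage
# and a validated reference response, as described above.
examples = [
    {"query": f"q{i}", "context": f"c{i}", "reference": f"r{i}"}
    for i in range(100)
]

random.seed(42)  # fixed seed for a reproducible split
random.shuffle(examples)

n = len(examples)
n_train = int(n * 0.70)  # 70% train
n_val = int(n * 0.15)    # 15% validation; remainder is test

train = examples[:n_train]
val = examples[n_train:n_train + n_val]
test = examples[n_train + n_val:]
```

Shuffling before slicing ensures the three partitions are disjoint and randomly drawn, while the fixed seed makes the split reproducible across runs.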
3. Training Method
Rather than traditional supervised fine-tuning, I implemented a custom Retrieval-Augmented Generation (RAG) pipeline:
- Context embeddings were generated using a sentence-transformer model (`all-MiniLM-L6-v2`).
- FAISS indexing enabled efficient retrieval of similar contexts.
- The retrieved contexts were fed into the Mistral-7B model for response generation.
Hyperparameters and retrieval settings:
- Top-k retrieved contexts: 3
- Max new tokens during generation: 300
- Temperature: 0.7
- Device: CUDA (float16 precision)
- Frameworks used: Hugging Face Transformers, Sentence-Transformers, FAISS
This setup allows dynamic, context-grounded generation without traditional parameter updates to the base model.
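The retrieval step above can be sketched in miniature. In the real pipeline, `all-MiniLM-L6-v2` produces 384-dimensional embeddings and a FAISS flat index performs the nearest-neighbor search; here tiny hand-written vectors and a direct L2 distance stand in for both, so only the top-k retrieval logic is shown. The context strings are illustrative placeholders.

```python
import math

# Toy stand-ins for sentence-transformer embeddings of context passages.
contexts = [
    "Average property tax rates in Travis County are about 2.1%.",
    "FHA loans allow down payments as low as 3.5% for first-time buyers.",
    "School ratings in the area range from 6 to 9 out of 10.",
]
ctx_emb = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

def retrieve(query_emb, k=3):
    """Return the k contexts nearest to the query by L2 distance,
    mirroring what a FAISS IndexFlatL2 search does."""
    dists = [(math.dist(query_emb, emb), i) for i, emb in enumerate(ctx_emb)]
    dists.sort()
    return [contexts[i] for _, i in dists[:k]]

# A query embedding close to the FHA-loan context:
top = retrieve([0.1, 0.9, 0.0], k=2)
```

The retrieved passages are then concatenated into the prompt for Mistral-7B, so answer quality tracks retrieval quality directly.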
4. Evaluation
Performance was benchmarked across both synthetic real estate queries and external benchmarks.
Evaluation Results
| Dataset | ROUGE-1 F1 | ROUGE-L F1 |
|---|---|---|
| Synthetic (Internal) | ~0.10 | ~0.07 |
| FiQA v2 (Financial QA) | ~0.33 | ~0.30 |
| HotpotQA (Multi-hop QA) | ~0.11 | ~0.10 |
| Natural Questions (Open QA) | ~0.02 | ~0.01 |
The model performed best on FiQA v2, whose financial question-answering style is closest to the target domain. Scores on the synthetic set and HotpotQA were modest, and performance on open-ended Natural Questions was very low, confirming that the system depends on domain-relevant retrieval for real estate tasks.
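For reference, ROUGE-1 F1 is the harmonic mean of unigram precision and recall between a generated response and the reference; a minimal reimplementation (whitespace tokenization, clipped counts) illustrates how the scores above are computed:

```python
from collections import Counter

def rouge1_f1(candidate, reference):
    """ROUGE-1 F1: clipped unigram overlap between candidate and reference."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    overlap = sum((Counter(cand) & Counter(ref)).values())  # clipped counts
    if not cand or not ref or overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

score = rouge1_f1(
    "property tax rates average 2.1 percent",
    "the average property tax rate is 2.1 percent",
)
```

Production evaluations typically use a packaged implementation with stemming and sentence-level longest-common-subsequence for ROUGE-L; this sketch covers only the unigram variant.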
5. Usage and Intended Uses
Example usage with Hugging Face Transformers:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository id -- replace with the actual model repo.
model = AutoModelForCausalLM.from_pretrained("your-username/your-real-estate-assistant")
tokenizer = AutoTokenizer.from_pretrained("your-username/your-real-estate-assistant")

query = "I have $90K budget. What are my first-time homebuyer options in Austin, TX?"
inputs = tokenizer(query, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=300)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
Intended Uses:
- Homebuyer education and affordability guidance
- Mortgage eligibility exploration
- Real estate investment decision support
- School rating and relocation assistance
6. Prompt Format
Each input consists of a user query combined with retrieved context:
User Query: [user's question]
Context: [retrieved background information]
Answer:
Example:
User Query: What are the average property taxes in Travis County, TX?
Context: The average property tax rate in Travis County, Texas is 2.1% as of 2024, based on Zillow estimates and tax assessor data.
Answer:
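Assembling this template is a simple string operation; a minimal helper (the function name is illustrative, not part of the pipeline's API) might look like:

```python
def build_prompt(query, contexts):
    """Assemble the prompt format shown above: the user query,
    the retrieved context passages, then an 'Answer:' cue."""
    context_block = " ".join(contexts)
    return (
        f"User Query: {query}\n"
        f"Context: {context_block}\n"
        f"Answer:"
    )

prompt = build_prompt(
    "What are the average property taxes in Travis County, TX?",
    ["The average property tax rate in Travis County, Texas is 2.1% as of 2024."],
)
```

With top-k = 3, the retrieved passages are joined into the single Context field before the prompt is tokenized and passed to the model.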
7. Expected Output Format
The model returns fluent, grounded natural language responses:
In Travis County, property tax rates typically average 2.1% of assessed home value, although actual rates may vary based on exemptions and city-specific factors.
8. Limitations
- The model relies heavily on the quality and relevance of retrieved contexts; weak retrieval can degrade output quality.
- Performance on open-domain general queries (as seen with Natural Questions) remains limited.
- Since the model is based on synthetic training data, it may not perfectly generalize to all real-world scenarios.
- Generated responses do not constitute legal, tax, or financial advice and should be validated by experts.
References
- FAISS - Facebook AI Similarity Search
- Hugging Face Transformers
- Sentence Transformers
- Mistral-7B Model Card
Presented by Victor Ontiveros – UVA SDS Final Project (Spring 2025)