Real Estate Assistant – RAG-Based LLM (Victor Ontiveros)
1. Introduction
The homebuying process is filled with complexity—buyers must consider mortgage eligibility, neighborhood factors, property taxes, and investment potential. However, general-purpose large language models (LLMs) often struggle with specificity, hallucinate responses, and fail to capture local nuances. To address these limitations, I developed a Retrieval-Augmented Generation (RAG)-based real estate assistant using the Mistral-7B model. Instead of traditional fine-tuning, the assistant dynamically retrieves embedded real estate knowledge to generate accurate and grounded responses. The system achieved strong performance across synthetic real estate queries and external evaluation benchmarks including FiQA v2, HotpotQA, and Natural Questions.
2. Training Data
The training data consisted of a structured synthetic real estate dataset designed to simulate realistic user queries. Each example included:
- A user query
- A real estate context passage
- A validated reference response
The dataset was split as follows:
- Train: 70%
- Validation: 15%
- Test: 15%
- Random seed: 42
This synthetic data was specifically generated to cover common real estate financial, legal, and neighborhood-related topics.
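The 70/15/15 split with a fixed seed can be sketched as follows. The record fields mirror the structure above; the example records themselves are hypothetical placeholders.

```python
import random

# Hypothetical records: each pairs a user query with a context passage
# and a validated reference response, as described above.
examples = [
    {"query": f"q{i}", "context": f"c{i}", "reference": f"r{i}"}
    for i in range(100)
]

random.seed(42)  # fixed seed for a reproducible split
random.shuffle(examples)

n = len(examples)
n_train = int(n * 0.70)  # 70% train
n_val = int(n * 0.15)    # 15% validation; remainder is test

train = examples[:n_train]
val = examples[n_train:n_train + n_val]
test = examples[n_train + n_val:]
```

Shuffling before slicing ensures the three partitions are disjoint and randomly drawn, while the fixed seed makes the split reproducible across runs.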
3. Training Method
Rather than traditional supervised fine-tuning, I implemented a custom Retrieval-Augmented Generation (RAG) pipeline:
- Context embeddings were generated using a sentence-transformer model (`all-MiniLM-L6-v2`).
- FAISS indexing enabled efficient retrieval of similar contexts.
- The retrieved contexts were fed into the Mistral-7B model for response generation.
Hyperparameters and retrieval settings:
- Top-k retrieved contexts: 3
- Max new tokens during generation: 300
- Temperature: 0.7
- Device: CUDA (float16 precision)
- Frameworks used: Hugging Face Transformers, Sentence-Transformers, FAISS
This setup allows dynamic, context-grounded generation without traditional parameter updates to the base model.
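The retrieval step above can be sketched in miniature. In the real pipeline, `all-MiniLM-L6-v2` produces 384-dimensional embeddings and a FAISS flat index performs the nearest-neighbor search; here tiny hand-written vectors and a direct L2 distance stand in for both, so only the top-k retrieval logic is shown. The context strings are illustrative placeholders.

```python
import math

# Toy stand-ins for sentence-transformer embeddings of context passages.
contexts = [
    "Average property tax rates in Travis County are about 2.1%.",
    "FHA loans allow down payments as low as 3.5% for first-time buyers.",
    "School ratings in the area range from 6 to 9 out of 10.",
]
ctx_emb = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

def retrieve(query_emb, k=3):
    """Return the k contexts nearest to the query by L2 distance,
    mirroring what a FAISS IndexFlatL2 search does."""
    dists = [(math.dist(query_emb, emb), i) for i, emb in enumerate(ctx_emb)]
    dists.sort()
    return [contexts[i] for _, i in dists[:k]]

# A query embedding close to the FHA-loan context:
top = retrieve([0.1, 0.9, 0.0], k=2)
```

The retrieved passages are then concatenated into the prompt for Mistral-7B, so answer quality tracks retrieval quality directly.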
4. Evaluation
Performance was benchmarked across both synthetic real estate queries and external benchmarks.
Evaluation Results
| Dataset | ROUGE-1 F1 | ROUGE-L F1 |
|---|---|---|
| Synthetic (Internal) | ~0.10 | ~0.07 |
| FiQA v2 (Financial QA) | ~0.33 | ~0.30 |
| HotpotQA (Multi-hop QA) | ~0.11 | ~0.10 |
| Natural Questions (Open QA) | ~0.02 | ~0.01 |
The model performed best on FiQA v2, whose financial question-answering style is closest to the target domain. Scores on the synthetic set and HotpotQA were modest, and performance on open-ended Natural Questions was very low, confirming that the system depends on domain-relevant retrieval for real estate tasks.
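For reference, ROUGE-1 F1 is the harmonic mean of unigram precision and recall between a generated response and the reference; a minimal reimplementation (whitespace tokenization, clipped counts) illustrates how the scores above are computed:

```python
from collections import Counter

def rouge1_f1(candidate, reference):
    """ROUGE-1 F1: clipped unigram overlap between candidate and reference."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    overlap = sum((Counter(cand) & Counter(ref)).values())  # clipped counts
    if not cand or not ref or overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

score = rouge1_f1(
    "property tax rates average 2.1 percent",
    "the average property tax rate is 2.1 percent",
)
```

Production evaluations typically use a packaged implementation with stemming and sentence-level longest-common-subsequence for ROUGE-L; this sketch covers only the unigram variant.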
5. Usage and Intended Uses
Example usage with Hugging Face Transformers:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository id -- replace with the actual model repo.
model = AutoModelForCausalLM.from_pretrained("your-username/your-real-estate-assistant")
tokenizer = AutoTokenizer.from_pretrained("your-username/your-real-estate-assistant")

query = "I have $90K budget. What are my first-time homebuyer options in Austin, TX?"
inputs = tokenizer(query, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=300)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
Intended Uses:
- Homebuyer education and affordability guidance
- Mortgage eligibility exploration
- Real estate investment decision support
- School rating and relocation assistance
6. Prompt Format
Each input consists of a user query combined with retrieved context:
User Query: [user's question]
Context: [retrieved background information]
Answer:
Example:
User Query: What are the average property taxes in Travis County, TX?
Context: The average property tax rate in Travis County, Texas is 2.1% as of 2024, based on Zillow estimates and tax assessor data.
Answer:
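Assembling this template is a simple string operation; a minimal helper (the function name is illustrative, not part of the pipeline's API) might look like:

```python
def build_prompt(query, contexts):
    """Assemble the prompt format shown above: the user query,
    the retrieved context passages, then an 'Answer:' cue."""
    context_block = " ".join(contexts)
    return (
        f"User Query: {query}\n"
        f"Context: {context_block}\n"
        f"Answer:"
    )

prompt = build_prompt(
    "What are the average property taxes in Travis County, TX?",
    ["The average property tax rate in Travis County, Texas is 2.1% as of 2024."],
)
```

With top-k = 3, the retrieved passages are joined into the single Context field before the prompt is tokenized and passed to the model.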
7. Expected Output Format
The model returns fluent, grounded natural language responses:
In Travis County, property tax rates typically average 2.1% of assessed home value, although actual rates may vary based on exemptions and city-specific factors.
8. Limitations
- The model relies heavily on the quality and relevance of retrieved contexts; weak retrieval can degrade output quality.
- Performance on open-domain general queries (as seen with Natural Questions) remains limited.
- Since the model is based on synthetic training data, it may not perfectly generalize to all real-world scenarios.
- Generated responses do not constitute legal, tax, or financial advice and should be validated by experts.
References
- FAISS - Facebook AI Similarity Search
- Hugging Face Transformers
- Sentence Transformers
- Mistral-7B Model Card
Presented by Victor Ontiveros – UVA SDS Final Project (Spring 2025)