# 🛰️ ResearchQwen 2.5-3B-LoRA

Compact, domain-expert Q&A for systems researchers.

- **Base model:** Qwen/Qwen2.5-3B
- **Tuning recipe:** 4-bit QLoRA with bitsandbytes NF4 quantisation
- **Retriever:** FAISS cosine-similarity store over ~33 k document chunks (sketched below)
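
FAISS has no native cosine metric, so a cosine-similarity store is usually built by L2-normalising the embeddings and searching an inner-product index. A minimal sketch of that pattern (dimensions and random vectors are illustrative, not the card's actual pipeline):

```python
import faiss
import numpy as np

dim = 384                                            # e.g. bge-small-en-v1.5 embedding size
vecs = np.random.rand(1000, dim).astype("float32")   # stand-in for chunk embeddings
faiss.normalize_L2(vecs)                             # L2-normalise so inner product == cosine
index = faiss.IndexFlatIP(dim)                       # exact inner-product index
index.add(vecs)

query = np.random.rand(1, dim).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, 5)                 # top-5 most similar chunks
```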
## 🚀 Quick inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "Programmer-RD-AI/ResearchQwen2.5-3B-LoRA"

tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
    load_in_4bit=True,  # 4-bit inference via bitsandbytes
)

qa = pipeline("text-generation", model=model, tokenizer=tok)
out = qa("Explain how Chain Replication with Apportioned Queries improves tail-latency.")
print(out[0]["generated_text"])
```
### llama.cpp / GGUF

```bash
wget https://huggingface.co/Programmer-RD-AI/ResearchQwen2.5-3B-LoRA/resolve/main/model_Q4_K_M.gguf
./main -m model_Q4_K_M.gguf -p "Give the core idea of the 3FS log-structured layout in 3 sentences."
```
## 📚 Training data

| Source | Docs | Words |
|---|---|---|
| 3FS white-paper | 14 | 162 k |
| CRAQ spec + benchmarks | 11 | 119 k |
| Distributed AI infra notes | 32 | 287 k |
| **Total** | **57** | **568 k** |
Synthetic Q&A pairs were generated with an instruction template tuned for factual density; unhelpful pairs were filtered via a weak-to-strong scoring cascade (ROUGE-L > 0.4, BLEU > 0.35) ([GitHub][1]).
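
A minimal sketch of such a threshold filter, assuming the `rouge-score` and `nltk` packages (illustrative only; the card does not publish the actual cascade code, and the sample pair below is hypothetical):

```python
from rouge_score import rouge_scorer
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
smooth = SmoothingFunction().method1

def keep_pair(answer: str, reference: str) -> bool:
    """Keep a synthetic Q&A pair only if it clears both thresholds."""
    rouge_l = scorer.score(reference, answer)["rougeL"].fmeasure
    bleu = sentence_bleu([reference.split()], answer.split(),
                         smoothing_function=smooth)
    return rouge_l > 0.4 and bleu > 0.35

pairs = [{"answer": "CRAQ serves reads from any replica.",
          "reference": "CRAQ lets any chain node serve reads."}]  # illustrative
filtered = [p for p in pairs if keep_pair(p["answer"], p["reference"])]
```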
## 🛠️ Fine-tuning details

| Setting | Value |
|---|---|
| GPU | 1× A100 40 GB |
| Precision | 4-bit NF4 w/ double-quant (bnb 0.45.4) |
| LoRA r/α | 64 / 16 |
| LR schedule | cosine, 5 % warm-up |
| Steps | 1 100 |
| Epochs | 3 |
| Peak VRAM | 21 GB |
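
For reference, a hedged sketch of a `peft` + `bitsandbytes` configuration matching the table above (not the actual training script; dropout and target modules are assumptions, as the card does not state them):

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # NF4, per the table
    bnb_4bit_use_double_quant=True,       # double quantisation, per the table
    bnb_4bit_compute_dtype=torch.bfloat16,
)

lora_config = LoraConfig(
    r=64,                                 # LoRA rank, per the table
    lora_alpha=16,                        # LoRA α, per the table
    lora_dropout=0.05,                    # assumed; not stated in the card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)
```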
## 📈 Evaluation

| Metric | Base Qwen2.5-3B | This model |
|---|---|---|
| ROUGE-L | 45.6 | 57.2 |
| BLEU-4 | 30.4 | 42.8 |

See `eval/` for scripts and raw scores (ROUGE, BLEU).
## 🔗 Integration recipe (RAG)

```python
from langchain_community.vectorstores import FAISS            # or llama-index
from langchain_community.embeddings import HuggingFaceEmbeddings

emb = HuggingFaceEmbeddings(model_name="BAAI/bge-small-en-v1.5")
texts = ["chunk one ...", "chunk two ..."]  # your document chunks
vs = FAISS.from_texts(texts, emb)
```
End-to-end retriever-generator latency: 330 ms average (GPU), 1.9 s average (CPU, GGUF int4).
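
Putting the pieces together, an illustrative retrieve-then-generate loop (`vs` and `qa` come from the snippets above; `k` and the prompt format are assumptions, not the card's published recipe):

```python
query = "How does CRAQ spread reads across the chain?"
docs = vs.similarity_search(query, k=4)              # top-4 chunks from FAISS
context = "\n\n".join(d.page_content for d in docs)
prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
print(qa(prompt, max_new_tokens=256)[0]["generated_text"])
```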
## 💡 Why it should trend
- Fresh domain niche – deep systems-engineering Q&A is underserved on HF.
- Ultra-portable – 4-bit LoRA + GGUF = laptop-friendly.
- Full stack repo – weights, notebook, RAG demo, eval scripts.
- Eye-catching tags – `qwen2`, `lora`, `rag`, `research` map directly to popular HF filters and the trending feed ([Hugging Face][4]).
- Clear usage code – copy-run experience = more downloads.
## ⚠️ Limitations & responsible use
- Trained solely on English; non-English queries degrade sharply.
- Answers may quote or paraphrase the training docs verbatim.
- Not suitable for critical medical / legal advice.
- LoRA adapters are GPL-3.0; commercial use must comply with both GPL-3.0 and the Qwen 2.5 base license.
## ✍️ Citation

```bibtex
@misc{researchqwen2025,
  title        = {ResearchQwen 2.5-3B-LoRA: Domain-expert QA for systems research},
  author       = {Disansa, Ranuga},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/Programmer-RD-AI/ResearchQwen2.5-3B-LoRA}}
}
```