Murli Assistant - DistilGPT-2 MAXIMUM (Experimental)
⚠️ WARNING: EXPERIMENTAL MODEL - NOT FOR PRODUCTION USE ⚠️
This model represents the most aggressive training we could apply to DistilGPT-2 on murli content, yet its output quality remains insufficient for spiritual guidance. It is published for research, comparison, and educational purposes only.
⚠️ Critical Limitations
Known Quality Issues:
- ❌ Hallucinations persist despite maximum training
- ❌ Social media contamination (Twitter URLs, @mentions in responses)
- ❌ Factual inaccuracies in spiritual concepts
- ❌ Mixed content from base model pre-training
- ❌ Not suitable for spiritual guidance
Why This Model Exists:
- ✅ Research benchmark for small model limitations
- ✅ Comparison baseline vs. larger models (Phi-2, Flan-T5)
- ✅ Educational example of training optimization
- ✅ Proof that model size matters for specialized domains
Production Recommendation:
Use Phi-2 (2.7B parameters) instead; it offers proven quality for a murli chatbot.
🎯 Maximum Training Configuration
This is the BEST DistilGPT-2 can do:
LoRA Configuration (MAXIMUM), sketched in code after this list:
- Rank (r): 32 (8x the standard r=4)
- Alpha: 64 (8x the standard alpha=8)
- Target Modules: c_attn, c_proj, c_fc (attention and feed-forward projections in every transformer block)
- Trainable Parameters: 2.36M (2.80% of the model)
- Dropout: 0.05 (kept low to maximize learning)
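The configuration above corresponds roughly to the following peft setup. This is a minimal reconstruction from the listed hyperparameters, not the exact training script, so treat it as illustrative.

from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

# Reconstructed LoRA setup for DistilGPT-2 (illustrative, not the original training script)
base = AutoModelForCausalLM.from_pretrained("distilgpt2")
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=32,                                          # maximum rank used in this experiment
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=["c_attn", "c_proj", "c_fc"],   # attention + feed-forward projections
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # expect roughly 2.36M trainable params (~2.8%)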
Training Data (MAXIMUM):
- Murlis Used: 500
- Training Examples: 344
- Context Length: 512 tokens (MAXIMUM)
- Spiritual Concepts: 15 detailed examples with full explanations (example construction is sketched below)
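One plausible way to build a training example at the 512-token context length is shown below. The Question/Answer layout mirrors the prompt used in the usage section further down; the actual data pipeline and wording are assumptions.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

def build_example(question, answer, max_length=512):
    # Pack question and answer into one causal-LM training string at the 512-token context.
    text = f"Question: {question}\nAnswer: {answer}{tokenizer.eos_token}"
    return tokenizer(text, truncation=True, max_length=max_length, padding="max_length")

# Hypothetical example; real answers come from the murli corpus.
example = build_example(
    "What is soul consciousness?",
    "Seeing yourself as a soul, a point of light, rather than the body.",
)
print(len(example["input_ids"]))  # 512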
Training Configuration (MAXIMUM):
- Epochs: 15 (5x more than standard)
- Effective Batch Size: 16
- Learning Rate: 5e-05 (ultra-careful)
- Warmup Steps: 200 (4x more than standard)
- Scheduler: cosine
- Weight Decay: 0.02 (regularization)
- Training Time: ~2h 50m on CPU
Final Training Loss: 1.609 (a 66% reduction from the standard run's 4.77). A sketch of the matching Trainer arguments follows.
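A minimal sketch of the corresponding Hugging Face TrainingArguments, assuming a per-device batch of 4 with 4 gradient-accumulation steps to reach the effective batch size of 16:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./murli-distilgpt2-maximum",   # hypothetical output path
    num_train_epochs=15,
    per_device_train_batch_size=4,             # assumption
    gradient_accumulation_steps=4,             # 4 x 4 = effective batch size of 16
    learning_rate=5e-5,
    warmup_steps=200,
    lr_scheduler_type="cosine",
    weight_decay=0.02,
    logging_steps=10,
    save_strategy="epoch",
)
# These arguments would be passed to transformers.Trainer along with the PEFT model
# and the tokenized dataset of 344 examples.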
Progressive Training Comparison

| Version  | LoRA Rank | Epochs | Murlis | Loss | Quality       |
|----------|-----------|--------|--------|------|---------------|
| Standard | 4         | 3      | 150    | 4.77 | ❌ Poor       |
| Enhanced | 16        | 10     | 300    | 2.07 | ❌ Poor       |
| MAXIMUM  | 32        | 15     | 500    | 1.61 | ❌ Still poor |
Key Finding: Loss improvement does NOT guarantee quality improvement for small models in specialized domains.
🔬 What We Learned
Why 82M Parameters Are Insufficient:
- Base Model Dominance: pre-trained on general internet text (Twitter, social media)
- Fine-tuning Limitations: only 2.8% of the model is trainable with LoRA
- Knowledge Capacity: cannot store specialized domain knowledge on top of general language ability
- Pattern vs. Knowledge: learns the format but not deep spiritual understanding
Improvements in MAXIMUM vs Standard:
- ✅ LoRA Rank: 32 (8x standard, 2x enhanced)
- ✅ LoRA Alpha: 64 (8x standard, 2x enhanced)
- ✅ Target Modules: c_attn + c_proj + c_fc (all layers)
- ✅ Epochs: 15 (5x standard, 1.5x enhanced)
- ✅ Murlis: 500 (3.3x standard, 1.67x enhanced)
- ✅ Context: 512 tokens (2x standard, 1.33x enhanced)
- ✅ 15 detailed spiritual concepts with full explanations
- ✅ 7 different formats per murli for comprehensive learning
- ✅ Ultra-careful learning rate (5e-5)
- ✅ Maximum warmup (200 steps)
- ✅ Larger effective batch size (16)
- ✅ Stronger regularization (0.02 weight decay)
What STILL Doesn't Work:
- Accurate explanations of core BK concepts
- Freedom from social media text patterns
- Consistent factual responses
- Spiritual guidance reliability
💻 Usage (For Research/Demo Only)
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch
# Load base model
tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
base_model = AutoModelForCausalLM.from_pretrained(
    "distilgpt2",
    torch_dtype=torch.float16,
    device_map="auto"
)

# Load MAXIMUM adapter
model = PeftModel.from_pretrained(
    base_model,
    "eswarankrishnamurthy/murli-assistant-distilgpt2-maximum"
)

# Chat function
def chat(message):
    prompt = f"Question: {message}\nAnswer:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=150,
        do_sample=True,            # sampling must be enabled for temperature/top_p/top_k to take effect
        temperature=0.7,
        top_p=0.9,
        top_k=50,
        repetition_penalty=1.2,
        no_repeat_ngram_size=3,
        pad_token_id=tokenizer.eos_token_id,
    )
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response.split("Answer:", 1)[1].strip() if "Answer:" in response else response
# Test (expect mixed quality)
print(chat("What is soul consciousness?"))
Performance Metrics
Inference Speed (CPU):
- Fastest: 1.13s
- Average: 2.69s
- Slowest: 3.55s
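These figures can be reproduced approximately with a simple timing loop around the chat() helper from the usage section above; the prompts below are hypothetical and the numbers will vary with hardware and prompt length.

import time

questions = [  # hypothetical test prompts
    "What is soul consciousness?",
    "What is the purpose of remembrance?",
    "Who is Baba in the murlis?",
]
timings = []
for q in questions:
    start = time.perf_counter()
    chat(q)  # chat() is defined in the usage example above
    timings.append(time.perf_counter() - start)
print(f"fastest {min(timings):.2f}s, average {sum(timings)/len(timings):.2f}s, slowest {max(timings):.2f}s")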
Resource Usage:
- RAM: ~1.5-2GB
- Model Size: 3.1 MB (adapter only)
- Base Model: 353 MB (DistilGPT-2)
Compared to Production Models:
- Phi-2 (2.7B): 33x larger, ⭐⭐⭐⭐⭐ quality, 5-10s inference
- Flan-T5: 3x larger, ⭐⭐⭐⭐ quality, 3-5s inference
- DistilGPT-2 MAX: smallest, ⭐ quality, 1-3s inference
🎯 Use Cases
✅ Appropriate Uses:
- Research on model size limitations
- Benchmarking against larger models
- Speed comparisons
- Educational demonstrations
- Training optimization studies
❌ Inappropriate Uses:
- Spiritual guidance (use Phi-2 instead)
- Production chatbot (unreliable responses)
- Educational content (may teach incorrect concepts)
- Public deployment (without strong disclaimers)
🔧 Technical Details
Architecture:
- Base: DistilGPT-2 (82M parameters)
- Fine-tuning: LoRA (Low-Rank Adaptation)
- Modified layers: ALL attention + feed-forward layers
Training Process:
- Connected to MongoDB Atlas (1072 murlis available); a hedged data-loading sketch follows this list
- Selected 500 murlis for training
- Created 344 enhanced training examples
- Trained for 15 epochs with a cosine LR schedule
- Achieved a final loss of 1.61, the lowest of the three runs
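The first step above might look roughly like this with pymongo. The database, collection, and field names are hypothetical placeholders, since the actual schema is not documented here.

from pymongo import MongoClient

# Hypothetical Atlas connection and schema; replace with the real URI and field names.
client = MongoClient("mongodb+srv://<user>:<password>@<cluster>.mongodb.net")
collection = client["murli_db"]["murlis"]          # hypothetical database/collection names

murli_docs = list(collection.find().limit(500))    # select 500 of the 1072 available murlis
murli_texts = [doc["text"] for doc in murli_docs]  # hypothetical "text" field
print(f"Loaded {len(murli_texts)} murlis for training-example construction")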
What Went Right:
- Perfect training convergence
- Stable gradients throughout
- Learned BK terminology and format
- Fast inference speed maintained
What Went Wrong:
- Quality didn't match loss improvement
- Social media patterns contaminate responses
- Hallucinations persist despite maximum training
- Cannot reliably explain spiritual concepts
Research Value
This model yields important insights for AI/ML research:
- Model capacity is non-negotiable for specialized domains
- Loss metrics can be misleading without quality evaluation
- Fine-tuning has fundamental limits based on base model size
- More training ≠ better quality when capacity is insufficient
- Pre-training patterns dominate small model behavior
Educational Message
Before deploying any AI model:
- ✅ Test quality thoroughly, not just training metrics
- ✅ Use appropriate model size for domain complexity
- ✅ Understand fine-tuning limitations
- ✅ Consider the base model's pre-training influence
- ✅ Validate against production requirements
Complete Training History
Completed: 2025-10-03T12:25:52.051354
Loss Progression:
- Epoch 1: 4.68 → 4.48
- Epoch 5: 3.44 (breakthrough)
- Epoch 10: 1.81 (excellent convergence)
- Epoch 15: 1.61 (BEST possible for DistilGPT-2)
Gradient Norms: Stable (0.72 - 1.72)
⚖️ Final Verdict
Technical Success: ✅ Perfect training convergence, lowest loss achieved
Functional Success: ❌ Quality insufficient for spiritual guidance
Research Value: ✅ Invaluable insights for model selection
Recommendation:
For production murli chatbot, use Phi-2 fine-tuned on murli data.
This MAXIMUM model demonstrates that small models cannot reliably handle specialized spiritual domains, regardless of training optimization.
Related Models
- Standard Version: murli-assistant-distilgpt2-lite (LoRA r=4)
- Enhanced Version: To be released (LoRA r=16)
- Recommended Production: Phi-2 based murli assistant (coming soon)
Citation
@misc{murli-distilgpt2-maximum,
author = {eswarankrishnamurthy},
title = {Murli Assistant - DistilGPT-2 MAXIMUM (Experimental)},
year = {2025},
publisher = {HuggingFace},
note = {Experimental model demonstrating small model limitations},
url = {https://huggingface.co/eswarankrishnamurthy/murli-assistant-distilgpt2-maximum}
}
📧 Contact
For questions about this research or the production Phi-2 model, please open an issue.
⚠️ DISCLAIMER
This model is provided for research and educational purposes only.
- Not suitable for spiritual guidance
- May produce incorrect or misleading information
- Responses should be verified against authentic murli sources
- Use at your own discretion
For reliable murli assistance, consult:
- Official Brahma Kumaris publications
- Experienced BK teachers
- The production Phi-2 based murli assistant (when available)
Om Shanti! 🙏
Maximum training doesn't overcome fundamental capacity limits.
Sometimes you just need a bigger model.
Model Type: Experimental Research Model
Quality Rating: ⭐ (insufficient for production)
Speed Rating: ⭐⭐⭐⭐⭐ (excellent)
Recommended Alternative: Phi-2 (⭐⭐⭐⭐⭐ quality)