FStudent: Distilled Phi-3 Model
FStudent is a knowledge-distilled version of Microsoft's Phi-3-mini-4k-instruct model, trained through a comprehensive distillation pipeline that combines teacher-student learning with self-study mechanisms.
Model Description
FStudent was created using a multi-stage distillation pipeline that transfers knowledge from a larger teacher model (Phi-4) to the smaller Phi-3-mini-4k-instruct model. The model was trained using LoRA adapters, which were then merged with the base model to create this standalone version.
Training Data
The model was trained on a diverse set of data sources:
- PDF Documents: Technical documentation and domain-specific knowledge
- Python Code Dataset: Code examples from the Shuu12121/python-codesearch-dataset-open dataset (see the loading sketch after this list)
- Teacher-Generated Examples: High-quality examples generated by the Phi-4 teacher model
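The code dataset named above is available on the Hugging Face Hub. As a quick orientation, it can be inspected as follows; the split and column names depend on the dataset's own schema and are not specified in this card.

from datasets import load_dataset

# Dataset id taken from the training-data list above
code_ds = load_dataset("Shuu12121/python-codesearch-dataset-open")
print(code_ds)  # shows the available splits, columns, and row counts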
Training Process
The distillation pipeline consisted of six sequential steps:
- Content Extraction & Enrichment: PDF files were processed to extract and enrich text data
- Teacher Pair Generation: Training pairs were generated using the Phi-4 teacher model
- Distillation Training: The student model (Phi-3) was trained using LoRA adapters with the following parameters (a configuration sketch follows this list):
  - Learning rate: 1e-4
  - Batch size: 4
  - Gradient accumulation steps: 8
  - Mixed precision training
  - 4-bit quantization during training
- Model Merging: The trained LoRA adapters were merged with the base Phi-3 model
- Student Self-Study: The model performed self-directed learning on domain-specific content
- Model Evaluation: The model was evaluated against the teacher model for performance
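The training scripts themselves are not published with this card, but a minimal configuration sketch of the distillation-training and merging steps, using the hyperparameters listed above, might look like the following. The LoRA rank/alpha, target module names, epoch count, and output paths are illustrative assumptions rather than confirmed pipeline values.

import torch
from peft import LoraConfig, PeftModel, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)

base_id = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Step 3: load the base model in 4-bit and attach LoRA adapters
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
student = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
lora_config = LoraConfig(  # rank, alpha, and target modules are assumptions
    r=16, lora_alpha=32, target_modules=["qkv_proj", "o_proj"], task_type="CAUSAL_LM"
)
student = get_peft_model(student, lora_config)

# Hyperparameters from the list above; a standard Trainer (or TRL SFTTrainer)
# would then consume the tokenized teacher-generated pairs with these arguments.
training_args = TrainingArguments(
    output_dir="fstudent-lora",      # hypothetical adapter output path
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    fp16=True,                       # mixed precision training
    num_train_epochs=1,              # epoch count not stated in the card
)

# Step 4: after training, merge the adapters back into a full-precision base model
merged = PeftModel.from_pretrained(
    AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16),
    "fstudent-lora",
).merge_and_unload()
merged.save_pretrained("FStudent-merged")
tokenizer.save_pretrained("FStudent-merged")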
Model Architecture
- Base Model: microsoft/Phi-3-mini-4k-instruct
- Parameter-Efficient Fine-Tuning: LoRA adapters (merged into this model)
- Context Length: 4K tokens
- Architecture: Transformer-based language model
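As a quick sanity check of the details above, the context length and architecture class can be read straight from the model configuration; for the 4K variant the position-embedding field should report 4096. Depending on your transformers version, loading may additionally require trust_remote_code=True.

from transformers import AutoConfig

config = AutoConfig.from_pretrained("forge1825/FStudent")
print(config.max_position_embeddings)  # maximum context length in tokens (expected: 4096)
print(config.architectures)            # underlying decoder-only transformer class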
Intended Uses
This model is designed for:
- General text generation tasks
- Python code understanding and generation
- Technical documentation analysis
- Question answering on domain-specific topics
Performance and Limitations
Strengths
- Faster inference compared to larger models (approximately 2.5x speedup; see the timing sketch at the end of this section)
- Maintains much of the capability of the teacher model
- Enhanced code understanding due to training on Python code datasets
- Good performance on technical documentation analysis
Limitations
- May not match the full capabilities of larger models on complex reasoning tasks
- Limited context window compared to some larger models
- Performance on specialized domains not covered in training data may be reduced
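The roughly 2.5x speedup above is self-reported from the pipeline's evaluation step. To obtain a comparable number on your own hardware, a simple wall-clock comparison along these lines is a reasonable starting point; the prompt, token budget, and reference model are placeholders, not part of the original evaluation.

import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def generation_seconds(model_id: str, prompt: str, max_new_tokens: int = 128) -> float:
    """Wall-clock time to generate max_new_tokens for a single prompt."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    start = time.perf_counter()
    model.generate(**inputs, max_new_tokens=max_new_tokens)
    return time.perf_counter() - start

student_s = generation_seconds("forge1825/FStudent", "Explain knowledge distillation briefly.")
# Compare against whichever larger reference model you have access to, e.g. the teacher:
# reference_s = generation_seconds("<larger-model-id>", "Explain knowledge distillation briefly.")
print(f"FStudent: {student_s:.2f}s per 128 new tokens")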
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained("forge1825/FStudent")
tokenizer = AutoTokenizer.from_pretrained("forge1825/FStudent")
# Generate text
input_text = "Write a Python function to calculate the Fibonacci sequence:"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
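Because the base model is instruction-tuned, prompting through the tokenizer's chat template (reusing the model and tokenizer loaded above) usually produces better-structured answers than raw completion. This assumes the merged model keeps the base model's chat template.

messages = [
    {"role": "user", "content": "Write a Python function to calculate the Fibonacci sequence."}
]
# apply_chat_template wraps the request in the instruct format the base model expects
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))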
Quantized Usage
For more efficient inference, you can load the model with quantization:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch
# 4-bit quantization configuration
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16
)
# Load the model with quantization
model = AutoModelForCausalLM.from_pretrained(
    "forge1825/FStudent",
    device_map="auto",
    quantization_config=quantization_config
)
tokenizer = AutoTokenizer.from_pretrained("forge1825/FStudent")
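Note that 4-bit loading relies on the bitsandbytes package and typically requires a CUDA-capable GPU. Generation then works the same way as in the unquantized example:

input_text = "Summarize the purpose of knowledge distillation in two sentences."
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))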
Training Details
- Training Framework: Hugging Face Transformers with PEFT
- Optimizer: AdamW
- Learning Rate Schedule: Linear warmup followed by linear decay
- Training Hardware: NVIDIA GPUs
- Distillation Method: Knowledge distillation with a teacher-student architecture (a generic loss sketch follows below)
- Self-Study Mechanism: Curiosity-driven exploration with hierarchical context
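The card does not publish the exact distillation objective, so the following is only a generic sketch of the textbook teacher-student loss implied by the method named above: a temperature-scaled KL term on the teacher's soft targets mixed with the usual cross-entropy on the hard labels. The temperature and mixing weight are illustrative values.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Generic KD loss. Logits: (batch, seq_len, vocab); labels: (batch, seq_len)."""
    # Soft targets: student matches the teacher's temperature-smoothed distribution
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: ordinary next-token cross-entropy against the reference labels
    hard_loss = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )
    return alpha * soft_loss + (1 - alpha) * hard_loss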
Ethical Considerations
This model inherits the capabilities and limitations of its base model (Phi-3-mini-4k-instruct). While efforts have been made to ensure responsible behavior, the model may still:
- Generate incorrect or misleading information
- Produce biased content reflecting biases in the training data
- Create code that contains bugs or security vulnerabilities
Users should validate and review the model's outputs, especially for sensitive applications.
Citation and Attribution
If you use this model in your research or applications, please cite:
@misc{forge1825_fstudent,
  author       = {Forge1825},
  title        = {FStudent: Distilled Phi-3 Model},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/forge1825/FStudent}}
}
Acknowledgements
- Microsoft for the Phi-3-mini-4k-instruct base model
- Hugging Face for the infrastructure and tools
- The creators of the Shuu12121/python-codesearch-dataset-open dataset used in training
Evaluation results
- Speedup factor on distillation evaluation (self-reported): 2.5x