FStudent: Distilled Phi-3 Model

FStudent is a knowledge-distilled version of Microsoft's Phi-3-mini-4k-instruct model, trained through a comprehensive distillation pipeline that combines teacher-student learning with self-study mechanisms.

Model Description

FStudent was created using a multi-stage distillation pipeline that transfers knowledge from a larger teacher model (Phi-4) to the smaller Phi-3-mini-4k-instruct model. The model was trained using LoRA adapters, which were then merged with the base model to create this standalone version.
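
The merge step corresponds to a standard PEFT workflow; the sketch below illustrates how such a merge is typically performed, with a placeholder adapter path since the intermediate LoRA adapters are not published separately.

from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base student model
base_model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

# Attach the trained LoRA adapters (placeholder path) and fold them into the base weights
merged = PeftModel.from_pretrained(base_model, "path/to/lora_adapters").merge_and_unload()

# Save the resulting standalone model
merged.save_pretrained("FStudent")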

Training Data

The model was trained on a diverse set of data sources:

  1. PDF Documents: Technical documentation and domain-specific knowledge
  2. Python Code Dataset: Code examples from the Shuu12121/python-codesearch-dataset-open dataset (a loading sketch follows this list)
  3. Teacher-Generated Examples: High-quality examples generated by the Phi-4 teacher model
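
As an illustration, the Python code dataset can be loaded with the Hugging Face datasets library; the split and column names depend on the dataset's schema, and any filtering applied in the pipeline is not reproduced here.

from datasets import load_dataset

# Load the open Python code-search dataset used as one of the training sources
dataset = load_dataset("Shuu12121/python-codesearch-dataset-open")

# Inspect available splits and columns before further processing
print(dataset)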

Training Process

The distillation pipeline consisted of six sequential steps:

  1. Content Extraction & Enrichment: PDF files were processed to extract and enrich text data
  2. Teacher Pair Generation: Training pairs were generated using the Phi-4 teacher model
  3. Distillation Training: The student model (Phi-3) was trained using LoRA adapters with the following parameters (a configuration sketch follows this list):
    • Learning rate: 1e-4
    • Batch size: 4
    • Gradient accumulation steps: 8
    • Mixed precision training
    • 4-bit quantization during training
  4. Model Merging: The trained LoRA adapters were merged with the base Phi-3 model
  5. Student Self-Study: The model performed self-directed learning on domain-specific content
  6. Model Evaluation: The merged model was evaluated against the teacher model to compare performance
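
The training script itself is not published here; the sketch below shows how the listed settings map onto a PEFT LoRA setup with 4-bit quantization. The LoRA rank, alpha, dropout, and target modules are illustrative assumptions, not values taken from the pipeline.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit quantization of the student during training
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

student = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
student = prepare_model_for_kbit_training(student)

# LoRA adapter configuration; rank, alpha, dropout, and target modules are assumptions
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["qkv_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
student = get_peft_model(student, lora_config)
student.print_trainable_parameters()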

Model Architecture

  • Base Model: microsoft/Phi-3-mini-4k-instruct
  • Parameters: 3.82B (stored as float32 safetensors)
  • Parameter-Efficient Fine-Tuning: LoRA adapters (merged into this model)
  • Context Length: 4K tokens
  • Architecture: Transformer-based language model
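
These details can be verified directly from the repository's published configuration; a minimal check (assuming access to the Hugging Face Hub):

from transformers import AutoConfig

config = AutoConfig.from_pretrained("forge1825/FStudent")
print(config.model_type)               # Phi-3 architecture family
print(config.max_position_embeddings)  # 4K-token context window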

Intended Uses

This model is designed for:

  • General text generation tasks
  • Python code understanding and generation
  • Technical documentation analysis
  • Question answering on domain-specific topics

Performance and Limitations

Strengths

  • Faster inference compared to larger models (approximately 2.5x speedup; see the measurement sketch after this list)
  • Maintains much of the capability of the teacher model
  • Enhanced code understanding due to training on Python code datasets
  • Good performance on technical documentation analysis
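
The quoted speedup depends on hardware, precision, and generation settings; the sketch below shows one rough way to measure throughput for comparison against a larger teacher checkpoint. The helper function and prompt are illustrative, not part of the evaluation pipeline.

import time
from transformers import AutoModelForCausalLM, AutoTokenizer

def tokens_per_second(model_id, prompt, max_new_tokens=128):
    # Rough throughput measurement; results vary with hardware and settings
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    start = time.perf_counter()
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    elapsed = time.perf_counter() - start
    return (outputs.shape[-1] - inputs["input_ids"].shape[-1]) / elapsed

# Run the same measurement for the teacher checkpoint to estimate the speedup
print(tokens_per_second("forge1825/FStudent", "Explain knowledge distillation."))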

Limitations

  • May not match the full capabilities of larger models on complex reasoning tasks
  • Limited context window compared to some larger models
  • Performance on specialized domains not covered in training data may be reduced

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained("forge1825/FStudent")
tokenizer = AutoTokenizer.from_pretrained("forge1825/FStudent")

# Generate text
input_text = "Write a Python function to calculate the Fibonacci sequence:"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
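
Phi-3-mini-4k-instruct is instruction-tuned, so prompts generally work best when formatted with the tokenizer's chat template. The snippet below continues the example above and assumes the base model's chat template is preserved in this repository.

# Chat-style prompting, reusing the model and tokenizer loaded above
messages = [
    {"role": "user", "content": "Write a Python function to calculate the Fibonacci sequence."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))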

Quantized Usage

For more efficient inference, you can load the model with quantization:

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

# 4-bit quantization configuration
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16
)

# Load the model with quantization
model = AutoModelForCausalLM.from_pretrained(
    "forge1825/FStudent",
    device_map="auto",
    quantization_config=quantization_config
)
tokenizer = AutoTokenizer.from_pretrained("forge1825/FStudent")

Training Details

  • Training Framework: Hugging Face Transformers with PEFT
  • Optimizer: AdamW
  • Learning Rate Schedule: Linear warmup followed by linear decay
  • Training Hardware: NVIDIA GPUs
  • Distillation Method: Knowledge distillation with teacher-student architecture
  • Self-Study Mechanism: Curiosity-driven exploration with hierarchical context
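
These settings, together with the hyperparameters listed under Training Process, map onto a standard Hugging Face TrainingArguments configuration; the sketch below is illustrative, and the warmup ratio, epoch count, and output path are assumptions rather than published values.

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="fstudent-distillation",  # assumption
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    fp16=True,                    # mixed precision training
    optim="adamw_torch",          # AdamW optimizer
    lr_scheduler_type="linear",   # linear decay after linear warmup
    warmup_ratio=0.03,            # assumption
    num_train_epochs=3,           # assumption
    logging_steps=10,
)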

Ethical Considerations

This model inherits the capabilities and limitations of its base model (Phi-3-mini-4k-instruct). While efforts have been made to ensure responsible behavior, the model may still:

  • Generate incorrect or misleading information
  • Produce biased content reflecting biases in the training data
  • Create code that contains bugs or security vulnerabilities

Users should validate and review the model's outputs, especially for sensitive applications.

Citation and Attribution

If you use this model in your research or applications, please cite:

@misc{forge1825_fstudent,
  author = {Forge1825},
  title = {FStudent: Distilled Phi-3 Model},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/forge1825/FStudent}}
}

Acknowledgements

  • Microsoft for the Phi-3-mini-4k-instruct base model
  • Hugging Face for the infrastructure and tools
  • The creators of the Python code dataset used in training