FStudent: Distilled Phi-3 Model

FStudent is a knowledge-distilled version of Microsoft's Phi-3-mini-4k-instruct model, trained through a comprehensive distillation pipeline that combines teacher-student learning with self-study mechanisms.

Model Description

FStudent was created using a multi-stage distillation pipeline that transfers knowledge from a larger teacher model (Phi-4) to the smaller Phi-3-mini-4k-instruct model. The model was trained using LoRA adapters, which were then merged with the base model to create this standalone version.
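
The merge step corresponds to a standard PEFT workflow; the sketch below illustrates how such a merge is typically performed, with a placeholder adapter path since the intermediate LoRA adapters are not published separately.

from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base student model
base_model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

# Attach the trained LoRA adapters (placeholder path) and fold them into the base weights
merged = PeftModel.from_pretrained(base_model, "path/to/lora_adapters").merge_and_unload()

# Save the resulting standalone model
merged.save_pretrained("FStudent")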

Training Data

The model was trained on a diverse set of data sources:

  1. PDF Documents: Technical documentation and domain-specific knowledge
  2. Python Code Dataset: Code examples from the Shuu12121/python-codesearch-dataset-open dataset (a loading sketch follows this list)
  3. Teacher-Generated Examples: High-quality examples generated by the Phi-4 teacher model
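
As an illustration, the Python code dataset can be loaded with the Hugging Face datasets library; the split and column names depend on the dataset's schema, and any filtering applied in the pipeline is not reproduced here.

from datasets import load_dataset

# Load the open Python code-search dataset used as one of the training sources
dataset = load_dataset("Shuu12121/python-codesearch-dataset-open")

# Inspect available splits and columns before further processing
print(dataset)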

Training Process

The distillation pipeline consisted of six sequential steps:

  1. Content Extraction & Enrichment: PDF files were processed to extract and enrich text data
  2. Teacher Pair Generation: Training pairs were generated using the Phi-4 teacher model
  3. Distillation Training: The student model (Phi-3) was trained using LoRA adapters with the following parameters (a configuration sketch follows this list):
    • Learning rate: 1e-4
    • Batch size: 4
    • Gradient accumulation steps: 8
    • Mixed precision training
    • 4-bit quantization during training
  4. Model Merging: The trained LoRA adapters were merged with the base Phi-3 model
  5. Student Self-Study: The model performed self-directed learning on domain-specific content
  6. Model Evaluation: The merged model was evaluated against the teacher model to compare performance
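
The training script itself is not published here; the sketch below shows how the listed settings map onto a PEFT LoRA setup with 4-bit quantization. The LoRA rank, alpha, dropout, and target modules are illustrative assumptions, not values taken from the pipeline.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit quantization of the student during training
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

student = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
student = prepare_model_for_kbit_training(student)

# LoRA adapter configuration; rank, alpha, dropout, and target modules are assumptions
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["qkv_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
student = get_peft_model(student, lora_config)
student.print_trainable_parameters()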

Model Architecture

  • Base Model: microsoft/Phi-3-mini-4k-instruct
  • Parameters: 3.82B (stored as float32 safetensors)
  • Parameter-Efficient Fine-Tuning: LoRA adapters (merged into this model)
  • Context Length: 4K tokens
  • Architecture: Transformer-based language model
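
These details can be verified directly from the repository's published configuration; a minimal check (assuming access to the Hugging Face Hub):

from transformers import AutoConfig

config = AutoConfig.from_pretrained("forge1825/FStudent")
print(config.model_type)               # Phi-3 architecture family
print(config.max_position_embeddings)  # 4K-token context window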

Intended Uses

This model is designed for:

  • General text generation tasks
  • Python code understanding and generation
  • Technical documentation analysis
  • Question answering on domain-specific topics

Performance and Limitations

Strengths

  • Faster inference compared to larger models (approximately 2.5x speedup; see the measurement sketch after this list)
  • Maintains much of the capability of the teacher model
  • Enhanced code understanding due to training on Python code datasets
  • Good performance on technical documentation analysis
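
The quoted speedup depends on hardware, precision, and generation settings; the sketch below shows one rough way to measure throughput for comparison against a larger teacher checkpoint. The helper function and prompt are illustrative, not part of the evaluation pipeline.

import time
from transformers import AutoModelForCausalLM, AutoTokenizer

def tokens_per_second(model_id, prompt, max_new_tokens=128):
    # Rough throughput measurement; results vary with hardware and settings
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    start = time.perf_counter()
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    elapsed = time.perf_counter() - start
    return (outputs.shape[-1] - inputs["input_ids"].shape[-1]) / elapsed

# Run the same measurement for the teacher checkpoint to estimate the speedup
print(tokens_per_second("forge1825/FStudent", "Explain knowledge distillation."))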

Limitations

  • May not match the full capabilities of larger models on complex reasoning tasks
  • Limited context window compared to some larger models
  • Performance on specialized domains not covered in training data may be reduced

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained("forge1825/FStudent")
tokenizer = AutoTokenizer.from_pretrained("forge1825/FStudent")

# Generate text
input_text = "Write a Python function to calculate the Fibonacci sequence:"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
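
Phi-3-mini-4k-instruct is instruction-tuned, so prompts generally work best when formatted with the tokenizer's chat template. The snippet below continues the example above and assumes the base model's chat template is preserved in this repository.

# Chat-style prompting, reusing the model and tokenizer loaded above
messages = [
    {"role": "user", "content": "Write a Python function to calculate the Fibonacci sequence."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))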

Quantized Usage

For more efficient inference, you can load the model with quantization:

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

# 4-bit quantization configuration
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16
)

# Load the model with quantization
model = AutoModelForCausalLM.from_pretrained(
    "forge1825/FStudent",
    device_map="auto",
    quantization_config=quantization_config
)
tokenizer = AutoTokenizer.from_pretrained("forge1825/FStudent")

Training Details

  • Training Framework: Hugging Face Transformers with PEFT
  • Optimizer: AdamW
  • Learning Rate Schedule: Linear warmup followed by linear decay
  • Training Hardware: NVIDIA GPUs
  • Distillation Method: Knowledge distillation with teacher-student architecture
  • Self-Study Mechanism: Curiosity-driven exploration with hierarchical context
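
These settings, together with the hyperparameters listed under Training Process, map onto a standard Hugging Face TrainingArguments configuration; the sketch below is illustrative, and the warmup ratio, epoch count, and output path are assumptions rather than published values.

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="fstudent-distillation",  # assumption
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    fp16=True,                    # mixed precision training
    optim="adamw_torch",          # AdamW optimizer
    lr_scheduler_type="linear",   # linear decay after linear warmup
    warmup_ratio=0.03,            # assumption
    num_train_epochs=3,           # assumption
    logging_steps=10,
)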

Ethical Considerations

This model inherits the capabilities and limitations of its base model (Phi-3-mini-4k-instruct). While efforts have been made to ensure responsible behavior, the model may still:

  • Generate incorrect or misleading information
  • Produce biased content reflecting biases in the training data
  • Create code that contains bugs or security vulnerabilities

Users should validate and review the model's outputs, especially for sensitive applications.

Citation and Attribution

If you use this model in your research or applications, please cite:

@misc{forge1825_fstudent,
  author = {Forge1825},
  title = {FStudent: Distilled Phi-3 Model},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/forge1825/FStudent}}
}

Acknowledgements

  • Microsoft for the Phi-3-mini-4k-instruct base model
  • Hugging Face for the infrastructure and tools
  • The creators of the Python code dataset used in training