---
datasets:
- ddrg/math_formulas
language:
- en
base_model:
- HuggingFaceTB/SmolLM3-3B
tags:
- maths
- lora
- peft
- bitsandbytes
- small_model
- 4_bit
---

# SmolLM3-3B-Math-Formulas-4bit

## Model Description

**SmolLM3-3B-Math-Formulas-4bit** is a fine-tuned version of [HuggingFaceTB/SmolLM3-3B](https://huggingface.co/HuggingFaceTB/SmolLM3-3B) specialized for mathematical formula understanding and generation. The base model was quantized to 4-bit NF4 and fine-tuned with LoRA adapters (QLoRA) for memory-efficient training and inference.

- **Base Model**: HuggingFaceTB/SmolLM3-3B
- **Model Type**: Causal Language Model
- **Quantization**: 4-bit NF4 with double quantization (see the loading sketch below)
- **Fine-tuning Method**: QLoRA (Quantized Low-Rank Adaptation)
- **Specialization**: Mathematical formulas and expressions
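
For reference, a base model with these settings is typically loaded as below before attaching LoRA adapters. This is a minimal sketch reconstructed from the configuration listed above, not the published training script; details such as `device_map="auto"` and the use of `prepare_model_for_kbit_training` are assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

# 4-bit NF4 quantization with double quantization, matching the card's settings
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM3-3B",
    quantization_config=bnb_config,
    device_map="auto",
)

# Freeze the quantized weights and cast norms for stable k-bit fine-tuning
base_model = prepare_model_for_kbit_training(base_model)
```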

## Training Details

### Dataset

- **Source**: [ddrg/math_formulas](https://huggingface.co/datasets/ddrg/math_formulas)
- **Size**: 1,000 samples randomly drawn from the ~2.89M total (see the sampling sketch below)
- **Content**: Mathematical formulas, equations, and expressions in LaTeX format
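
A sketch of how such a subset can be drawn with the `datasets` library; the seed and any preprocessing used for this model are not documented, so treat them as assumptions:

```python
from datasets import load_dataset

# Load the full formula corpus and sample 1,000 examples
dataset = load_dataset("ddrg/math_formulas", split="train")
subset = dataset.shuffle(seed=42).select(range(1000))

print(subset.column_names)  # inspect the available fields
print(subset[0])            # e.g. a LaTeX formula record
```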

### Training Configuration

- **Training Loss**: 0.589 (final)
- **Epochs**: 6
- **Batch Size**: 8 (per device)
- **Learning Rate**: 2.5e-4 with a cosine scheduler
- **Max Sequence Length**: 128 tokens
- **Gradient Accumulation**: 2 steps
- **Optimizer**: AdamW with 0.01 weight decay
- **Precision**: FP16
- **LoRA Configuration** (sketched below):
  - r=4, alpha=8
  - Dropout: 0.1
  - Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
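
These hyperparameters map onto a `peft`/`transformers` configuration roughly as follows. It is a reconstruction from the values reported above (the acknowledgments point to TRL's `SFTTrainer` as the training harness); `output_dir` and anything not listed in the card are assumptions.

```python
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=4,
    lora_alpha=8,
    lora_dropout=0.1,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="smollm3-math-formulas-4bit",  # illustrative path
    num_train_epochs=6,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,
    learning_rate=2.5e-4,
    lr_scheduler_type="cosine",
    weight_decay=0.01,
    fp16=True,
)
# The 128-token maximum sequence length is set on the SFT trainer/config rather than here.
```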

### Hardware & Performance

- **Training Time**: 265 seconds (4.4 minutes)
- **Training Speed**: 5.68 samples/second
- **Total Steps**: 96
- **Memory Efficiency**: 4-bit quantization for reduced VRAM usage

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the model and tokenizer
model_name = "sweatSmile/HF-SmolLM3-3B-Math-Formulas-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Generate mathematical content
prompt = "Explain this mathematical formula:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)  # keep inputs on the model's device

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=150,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
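
The example above loads the weights in FP16. To run with the reduced VRAM footprint the card describes, the checkpoint can instead be loaded in 4-bit via `bitsandbytes`; this is a sketch, and whether it matches the author's serving setup is an assumption:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "sweatSmile/HF-SmolLM3-3B-Math-Formulas-4bit"

# Same NF4 configuration as used for training-time quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
```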

## Intended Use Cases

- **Mathematical Education**: Explaining mathematical formulas and concepts
- **LaTeX Generation**: Creating properly formatted mathematical expressions
- **Formula Analysis**: Understanding and breaking down complex mathematical equations
- **Mathematical Problem Solving**: Assisting with mathematical computations and derivations

## Limitations

- **Domain Specific**: Optimized primarily for mathematical content
- **Training Data Size**: Fine-tuned on only 1,000 samples
- **Quantization Effects**: 4-bit quantization may introduce minor precision loss
- **Context Length**: Training sequences were truncated to 128 tokens, so long derivations may be handled less reliably
- **Language**: Primarily trained on English mathematical notation

## Performance Metrics

- **Final Training Loss**: 0.589
- **Convergence**: Reached in 6 epochs
- **Improvement**: 52% lower final loss than the baseline configuration
- **Efficiency**: 51% faster training than the initial setup

## Model Architecture

Based on SmolLM3-3B with the following modifications:

- 4-bit NF4 quantization for memory efficiency
- LoRA adapters for parameter-efficient fine-tuning
- Specialized for mathematical formula understanding

## Citation

If you use this model, please cite:

```bibtex
@misc{smollm3-math-formulas-4bit,
  title  = {SmolLM3-3B-Math-Formulas-4bit},
  author = {sweatSmile},
  year   = {2025},
  note   = {QLoRA fine-tuning of HuggingFaceTB/SmolLM3-3B on the ddrg/math_formulas dataset with 4-bit NF4 quantization}
}
```

## License

This model inherits the license from the base SmolLM3-3B model. Please refer to the original model's license for usage terms.

## Acknowledgments

- **Base Model**: Hugging Face team for SmolLM3-3B
- **Dataset**: Dresden Database Research Group for the math_formulas dataset
- **Training Framework**: Hugging Face Transformers and TRL libraries
- **Quantization**: bitsandbytes library for 4-bit optimization