---
datasets:
- ddrg/math_formulas
language:
- en
base_model:
- HuggingFaceTB/SmolLM3-3B
tags:
- maths
- lora
- peft
- bitsandbytes
- small_model
- 4_bit
---

# SmolLM3-3B-Math-Formulas-4bit

## Model Description

**SmolLM3-3B-Math-Formulas-4bit** is a fine-tuned version of [HuggingFaceTB/SmolLM3-3B](https://huggingface.co/HuggingFaceTB/SmolLM3-3B) specialized for mathematical formula understanding and generation. The base model was quantized to 4-bit NF4 and fine-tuned with LoRA adapters (QLoRA) for memory-efficient training and inference.

- **Base Model**: HuggingFaceTB/SmolLM3-3B
- **Model Type**: Causal Language Model
- **Quantization**: 4-bit NF4 with double quantization (see the loading sketch below)
- **Fine-tuning Method**: QLoRA (Quantized Low-Rank Adaptation)
- **Specialization**: Mathematical formulas and expressions
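
For reference, a base model with these settings is typically loaded as below before attaching LoRA adapters. This is a minimal sketch reconstructed from the configuration listed above, not the published training script; details such as `device_map="auto"` and the use of `prepare_model_for_kbit_training` are assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

# 4-bit NF4 quantization with double quantization, matching the card's settings
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM3-3B",
    quantization_config=bnb_config,
    device_map="auto",
)

# Freeze the quantized weights and cast norms for stable k-bit fine-tuning
base_model = prepare_model_for_kbit_training(base_model)
```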

## Training Details

### Dataset

- **Source**: [ddrg/math_formulas](https://huggingface.co/datasets/ddrg/math_formulas)
- **Size**: 1,000 samples randomly drawn from the ~2.89M total (see the sampling sketch below)
- **Content**: Mathematical formulas, equations, and expressions in LaTeX format
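
A sketch of how such a subset can be drawn with the `datasets` library; the seed and any preprocessing used for this model are not documented, so treat them as assumptions:

```python
from datasets import load_dataset

# Load the full formula corpus and sample 1,000 examples
dataset = load_dataset("ddrg/math_formulas", split="train")
subset = dataset.shuffle(seed=42).select(range(1000))

print(subset.column_names)  # inspect the available fields
print(subset[0])            # e.g. a LaTeX formula record
```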

### Training Configuration

- **Training Loss**: 0.589 (final)
- **Epochs**: 6
- **Batch Size**: 8 (per device)
- **Learning Rate**: 2.5e-4 with a cosine scheduler
- **Max Sequence Length**: 128 tokens
- **Gradient Accumulation**: 2 steps
- **Optimizer**: AdamW with 0.01 weight decay
- **Precision**: FP16
- **LoRA Configuration** (sketched below):
  - r=4, alpha=8
  - Dropout: 0.1
  - Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
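
These hyperparameters map onto a `peft`/`transformers` configuration roughly as follows. It is a reconstruction from the values reported above (the acknowledgments point to TRL's `SFTTrainer` as the training harness); `output_dir` and anything not listed in the card are assumptions.

```python
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=4,
    lora_alpha=8,
    lora_dropout=0.1,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="smollm3-math-formulas-4bit",  # illustrative path
    num_train_epochs=6,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,
    learning_rate=2.5e-4,
    lr_scheduler_type="cosine",
    weight_decay=0.01,
    fp16=True,
)
# The 128-token maximum sequence length is set on the SFT trainer/config rather than here.
```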

### Hardware & Performance

- **Training Time**: 265 seconds (4.4 minutes)
- **Training Speed**: 5.68 samples/second
- **Total Steps**: 96
- **Memory Efficiency**: 4-bit quantization for reduced VRAM usage

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the model and tokenizer
model_name = "sweatSmile/HF-SmolLM3-3B-Math-Formulas-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Generate mathematical content
prompt = "Explain this mathematical formula:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)  # keep inputs on the model's device

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=150,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
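
The example above loads the weights in FP16. To run with the reduced VRAM footprint the card describes, the checkpoint can instead be loaded in 4-bit via `bitsandbytes`; this is a sketch, and whether it matches the author's serving setup is an assumption:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "sweatSmile/HF-SmolLM3-3B-Math-Formulas-4bit"

# Same NF4 configuration as used for training-time quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
```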

## Intended Use Cases

- **Mathematical Education**: Explaining mathematical formulas and concepts
- **LaTeX Generation**: Creating properly formatted mathematical expressions
- **Formula Analysis**: Understanding and breaking down complex mathematical equations
- **Mathematical Problem Solving**: Assisting with mathematical computations and derivations

## Limitations

- **Domain Specific**: Optimized primarily for mathematical content
- **Training Data Size**: Fine-tuned on only 1,000 samples
- **Quantization Effects**: 4-bit quantization may introduce minor precision loss
- **Context Length**: Training sequences were truncated to 128 tokens, so long derivations may be handled less reliably
- **Language**: Primarily trained on English mathematical notation

## Performance Metrics

- **Final Training Loss**: 0.589
- **Convergence**: Reached in 6 epochs
- **Improvement**: 52% lower final loss than the baseline configuration
- **Efficiency**: 51% faster training than the initial setup

## Model Architecture

Based on SmolLM3-3B with the following modifications:

- 4-bit NF4 quantization for memory efficiency
- LoRA adapters for parameter-efficient fine-tuning
- Specialized for mathematical formula understanding

## Citation

If you use this model, please cite:

```bibtex
@misc{smollm3-math-formulas-4bit,
  title  = {SmolLM3-3B-Math-Formulas-4bit},
  author = {sweatSmile},
  year   = {2025},
  note   = {QLoRA fine-tuning of HuggingFaceTB/SmolLM3-3B on the ddrg/math_formulas dataset with 4-bit NF4 quantization}
}
```

## License

This model inherits the license from the base SmolLM3-3B model. Please refer to the original model's license for usage terms.

## Acknowledgments

- **Base Model**: Hugging Face team for SmolLM3-3B
- **Dataset**: Dresden Database Research Group for the math_formulas dataset
- **Training Framework**: Hugging Face Transformers and TRL libraries
- **Quantization**: bitsandbytes library for 4-bit optimization