Qwen2-VL: Equation Image β LaTeX with LoRA + Unsloth
Fine-tune Qwen2-VL, a Vision-Language model, to convert equation images into LaTeX code using the Unsloth framework and LoRA adapters.
Project Objective
Train an Equation-to-LaTeX transcriber using a pre-trained multimodal model. The model learns to read rendered math equations and generate corresponding LaTeX.
Dataset
unsloth/LaTeX_OCR
β Image-LaTeX pairs of printed mathematical expressions.- ~68K train / 7K test samples.
- Example:
Tech Stack
Component | Description |
---|---|
Qwen2-VL | Multimodal vision-language model (7B) by Alibaba |
Unsloth | Fast & memory-efficient training |
LoRA (via PEFT) | Parameter-efficient fine-tuning |
4-bit Quantization | Enabled by bitsandbytes |
Datasets, HF Hub | For loading/saving models & datasets |
Setup
pip install unsloth unsloth_zoo peft trl datasets accelerate bitsandbytes xformers==0.0.29.post3 sentencepiece protobuf hf_transfer triton
Training (Jupyter Notebook)
Refer to: Qwen2__VL_image_to_latext.ipynb
Steps:
- Load Qwen2-VL (
load_in_4bit=True
) - Load dataset via
datasets.load_dataset("unsloth/LaTeX_OCR")
- Apply LoRA adapters
- Use
SFTTrainer
from Unsloth to fine-tune - Save adapters or merged model
LoRA rank used: r=16
LoRA alpha: 16
Inference
from PIL import Image
image = Image.open("equation.png")
prompt = "Write the LaTeX representation for this image."
inputs = tokenizer(image, tokenizer.apply_chat_template([("user", prompt)], add_generation_prompt=True), return_tensors="pt").to("cuda")
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
Evaluation
- Exact Match Accuracy: ~90%+
- Strong generalization to complex equations and symbols
Results
Metric | Value |
---|---|
Exact Match | ~90β92% |
LoRA Params | ~<1% of model |
Training Time | ~20β40 mins on A100 |
Model Size | 7B (4-bit) |
Future Work
- Extend to handwritten formulas (e.g., CROHME dataset)
- Add LaTeX syntax validation or auto-correction
- Build a lightweight Gradio/Streamlit interface for demo
Folder Structure
.
βββ Qwen2__VL_image_to_latext.ipynb # Training Notebook
βββ output/ # Saved fine-tuned model
βββ README.md
Inference Providers
NEW
This model isn't deployed by any Inference Provider.
π
Ask for provider support