Qwen2-VL: Equation Image → LaTeX with LoRA + Unsloth

Fine-tune Qwen2-VL, a Vision-Language model, to convert equation images into LaTeX code using the Unsloth framework and LoRA adapters.

Project Objective

Train an Equation-to-LaTeX transcriber using a pre-trained multimodal model. The model learns to read rendered math equations and generate corresponding LaTeX.

Source code on github

Dataset

unsloth/LaTeX_OCR – Image-LaTeX pairs of printed mathematical expressions.
~68K train / 7K test samples.
Example:
- Image:
- Target: R - { \frac { 1 } { 2 } } ( \nabla \Phi ) ^ { 2 } - { \frac { 1 } { 2 } } \nabla ^ { 2 } \Phi = 0 .

Tech Stack

Component	Description
Qwen2-VL	Multimodal vision-language model (7B) by Alibaba
Unsloth	Fast & memory-efficient training
LoRA (via PEFT)	Parameter-efficient fine-tuning
4-bit Quantization	Enabled by `bitsandbytes`
Datasets, HF Hub	For loading/saving models & datasets

Setup

pip install unsloth unsloth_zoo peft trl datasets accelerate bitsandbytes xformers==0.0.29.post3 sentencepiece protobuf hf_transfer triton

Training (Jupyter Notebook)

Refer to: Qwen2__VL_image_to_latext.ipynb

Steps:

Load Qwen2-VL (load_in_4bit=True)
Load dataset via datasets.load_dataset("unsloth/LaTeX_OCR")
Apply LoRA adapters
Use SFTTrainer from Unsloth to fine-tune
Save adapters or merged model

LoRA rank used: r=16
LoRA alpha: 16

Inference

from PIL import Image
image = Image.open("equation.png")
prompt = "Write the LaTeX representation for this image."
inputs = tokenizer(image, tokenizer.apply_chat_template([("user", prompt)], add_generation_prompt=True), return_tensors="pt").to("cuda")
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))

Evaluation

Exact Match Accuracy: ~90%+
Strong generalization to complex equations and symbols

Results

Metric	Value
Exact Match	~90–92%
LoRA Params	~<1% of model
Training Time	~20–40 mins on A100
Model Size	7B (4-bit)

Future Work

Extend to handwritten formulas (e.g., CROHME dataset)
Add LaTeX syntax validation or auto-correction
Build a lightweight Gradio/Streamlit interface for demo

Folder Structure

.
├── Qwen2__VL_image_to_latext.ipynb   # Training Notebook
├── output/                           # Saved fine-tuned model
└── README.md

Mayank022
/

qwen2-vl-finetuned-Image-to-LaTeX