---
base_model: unsloth/qwen2-vl-7b-instruct-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- qwen2_vl
- trl
license: apache-2.0
language:
- en
datasets:
- unsloth/LaTeX_OCR
library_name: unsloth
model_name: Qwen2-VL-7B-Instruct with LoRA (Equation-to-LaTeX)
---

# Qwen2-VL: Equation Image → LaTeX with LoRA + Unsloth

Fine-tune [Qwen2-VL](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct), a vision-language model, to convert equation images into LaTeX code using the [Unsloth](https://github.com/unslothai/unsloth) framework and LoRA adapters.

## Project Objective

Train an equation-to-LaTeX transcriber on top of a pre-trained multimodal model. The model learns to read rendered math equations and generate the corresponding LaTeX.

![image/gif](https://cdn-uploads.huggingface.co/production/uploads/666c3d6489e21df7d4a02805/zVB5_lPq5v8EeHRbpSLtE.gif)

---

[Source code on GitHub](https://github.com/Mayankpratapsingh022/Finetuning-LLMs/tree/main/Qwen_2_VL_Multimodel_LLM_Finetuning)

## Dataset

- [`unsloth/LaTeX_OCR`](https://huggingface.co/datasets/unsloth/LaTeX_OCR) – image–LaTeX pairs of printed mathematical expressions.
- ~68K train / ~7K test samples.
- Example:
  - Image: ![image](https://github.com/user-attachments/assets/e0d87582-7ba4-4e59-8f00-fd8f6c0f862d)
  - Target: `R - { \frac { 1 } { 2 } } ( \nabla \Phi ) ^ { 2 } - { \frac { 1 } { 2 } } \nabla ^ { 2 } \Phi = 0 .`

---

## Tech Stack

| Component | Description |
|-----------|-------------|
| Qwen2-VL | 7B multimodal vision-language model by Alibaba |
| Unsloth | Fast, memory-efficient training |
| LoRA (via PEFT) | Parameter-efficient fine-tuning |
| 4-bit quantization | Enabled by `bitsandbytes` |
| Datasets, HF Hub | Loading and saving models and datasets |

---

## Setup

```bash
pip install unsloth unsloth_zoo peft trl datasets accelerate bitsandbytes xformers==0.0.29.post3 sentencepiece protobuf hf_transfer triton
```

---

## Training (Jupyter Notebook)

Refer to: `Qwen2__VL_image_to_latext.ipynb`

Steps (a hedged end-to-end sketch is included in the appendix at the end of this README):

1. Load Qwen2-VL with `load_in_4bit=True`
2. Load the dataset via `datasets.load_dataset("unsloth/LaTeX_OCR")`
3. Apply LoRA adapters
4. Fine-tune with TRL's `SFTTrainer` (via Unsloth)
5. Save the adapters or the merged model

LoRA rank used: `r=16`
LoRA alpha: `16`

---

## Inference

Assuming `model` and `tokenizer` were loaded with Unsloth's `FastVisionModel.from_pretrained(...)`:

```python
from unsloth import FastVisionModel
from PIL import Image

FastVisionModel.for_inference(model)  # enable inference mode

image = Image.open("equation.png")
prompt = "Write the LaTeX representation for this image."
messages = [{"role": "user", "content": [{"type": "image"}, {"type": "text", "text": prompt}]}]

input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
inputs = tokenizer(image, input_text, add_special_tokens=False, return_tensors="pt").to("cuda")

output = model.generate(**inputs, max_new_tokens=128, use_cache=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

---

## Evaluation

- Exact match accuracy: ~90%+
- Strong generalization to complex equations and symbols

---

## Results

| Metric | Value |
|------------------|---------------|
| Exact match | ~90–92% |
| LoRA parameters | <1% of model parameters |
| Training time | ~20–40 min on an A100 |
| Model size | 7B (4-bit) |

---

## Future Work

- Extend to handwritten formulas (e.g., the CROHME dataset)
- Add LaTeX syntax validation or auto-correction
- Build a lightweight Gradio/Streamlit demo interface

---

## Folder Structure

```
.
├── Qwen2__VL_image_to_latext.ipynb   # Training notebook
├── output/                           # Saved fine-tuned model
└── README.md
```

---
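
## Appendix: Training Sketch

A minimal, untested sketch of the training steps listed above, assuming Unsloth's vision fine-tuning API (`FastVisionModel`, `UnslothVisionDataCollator`) as used in the official Unsloth vision notebooks; import paths may differ between Unsloth versions. The dataset column names (`image`, `text`), the instruction string, and all hyperparameters other than `r=16` / `lora_alpha=16` are illustrative assumptions — refer to `Qwen2__VL_image_to_latext.ipynb` for the exact configuration.

```python
from unsloth import FastVisionModel
from unsloth.trainer import UnslothVisionDataCollator
from datasets import load_dataset
from trl import SFTTrainer, SFTConfig

# 1. Load Qwen2-VL in 4-bit
model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/qwen2-vl-7b-instruct-unsloth-bnb-4bit",
    load_in_4bit=True,
    use_gradient_checkpointing="unsloth",
)

# 2. Load the dataset and convert each row to a chat-style sample
dataset = load_dataset("unsloth/LaTeX_OCR", split="train")
instruction = "Write the LaTeX representation for this image."

def to_conversation(sample):
    # Column names "image" and "text" are assumed from the dataset card
    return {"messages": [
        {"role": "user", "content": [
            {"type": "text", "text": instruction},
            {"type": "image", "image": sample["image"]},
        ]},
        {"role": "assistant", "content": [{"type": "text", "text": sample["text"]}]},
    ]}

train_data = [to_conversation(s) for s in dataset]

# 3. Attach LoRA adapters (rank and alpha match the values reported above)
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers=True,
    finetune_language_layers=True,
    finetune_attention_modules=True,
    finetune_mlp_modules=True,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
)

# 4. Fine-tune with SFTTrainer (hyperparameters below are placeholders)
FastVisionModel.for_training(model)
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    data_collator=UnslothVisionDataCollator(model, tokenizer),
    train_dataset=train_data,
    args=SFTConfig(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        learning_rate=2e-4,
        optim="adamw_8bit",
        output_dir="output",
        remove_unused_columns=False,
        dataset_text_field="",
        dataset_kwargs={"skip_prepare_dataset": True},
        max_seq_length=2048,
    ),
)
trainer.train()

# 5. Save the LoRA adapters (use Unsloth's merge-and-save helpers for a merged model)
model.save_pretrained("output")
tokenizer.save_pretrained("output")
```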