Mayank022 committed on
Commit e9fb5a1 · verified · 1 Parent(s): fd4fb62

updated the model card

Files changed (1)
  1. README.md +115 -6
README.md CHANGED
@@ -9,14 +9,123 @@ tags:
  license: apache-2.0
  language:
  - en
  ---

- # Uploaded model

- - **Developed by:** Mayank022
- - **License:** apache-2.0
- - **Finetuned from model :** unsloth/qwen2-vl-7b-instruct-unsloth-bnb-4bit

- This qwen2_vl model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
  license: apache-2.0
  language:
  - en
+ datasets:
+ - unsloth/LaTeX_OCR
+ library_name: unsloth
+ model_name: Qwen2-VL-7B-Instruct with LoRA (Equation-to-LaTeX)
  ---

+ # Qwen2-VL: Equation Image → LaTeX with LoRA + Unsloth
+
+ Fine-tune [Qwen2-VL](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct), a Vision-Language model, to convert equation images into LaTeX code using the [Unsloth](https://github.com/unslothai/unsloth) framework and LoRA adapters.
+
+ ## Project Objective
+
+ Train an Equation-to-LaTeX transcriber using a pre-trained multimodal model. The model learns to read rendered math equations and generate the corresponding LaTeX.
+
+ ![image/gif](https://cdn-uploads.huggingface.co/production/uploads/666c3d6489e21df7d4a02805/zVB5_lPq5v8EeHRbpSLtE.gif)
+
+ ---
+
+ [Source code on GitHub](https://github.com/Mayankpratapsingh022/Finetuning-LLMs/tree/main/Qwen_2_VL_Multimodel_LLM_Finetuning)
+
+ ## Dataset
+
+ - [`unsloth/LaTeX_OCR`](https://huggingface.co/datasets/unsloth/LaTeX_OCR) – image–LaTeX pairs of printed mathematical expressions.
+ - ~68K train / ~7K test samples.
+ - Example:
+   - Image: ![image](https://github.com/user-attachments/assets/e0d87582-7ba4-4e59-8f00-fd8f6c0f862d)
+   - Target: `R - { \frac { 1 } { 2 } } ( \nabla \Phi ) ^ { 2 } - { \frac { 1 } { 2 } } \nabla ^ { 2 } \Phi = 0 .`
+
+ ---
+
+ ## Tech Stack
+
+ | Component | Description |
+ |--------------------|-------------|
+ | Qwen2-VL | Multimodal vision-language model (7B) by Alibaba |
+ | Unsloth | Fast, memory-efficient training |
+ | LoRA (via PEFT) | Parameter-efficient fine-tuning |
+ | 4-bit quantization | Enabled by `bitsandbytes` |
+ | Datasets, HF Hub | Loading and saving models and datasets |
+
+ ---
+
+ ## Setup
+
+ ```bash
+ pip install unsloth unsloth_zoo peft trl datasets accelerate bitsandbytes xformers==0.0.29.post3 sentencepiece protobuf hf_transfer triton
+ ```
+
+ ---
+
+ ## Training (Jupyter Notebook)
+
+ Refer to: `Qwen2__VL_image_to_latext.ipynb`
+
+ Steps:
+ 1. Load Qwen2-VL (`load_in_4bit=True`)
+ 2. Load the dataset via `datasets.load_dataset("unsloth/LaTeX_OCR")`
+ 3. Apply LoRA adapters
+ 4. Use `SFTTrainer` from Unsloth to fine-tune
+ 5. Save the adapters or the merged model
+
+ LoRA rank used: `r=16`
+ LoRA alpha: `16`
+
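Step 4 expects each sample in a chat format pairing the image with its LaTeX target. A minimal conversion sketch (not the exact notebook code; the instruction string and the `image`/`text` field names are assumptions):

```python
def convert_to_conversation(sample):
    """Turn an {image, text} dataset sample into a chat-format example."""
    instruction = "Write the LaTeX representation for this image."
    return {
        "messages": [
            {"role": "user", "content": [
                {"type": "text", "text": instruction},
                {"type": "image", "image": sample["image"]},
            ]},
            {"role": "assistant", "content": [
                {"type": "text", "text": sample["text"]},
            ]},
        ]
    }

# Stand-in sample; real samples carry a PIL image in "image"
converted = convert_to_conversation({"image": None, "text": r"\frac{1}{2}"})
print(converted["messages"][1]["content"][0]["text"])  # \frac{1}{2}
```

The full dataset is mapped through this function before being handed to the trainer.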
+ ---
+
+ ## Inference
+
+ Assuming `model` and `tokenizer` are already loaded as in training:
+
+ ```python
+ from PIL import Image
+
+ image = Image.open("equation.png")
+ messages = [{"role": "user", "content": [
+     {"type": "image"},
+     {"type": "text", "text": "Write the LaTeX representation for this image."},
+ ]}]
+ input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
+ inputs = tokenizer(image, input_text, add_special_tokens=False, return_tensors="pt").to("cuda")
+ output = model.generate(**inputs, max_new_tokens=128)
+ print(tokenizer.decode(output[0], skip_special_tokens=True))
+ ```
+
+ ---
+
+ ## Evaluation
+
+ - Exact-match accuracy: ~90%+
+ - Strong generalization to complex equations and symbols
+
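Exact match here can be computed by normalizing whitespace before comparing prediction and reference (a simple sketch; the notebook may normalize differently):

```python
def normalize(latex: str) -> str:
    # Collapse runs of whitespace so spacing differences don't count as errors
    return " ".join(latex.split())

def exact_match(predictions, references):
    # Fraction of predictions that match their reference after normalization
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)

preds = [r"\frac { 1 } { 2 }", r"x ^ { 2 }"]
refs  = [r"\frac { 1 } { 2 }", r"x ^ { 3 }"]
print(exact_match(preds, refs))  # 0.5
```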
+ ---
+
+ ## Results
+
+ | Metric | Value |
+ |---------------|---------------------|
+ | Exact match | ~90–92% |
+ | LoRA params | <1% of model params |
+ | Training time | ~20–40 min on A100 |
+ | Model size | 7B (4-bit) |
+
+ ---
+
+ ## Future Work
+
+ - Extend to handwritten formulas (e.g., the CROHME dataset)
+ - Add LaTeX syntax validation or auto-correction
+ - Build a lightweight Gradio/Streamlit interface for a demo
+
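A first step toward the syntax-validation idea could be a brace-balance check on generated LaTeX (a toy sketch; a real validator would also check environments and command arguments):

```python
def braces_balanced(latex: str) -> bool:
    # Track unescaped { and }; \{ and \} are literal braces, not grouping
    depth = 0
    i = 0
    while i < len(latex):
        ch = latex[i]
        if ch == "\\":
            i += 2  # skip the escaped character
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False  # closing brace with no matching open
        i += 1
    return depth == 0

print(braces_balanced(r"\frac { 1 } { 2 }"))  # True
print(braces_balanced(r"\frac { 1 } { 2"))    # False
```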
+ ---
+
+ ## Folder Structure
+
+ ```
+ .
+ ├── Qwen2__VL_image_to_latext.ipynb   # Training notebook
+ ├── output/                           # Saved fine-tuned model
+ └── README.md
+ ```
+
+ ---