oddadmix committed on
Commit 0f320df · verified · Parent(s): 6f75616

Update README.md

Files changed (1):
  1. README.md +50 -33
README.md CHANGED
@@ -15,45 +15,62 @@ licence: license
  This model is a fine-tuned version of [unsloth/qwen2-vl-2b-instruct-unsloth-bnb-4bit](https://huggingface.co/unsloth/qwen2-vl-2b-instruct-unsloth-bnb-4bit).
  It has been trained using [TRL](https://github.com/huggingface/trl).

- ## Quick start
-
- ```python
- from transformers import pipeline
-
- question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
- generator = pipeline("text-generation", model="oddadmix/Khanandeh-0.1-Persian-OCR-2B-Instruct", device="cuda")
- output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
- print(output["generated_text"])
  ```

- ## Training procedure
-
- This model was trained with SFT.

- ### Framework versions

- - TRL: 0.14.0
- - Transformers: 4.49.0
- - Pytorch: 2.4.1
- - Datasets: 3.4.1
- - Tokenizers: 0.21.1

- ## Citations

- Cite TRL as:
-
- ```bibtex
- @misc{vonwerra2022trl,
-     title        = {{TRL: Transformer Reinforcement Learning}},
-     author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
-     year         = 2020,
-     journal      = {GitHub repository},
-     publisher    = {GitHub},
-     howpublished = {\url{https://github.com/huggingface/trl}}
- }
  ```
 
  This model is a fine-tuned version of [unsloth/qwen2-vl-2b-instruct-unsloth-bnb-4bit](https://huggingface.co/unsloth/qwen2-vl-2b-instruct-unsloth-bnb-4bit).
  It has been trained using [TRL](https://github.com/huggingface/trl).

+ You can load this model using the `transformers` and `qwen_vl_utils` libraries:
+ ```bash
+ pip install -U transformers qwen_vl_utils peft "accelerate>=0.26.0"
+ pip install -U bitsandbytes
  ```

+ ```python
+ import os
+
+ import torch
+ from PIL import Image
+ from qwen_vl_utils import process_vision_info
+ from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
+
+ model_name = "oddadmix/Khanandeh-0.1-Persian-OCR-2B-Instruct"
+ model = Qwen2VLForConditionalGeneration.from_pretrained(
+     model_name,
+     torch_dtype="auto",
+     device_map="auto",
+ )
+ processor = AutoProcessor.from_pretrained(model_name)
+ max_tokens = 2000
+
+ prompt = "Below is the image of one page of a document, as well as some raw textual content that was previously extracted for it. Just return the plain text representation of this document as if you were reading it naturally. Do not hallucinate."
+
+ # Load the page image (replace "page.png" with your own file), then save a
+ # temporary copy so it can be passed to the processor by file:// URI.
+ image = Image.open("page.png")
+ src = "image.png"
+ image.save(src)
+
+ messages = [
+     {
+         "role": "user",
+         "content": [
+             {"type": "image", "image": f"file://{src}"},
+             {"type": "text", "text": prompt},
+         ],
+     }
+ ]
+ text = processor.apply_chat_template(
+     messages, tokenize=False, add_generation_prompt=True
+ )
+ image_inputs, video_inputs = process_vision_info(messages)
+ inputs = processor(
+     text=[text],
+     images=image_inputs,
+     videos=video_inputs,
+     padding=True,
+     return_tensors="pt",
+ )
+ inputs = inputs.to(model.device)
+ generated_ids = model.generate(**inputs, max_new_tokens=max_tokens)
+ # Strip the prompt tokens so only newly generated text is decoded.
+ generated_ids_trimmed = [
+     out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
+ ]
+ output_text = processor.batch_decode(
+     generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
+ )[0]
+ os.remove(src)
+ print(output_text)
  ```
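
The `messages` structure in the new snippet is plain Python data, so it can be assembled and inspected without loading the model. A minimal sketch of a helper that builds it for any page image (the `build_ocr_messages` name is illustrative, not part of the model card):

```python
def build_ocr_messages(image_path: str, prompt: str) -> list:
    """Assemble a Qwen2-VL chat message list pairing a local image
    (referenced by file:// URI) with a text instruction."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": f"file://{image_path}"},
                {"type": "text", "text": prompt},
            ],
        }
    ]


messages = build_ocr_messages("image.png", "Return the plain text of this page.")
print(messages[0]["content"][0]["image"])  # file://image.png
```

The resulting list can be passed unchanged to `processor.apply_chat_template` and `process_vision_info` as shown in the diff above.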