---
library_name: transformers
license: apache-2.0
base_model:
- allenai/Molmo-7B-D-0924
base_model_relation: quantized
tags:
  - bitsandbytes
  - Molmo
  - chat
  - multimodal
---
Quantization of the [original Molmo-7B-D-0924](https://huggingface.co/allenai/Molmo-7B-D-0924) model using `bitsandbytes`.

This model differs from the [one located here](https://huggingface.co/cyan2k/molmo-7B-D-bnb-4bit) in that it includes modified source code that reduces dependencies while producing the same results, so it works out of the box.
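
For reference, 4-bit `bitsandbytes` quantization with `transformers` is typically configured along the following lines. This is a sketch only; the exact settings used to produce this checkpoint are an assumption, not documented here:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Hypothetical settings -- the exact parameters used for this checkpoint
# are an assumption, not taken from this repository.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # quantize weights to 4-bit on load
    bnb_4bit_quant_type="nf4",               # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,   # dtype used for matmuls at runtime
)

model = AutoModelForCausalLM.from_pretrained(
    "allenai/Molmo-7B-D-0924",
    trust_remote_code=True,
    quantization_config=bnb_config,
    device_map="auto",
)
```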

# NOTE:

The example script below requires an Nvidia GPU and assumes that you have `pip install`ed the CUDA libraries into your virtual environment.  This is NOT NECESSARY if you install CUDA systemwide (as most people do).  If you install CUDA systemwide, simply remove the `set_cuda_paths` function from the example script, but make sure that you've installed a proper version of CUDA and a compatible version of the PyTorch libraries.

<details><summary>COMPATIBLE CUDA AND PYTORCH 2.2.2 VERSIONS</summary>

PyTorch is only tested with specific versions of CUDA.  When using PyTorch 2.2.2, the following CUDA library versions are required:

- `pip install nvidia-cublas-cu12==12.1.3.1`
- `pip install nvidia-cuda-runtime-cu12==12.1.105`
- `pip install nvidia-cuda-nvrtc-cu12==12.1.105`
- `pip install nvidia-cufft-cu12==11.0.2.54`
- `pip install nvidia-cudnn-cu12==8.9.2.26`
- Then install [`torch==2.2.2`](https://download.pytorch.org/whl/cu121/torch/), [`torchvision==0.17.2`](https://download.pytorch.org/whl/cu121/torchvision/), and [`torchaudio==2.2.2`](https://download.pytorch.org/whl/cu121/torchaudio/) by visiting each of the three links and building a `pip install` command from the wheel that matches your Python version and platform.

For example, for Windows using Python 3.11 you would use the following:

```
pip install https://download.pytorch.org/whl/cu121/torch-2.2.2%2Bcu121-cp311-cp311-win_amd64.whl#sha256=efbcfdd4399197d06b32f7c0e1711c615188cdd65427b933648c7478fb880b3f
pip install https://download.pytorch.org/whl/cu121/torchvision-0.17.2%2Bcu121-cp311-cp311-win_amd64.whl#sha256=10ad542aab6b47dbe73c441381986d50a7ed5021cbe01d593a14477ec1f067a0
pip install https://download.pytorch.org/whl/cu121/torchaudio-2.2.2%2Bcu121-cp311-cp311-win_amd64.whl#sha256=c7dee68cd3d2b889bab71d4a0c345bdc3ea2fe79a62b921a6b49292c605b6071
```
</details>

<details><summary>COMPATIBLE CUDA AND PYTORCH 2.5.1 VERSIONS</summary>

PyTorch is only tested with specific versions of CUDA.  When using PyTorch 2.5.1, the following CUDA library versions are required:

- `pip install nvidia-cublas-cu12==12.4.5.8`
- `pip install nvidia-cuda-runtime-cu12==12.4.127`
- `pip install nvidia-cuda-nvrtc-cu12==12.4.127`
- `pip install nvidia-cufft-cu12==11.2.1.3`
- `pip install nvidia-cudnn-cu12==9.1.0.70`
- Then install [`torch==2.5.1`](https://download.pytorch.org/whl/cu124/torch/), [`torchvision==0.20.1`](https://download.pytorch.org/whl/cu124/torchvision/), and [`torchaudio==2.5.1`](https://download.pytorch.org/whl/cu124/torchaudio/) by visiting each of the three links and building a `pip install` command from the wheel that matches your Python version and platform.

For example, for Windows using Python 3.11 you would use the following:

```
pip install https://download.pytorch.org/whl/cu124/torch-2.5.1%2Bcu124-cp311-cp311-win_amd64.whl#sha256=6c8a7003ef1327479ede284b6e5ab3527d3900c2b2d401af15bcc50f2245a59f
pip install https://download.pytorch.org/whl/cu124/torchvision-0.20.1%2Bcu124-cp311-cp311-win_amd64.whl#sha256=15796b453a99ed0f0cbc249d129685ddc88157310135fb3addaf738a15db5306
pip install https://download.pytorch.org/whl/cu124/torchaudio-2.5.1%2Bcu124-cp311-cp311-win_amd64.whl#sha256=b3d75f4e6efc5412fe78c7f2787ee4f39cea1317652e1a47785879cde109f5c4
```
</details>
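
Whichever PyTorch version you choose, you can sanity-check the installation before running the example script (a minimal check; run it after `set_cuda_paths` if you rely on the virtual-environment CUDA libraries):

```python
import torch

print(torch.__version__)          # e.g. 2.2.2+cu121 or 2.5.1+cu124
print(torch.version.cuda)         # the CUDA version PyTorch was built against
print(torch.cuda.is_available())  # must print True for the example script below
```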


Example script (process single image):

```python
import sys
import os
from pathlib import Path

def set_cuda_paths():
    """Prepend the CUDA libraries installed in this virtual environment to
    CUDA_PATH and PATH so that PyTorch and bitsandbytes can find them.
    Remove this function (and the call below) if CUDA is installed systemwide."""
    venv_base = Path(sys.executable).parent.parent
    nvidia_base_path = venv_base / 'Lib' / 'site-packages' / 'nvidia'
    cuda_path = nvidia_base_path / 'cuda_runtime' / 'bin'
    cublas_path = nvidia_base_path / 'cublas' / 'bin'
    cudnn_path = nvidia_base_path / 'cudnn' / 'bin'
    nvrtc_path = nvidia_base_path / 'cuda_nvrtc' / 'bin'

    paths_to_add = [
        str(cuda_path),
        str(cublas_path),
        str(cudnn_path),
        str(nvrtc_path),
    ]
    env_vars = ['CUDA_PATH', 'PATH']

    for env_var in env_vars:
        current_value = os.environ.get(env_var, '')
        new_value = os.pathsep.join(paths_to_add + [current_value] if current_value else paths_to_add)
        os.environ[env_var] = new_value

set_cuda_paths()

import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig

model_path = r"[INSERT THE PATH TO THE FOLDER HOLDING THE MODEL FILES HERE]"

class VisionModel:
    def __init__(self):
        self.model = None
        self.processor = None
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    def initialize_model_and_processor(self):
        # trust_remote_code is required because Molmo ships custom modeling code
        self.processor = AutoProcessor.from_pretrained(
            model_path,
            trust_remote_code=True,
            torch_dtype='auto',
            device_map='auto'
        )
        self.model = AutoModelForCausalLM.from_pretrained(
            model_path,
            trust_remote_code=True,
            torch_dtype='auto',
            device_map='auto'
        )

    def process_single_image(self, image_path):
        image = Image.open(image_path)
        if image.mode != "RGB":
            image = image.convert("RGB")
        text = "Describe this image in as much detail as possible, but be succinct and don't repeat yourself."

        # The processor returns unbatched tensors; add a batch dimension and move them to the device
        inputs = self.processor.process(images=[image], text=text)
        inputs = {k: v.to(self.device).unsqueeze(0) for k, v in inputs.items()}

        output = self.model.generate_from_batch(
            inputs,
            GenerationConfig(max_new_tokens=500, stop_strings=["<|endoftext|>"]),
            tokenizer=self.processor.tokenizer
        )

        # Decode only the newly generated tokens, skipping the prompt
        generated_text = self.processor.tokenizer.decode(output[0, inputs['input_ids'].size(1):], skip_special_tokens=True)
        print(f"\nGenerated Text:\n{generated_text}\n")

if __name__ == "__main__":
    image_path = r"[INSERT THE PATH TO THE IMAGE YOU WANT TO PROCESS HERE]"
    vision_model = VisionModel()
    vision_model.initialize_model_and_processor()
    vision_model.process_single_image(image_path)
```
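
The script above handles one image per run. To caption a folder of images, you can reuse the same `VisionModel` instance so the model is only loaded once; a minimal sketch, assuming the class above and a folder of `.jpg` files:

```python
from pathlib import Path

vision_model = VisionModel()
vision_model.initialize_model_and_processor()  # load the model a single time

image_dir = Path(r"[INSERT THE PATH TO THE FOLDER HOLDING YOUR IMAGES HERE]")
for image_path in sorted(image_dir.glob("*.jpg")):
    print(f"Processing {image_path.name}")
    vision_model.process_single_image(str(image_path))
```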