---
library_name: transformers
license: apache-2.0
base_model:
- allenai/Molmo-7B-D-0924
base_model_relation: quantized
tags:
- bitsandbytes
- Molmo
- chat
- multimodal
---
Quantization of the [original Molmo-7B-D-0924](https://huggingface.co/allenai/Molmo-7B-D-0924) model using `bitsandbytes`.
This model differs from the [one located here](https://huggingface.co/cyan2k/molmo-7B-D-bnb-4bit) in that it includes modified source code that reduces the number of dependencies while producing the same results, so it works out of the box.
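For reference, below is a minimal sketch of how a 4-bit `bitsandbytes` quantization like this one is typically produced with `transformers`. The specific settings shown (NF4, double quantization, bfloat16 compute) are assumptions for illustration, not a record of exactly how this checkpoint was created:
```Python
# Hypothetical sketch: producing a 4-bit bitsandbytes quantization of Molmo.
# The exact settings used for this checkpoint are assumptions.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize linear-layer weights to 4 bits
    bnb_4bit_quant_type="nf4",              # NormalFloat4 (assumed; "fp4" is the alternative)
    bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for matmuls at runtime (assumed)
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants (assumed)
)

model = AutoModelForCausalLM.from_pretrained(
    "allenai/Molmo-7B-D-0924",
    trust_remote_code=True,
    quantization_config=quant_config,
    device_map="auto",
)
# Saving 4-bit weights to disk requires recent transformers/bitsandbytes versions.
model.save_pretrained("Molmo-7B-D-0924-bnb-4bit")
```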
# NOTE:
The example script below requires an Nvidia GPU and assumes that you `pip install` the CUDA libraries into your virtual environment. This is NOT necessary if you install CUDA systemwide (as most people do). If you install CUDA systemwide, simply remove the `set_cuda_paths` function from the example script, but make sure that you've installed a proper version of CUDA and a compatible version of the PyTorch libraries.
## Compatible CUDA and PyTorch 2.2.2 versions
PyTorch is only tested against specific CUDA versions. When using PyTorch 2.2.2, the following CUDA library versions are required:
- `pip install nvidia-cublas-cu12==12.1.3.1`
- `pip install nvidia-cuda-runtime-cu12==12.1.105`
- `pip install nvidia-cuda-nvrtc-cu12==12.1.105`
- `pip install nvidia-cufft-cu12==11.0.2.54`
- `pip install nvidia-cudnn-cu12==8.9.2.26`
- Then install [`torch==2.2.2`](https://download.pytorch.org/whl/cu121/torch/), [`torchvision==0.17.2`](https://download.pytorch.org/whl/cu121/torchvision/), and [`torchaudio==2.2.2`](https://download.pytorch.org/whl/cu121/torchaudio/) by visiting each of these three links and building a `pip install` command from the wheel that matches your Python version and platform.
For example, for Windows using Python 3.11 you would use the following:
```
pip install https://download.pytorch.org/whl/cu121/torch-2.2.2%2Bcu121-cp311-cp311-win_amd64.whl#sha256=efbcfdd4399197d06b32f7c0e1711c615188cdd65427b933648c7478fb880b3f
pip install https://download.pytorch.org/whl/cu121/torchvision-0.17.2%2Bcu121-cp311-cp311-win_amd64.whl#sha256=10ad542aab6b47dbe73c441381986d50a7ed5021cbe01d593a14477ec1f067a0
pip install https://download.pytorch.org/whl/cu121/torchaudio-2.2.2%2Bcu121-cp311-cp311-win_amd64.whl#sha256=c7dee68cd3d2b889bab71d4a0c345bdc3ea2fe79a62b921a6b49292c605b6071
```
## Compatible CUDA and PyTorch 2.5.1 versions
PyTorch is only tested against specific CUDA versions. When using PyTorch 2.5.1, the following CUDA library versions are required:
- `pip install nvidia-cublas-cu12==12.4.5.8`
- `pip install nvidia-cuda-runtime-cu12==12.4.127`
- `pip install nvidia-cuda-nvrtc-cu12==12.4.127`
- `pip install nvidia-cufft-cu12==11.2.1.3`
- `pip install nvidia-cudnn-cu12==9.1.0.70`
- Then install [`torch==2.5.1`](https://download.pytorch.org/whl/cu124/torch/), [`torchvision==0.20.1`](https://download.pytorch.org/whl/cu124/torchvision/), and [`torchaudio==2.5.1`](https://download.pytorch.org/whl/cu124/torchaudio/) by visiting each of these three links and building a `pip install` command from the wheel that matches your Python version and platform.
For example, for Windows using Python 3.11 you would use the following:
```
pip install https://download.pytorch.org/whl/cu124/torch-2.5.1%2Bcu124-cp311-cp311-win_amd64.whl#sha256=6c8a7003ef1327479ede284b6e5ab3527d3900c2b2d401af15bcc50f2245a59f
pip install https://download.pytorch.org/whl/cu124/torchvision-0.20.1%2Bcu124-cp311-cp311-win_amd64.whl#sha256=15796b453a99ed0f0cbc249d129685ddc88157310135fb3addaf738a15db5306
pip install https://download.pytorch.org/whl/cu124/torchaudio-2.5.1%2Bcu124-cp311-cp311-win_amd64.whl#sha256=b3d75f4e6efc5412fe78c7f2787ee4f39cea1317652e1a47785879cde109f5c4
```
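After installing, you can verify that PyTorch was built against the expected CUDA version and can see your GPU (a quick sanity check, separate from the example script):
```Python
# Quick sanity check of the CUDA-enabled PyTorch install.
import torch

print(torch.__version__)          # e.g. "2.5.1+cu124" or "2.2.2+cu121"
print(torch.version.cuda)         # CUDA version PyTorch was built against
print(torch.cuda.is_available())  # should be True on a working install
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of the detected GPU
```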
Example script (process single image):
```Python
import sys
import os
from pathlib import Path

def set_cuda_paths():
    # Prepend the CUDA libraries installed in the virtual environment to
    # CUDA_PATH and PATH so the CUDA DLLs can be found at import time.
    # Remove this function if CUDA is installed systemwide.
    venv_base = Path(sys.executable).parent.parent
    nvidia_base_path = venv_base / 'Lib' / 'site-packages' / 'nvidia'
    cuda_path = nvidia_base_path / 'cuda_runtime' / 'bin'
    cublas_path = nvidia_base_path / 'cublas' / 'bin'
    cudnn_path = nvidia_base_path / 'cudnn' / 'bin'
    nvrtc_path = nvidia_base_path / 'cuda_nvrtc' / 'bin'
    paths_to_add = [
        str(cuda_path),
        str(cublas_path),
        str(cudnn_path),
        str(nvrtc_path),
    ]
    env_vars = ['CUDA_PATH', 'PATH']
    for env_var in env_vars:
        current_value = os.environ.get(env_var, '')
        new_value = os.pathsep.join(paths_to_add + [current_value] if current_value else paths_to_add)
        os.environ[env_var] = new_value

set_cuda_paths()

import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig

model_path = r"[INSERT THE PATH TO THE FOLDER HOLDING THE MODEL FILES HERE]"

class VisionModel:
    def __init__(self):
        self.model = None
        self.processor = None
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    def initialize_model_and_processor(self):
        # trust_remote_code is required because Molmo ships custom modeling code.
        self.processor = AutoProcessor.from_pretrained(
            model_path,
            trust_remote_code=True,
            torch_dtype='auto',
            device_map='auto'
        )
        self.model = AutoModelForCausalLM.from_pretrained(
            model_path,
            trust_remote_code=True,
            torch_dtype='auto',
            device_map='auto'
        )

    def process_single_image(self, image_path):
        image = Image.open(image_path)
        if image.mode != "RGB":
            image = image.convert("RGB")
        text = "Describe this image in as much detail as possible but be succinct and don't repeat yourself."
        # Preprocess the image and prompt, then move the batch to the GPU.
        inputs = self.processor.process(images=[image], text=text)
        inputs = {k: v.to(self.device).unsqueeze(0) for k, v in inputs.items()}
        output = self.model.generate_from_batch(
            inputs,
            GenerationConfig(max_new_tokens=500, stop_strings=["<|endoftext|>"]),
            tokenizer=self.processor.tokenizer
        )
        # Strip the prompt tokens and decode only the newly generated text.
        generated_text = self.processor.tokenizer.decode(output[0, inputs['input_ids'].size(1):], skip_special_tokens=True)
        print(f"\nGenerated Text:\n{generated_text}\n")

if __name__ == "__main__":
    image_path = r"[INSERT THE PATH TO THE IMAGE YOU WANT TO PROCESS HERE]"
    vision_model = VisionModel()
    vision_model.initialize_model_and_processor()
    vision_model.process_single_image(image_path)
```
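The class above loads the model once, so it can also be reused across many images. A minimal sketch (assuming a folder containing `.jpg`/`.png` files) of describing every image in a directory:
```Python
# Hypothetical batch usage of the VisionModel class from the example script:
# initialize once, then describe each image in a folder.
from pathlib import Path

if __name__ == "__main__":
    image_dir = Path(r"[INSERT THE PATH TO A FOLDER OF IMAGES HERE]")
    vision_model = VisionModel()
    vision_model.initialize_model_and_processor()
    for image_path in sorted(image_dir.glob("*")):
        if image_path.suffix.lower() in (".jpg", ".jpeg", ".png"):
            print(f"Processing {image_path.name} ...")
            vision_model.process_single_image(str(image_path))
```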