PaliGemma 2 ONNX doesn't support object detection?
Hi, thanks for sharing the ONNX weights for PaliGemma 2. While it works well for image captioning, I couldn't get object detection to work using the detect keyword in the prompt.
E.g., detect person was one of the prompts I tried, but the response was null.
Are the converted model weights compatible only with captioning tasks?
Hmm, it should work. Could you share the code you are using?
Also, can you confirm the original (PyTorch) version works correctly for your image/prompt?
@Xenova: Okay, after experimenting with various prompts, I was able to get the bounding box coordinates. Unlike the original PaliGemma 2 weights, where a simple <image>detect person prompt would work, here I had to specifically use <image>detect bounding box of person to make it work.
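For reference, here's roughly how I'm running it now, adapted from the example on the onnx-community/paligemma2-3b-pt-224 model card. Treat it as a sketch: the image URL is a placeholder, and the per-module dtype names are just what the repo's onnx/ folder suggests.

import { AutoProcessor, PaliGemmaForConditionalGeneration, RawImage } from '@huggingface/transformers';

const model_id = 'onnx-community/paligemma2-3b-pt-224';
const processor = await AutoProcessor.from_pretrained(model_id);
const model = await PaliGemmaForConditionalGeneration.from_pretrained(model_id, {
  // Per-module quantization; module names assumed to match the files in the repo's onnx/ folder
  dtype: {
    embed_tokens: 'q8',
    vision_encoder: 'q8',
    decoder_model_merged: 'q4',
  },
});

// Placeholder image; any image containing a person
const image = await RawImage.fromURL('https://example.com/street.jpg');

// This is the prompt phrasing that worked for me with the ONNX weights
const prompt = '<image>detect bounding box of person';
const inputs = await processor(image, prompt);

// Generate and decode; the decoded text repeats the prompt, followed by the <loc....> tokens for the box
const generated_ids = await model.generate({ ...inputs, max_new_tokens: 100 });
const output = processor.batch_decode(generated_ids, { skip_special_tokens: true });
console.log(output[0]);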
Hi @Xenova, is it possible to run this using vanilla JS by loading Transformers.js via a CDN? I get an error when I try. Here's how I'm loading it:
import { AutoProcessor, PaliGemmaForConditionalGeneration } from 'https://cdn.jsdelivr.net/npm/@huggingface/[email protected]';
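For context, here's a fuller sketch of what I'm trying. It has to run as an ES module (e.g. inside a <script type="module"> tag); the version tag is left out of the URL below, but in practice I pin a specific 3.x release.

// Runs inside <script type="module"> ... </script> on the page;
// a plain <script> tag will reject the import statement.
import { AutoProcessor, PaliGemmaForConditionalGeneration } from 'https://cdn.jsdelivr.net/npm/@huggingface/transformers';

const model_id = 'onnx-community/paligemma2-3b-pt-224';
const processor = await AutoProcessor.from_pretrained(model_id);
const model = await PaliGemmaForConditionalGeneration.from_pretrained(model_id);
console.log('Model and processor loaded');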
How are you converting the model to ONNX? Optimum doesn't support the image-text-to-text task. Please help.
$ optimum-cli export onnx --model google/paligemma-3b-pt-224 paligemma-3b-pt-224_onnx/
KeyError: "Unknown task: image-text-to-text.
I tried specifying one of the existing tasks, image-to-text, but that also throws another error:
$ optimum-cli export onnx --model google/paligemma-3b-pt-224 --task image-to-text paligemma-3b-pt-224_onnx/
ValueError: Trying to export a paligemma model, that is a custom or unsupported architecture, but no custom onnx configuration was passed as custom_onnx_configs. Please refer to "Export a model to ONNX with optimum.exporters.onnx" for an example on how to export custom models. Please open an issue at GitHub if you would like the model type paligemma to be supported natively in the ONNX export.
paligemma2 uses a custom conversion script, which I have added here: https://github.com/huggingface/transformers.js/issues/1126#issuecomment-2575525385
Hope that helps!
@Xenova : I've commented on the GitHub issue about an error. Could you please check?
RuntimeError: The serialized model is larger than the 2GiB limit imposed by the protobuf library. Therefore the output file must be a file path, so that the ONNX external data can be written to the same directory. Please specify the output file name.
@Xenova Thanks. That helps.
@biswajitdevsarma: Did it work for you?
@NSTiwari Conversion to ONNX worked. Haven't checked inference with ONNX yet.
@biswajitdevsarma : Do you mind sharing the notebook? When I tried doing the same, I got the above error.
@NSTiwari I used the above code and just commented out the onnxslim part:
# Attempt to optimize the model with onnxslim
"""
try:
    onnx_model = onnxslim.slim(temp_model_path)
except Exception as e:
    print(f"Failed to slim {temp_model_path}: {e}")
    onnx_model = onnx.load(temp_model_path)
"""
onnx_model = onnx.load(temp_model_path)
Everything else is the same.
@biswajitdevsarma I used the same code too. Maybe I'm missing some dependencies, or there are version compatibility issues. Here's my notebook. Could you please check once? Really appreciate your help.
@NSTiwari I installed the following packages in a Python 3.10 environment:
pip install -q --upgrade git+https://github.com/huggingface/transformers.git
pip install -q datasets lightning
pip install -q peft accelerate bitsandbytes
pip install -q --upgrade wandb
pip install Pillow
pip install tensorboardX
npm i @huggingface/transformers
@biswajitdevsarma
What about the ONNX libraries?
@NSTiwari
oh yes
pip install optimum[exporters]
pip install onnxslim
The inference code in https://huggingface.co/onnx-community/paligemma2-3b-pt-224 doesn't work when I use a local model path
const model_path = './my_local_onnx_model';
Error: Unauthorized access to file: "https://huggingface.co/./my_local_onnx_model/resolve/main/preprocessor_config.json".
Any idea how to make it work with local model path?
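One thing worth trying (just a sketch; it assumes the files in my_local_onnx_model mirror the layout of the Hub repo, i.e. config.json, preprocessor_config.json, tokenizer files, and an onnx/ folder): point Transformers.js at a local models directory through its env settings instead of passing a relative path as the model id.

import { env, AutoProcessor, PaliGemmaForConditionalGeneration } from '@huggingface/transformers';

// Resolve models locally instead of from huggingface.co.
// localModelPath is the parent folder; the model id below is a sub-folder inside it.
env.allowLocalModels = true;
env.allowRemoteModels = false; // fail fast instead of silently falling back to the Hub
env.localModelPath = './';     // placeholder: wherever my_local_onnx_model lives

const model_id = 'my_local_onnx_model';
const processor = await AutoProcessor.from_pretrained(model_id);
const model = await PaliGemmaForConditionalGeneration.from_pretrained(model_id);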
@biswajitdevsarma: I can try at my end, but I'm not even able to get the ONNX conversion to work in the first place. I still get the same error:
When running in Google Colab, remember to restart the runtime after making the change. The fact that the error points at the line setting the global variable (instead of the lines below it) suggests this is what's happening.
Thanks, @Xenova. This partially worked. However, only the following 5 files are generated, as opposed to all the files in the official onnx-community repo:
Do I need to specify the quantization flag to get q8, q4, and fp16 files? If yes, where?
Or do I first need to quantize PaliGemma 2 using bnb and then go ahead with the normal ONNX conversion process?
The execution stopped with the messages below:
/usr/local/lib/python3.11/dist-packages/transformers/models/gemma2/modeling_gemma2.py:625: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
attention_mask.shape[-1] if attention_mask.dim() == 2 else cache_position[-1].item()
/usr/local/lib/python3.11/dist-packages/transformers/models/gemma2/modeling_gemma2.py:640: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
normalizer = torch.tensor(self.config.hidden_size**0.5, dtype=hidden_states.dtype)
/usr/local/lib/python3.11/dist-packages/transformers/models/gemma2/modeling_gemma2.py:294: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
effective_seq_len = max(cache_position.shape[0], self.sliding_window)
Failed to slim output/google/paligemma2-3b-pt-224/temp/decoder_model_merged.onnx: Error parsing message
Failed to slim output/google/paligemma2-3b-pt-224/temp/embed_tokens.onnx: Error parsing message