
Error loading the SigLIP2 Vision model

#12
by jgaubil - opened

Description

Running the code snippet provided in the documentation for Siglip2VisionModel raises a RuntimeError due to a shape mismatch when loading the model checkpoint.

Steps to Reproduce

  1. Install the transformers library
  2. Run the following code:
from transformers import Siglip2VisionModel

model = Siglip2VisionModel.from_pretrained("google/siglip2-base-patch16-224")

Expected Behavior

The model should load successfully without errors.

Actual Behavior

The following error is raised:

You are using a model of type siglip_vision_model to instantiate a model of type siglip2_vision_model. 
This is not supported for all configurations of models and can yield errors.

[...]

RuntimeError: Error(s) in loading state_dict for Linear: 
size mismatch for weight: copying a param with shape torch.Size([768, 3, 16, 16]) 
from checkpoint, the shape in current model is torch.Size([768, 768]). 
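
My reading of where the two shapes come from, assuming the SigLIP vision tower embeds patches with a Conv2d while the SigLIP2 (naflex-style) tower uses a Linear over flattened patches; a minimal sketch with base-size hyperparameters (hidden_size=768, patch_size=16):

import torch.nn as nn

hidden_size, patch_size, channels = 768, 16, 3

# SigLIP-style patch embedding: Conv2d over the image grid
conv_embed = nn.Conv2d(channels, hidden_size, kernel_size=patch_size, stride=patch_size)
print(conv_embed.weight.shape)    # torch.Size([768, 3, 16, 16]) -- the shape stored in the checkpoint

# SigLIP2-style patch embedding: Linear over flattened patches (3 * 16 * 16 = 768 inputs)
linear_embed = nn.Linear(channels * patch_size * patch_size, hidden_size)
print(linear_embed.weight.shape)  # torch.Size([768, 768]) -- the shape Siglip2VisionModel expects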

Additional Investigation

Loading a SigLIP2 checkpoint through AutoModel also ends up using the wrong classes: the vision tower is instantiated with the SigLIP classes instead of the SigLIP2 ones:

from transformers import AutoModel

model = AutoModel.from_pretrained("google/siglip2-base-patch16-224")
model.vision_model.__class__
# Output: transformers.models.siglip.modeling_siglip.SiglipVisionTransformer
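
Another way to dig further is to inspect the model types declared in the checkpoint's config; the values in the comments below are what the warning above suggests, not something I have verified separately:

from transformers import AutoConfig

config = AutoConfig.from_pretrained("google/siglip2-base-patch16-224")
print(config.model_type)                # expected to be "siglip" rather than "siglip2"
print(config.vision_config.model_type)  # expected to be "siglip_vision_model"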

Root Cause Analysis

I believe this is because this checkpoint, like most other SigLIP2 checkpoints, is defined in src/transformers/models/siglip/convert_siglip_to_hf.py rather than in src/transformers/models/siglip2/convert_siglip2_to_hf.py, so its config declares the SigLIP model types, which matches the warning above.

Proposed Solution

Porting the affected SigLIP2 checkpoints to the SigLIP2 conversion file, src/transformers/models/siglip2/convert_siglip2_to_hf.py, and re-converting them may fix the error.
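
In the meantime, a possible workaround, based on the AutoModel observation above (which loads the checkpoint without a shape error), is to load the full model and take its vision tower:

from transformers import AutoModel

model = AutoModel.from_pretrained("google/siglip2-base-patch16-224")
vision_model = model.vision_model  # a SiglipVisionTransformer, per the investigation above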

Environment

  • transformers version: 4.52.4
  • PyTorch version: 2.6.0
  • Python version: 3.11.11
  • Operating System: linux

Additional Context

This affects google/siglip2-base-patch16-224 and all other SigLIP2 checkpoints except google/siglip2-base-patch16-naflex and google/siglip2-so400m-patch16-naflex, which are correctly defined in src/transformers/models/siglip2/convert_siglip2_to_hf.py.
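
For comparison, the same documentation snippet pointed at one of the naflex checkpoints is expected to load cleanly (untested here, but consistent with the conversion-script split described above):

from transformers import Siglip2VisionModel

# Expected to load without the shape mismatch, since this checkpoint uses the SigLIP2 conversion script
model = Siglip2VisionModel.from_pretrained("google/siglip2-base-patch16-naflex")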
