Wrong configs

#5
by pere - opened

There are a few issues loading the Gemma3 model with AutoModelForCausalLM. The core problem is that the current config.json is set up for multi-modal usage (with "text_config" and "vision_config") but is missing key text fields at the top level (like "vocab_size" and "hidden_size") that the text-only classes look for. Specifically:
• There is no "vocab_size" field, yet the checkpoint’s embedding matrix is sized [262208, hidden_size] (because it has extra tokens for images).
• The text fields are nested under "text_config", but Gemma3ForCausalLM expects them at the top level (like config.hidden_size, config.num_hidden_layers, etc.).
• The uploaded config references "Gemma3ForConditionalGeneration", implying multi-modal usage. For text-only use, we currently have to patch the config ourselves so that it matches the real embedding dimension and exposes the text fields at the top level (see the inspection snippet below).
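For reference, here is a minimal way to inspect where the fields actually live. This is only a sanity-check sketch; exactly which attributes are exposed at the top level may vary with the transformers version.

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("google/gemma-3-4b-pt")
print(type(cfg).__name__)                 # multi-modal wrapper config
print("vocab_size" in cfg.to_dict())      # may be False: not exposed at the top level
print(cfg.text_config.hidden_size)        # text fields are nested under text_config
print(cfg.text_config.num_hidden_layers)
```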

Potential fixes:
1. Add text fields at the top level (e.g. "hidden_size": 2560, "vocab_size": 262208, etc.) so that AutoModelForCausalLM can read them directly without error.
2. Use a multi-modal class such as Gemma3ForConditionalGeneration, which explicitly handles both text_config and vision_config, if that is the intended usage (a minimal sketch follows below).
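A minimal sketch of option 2, assuming a transformers build that ships the Gemma 3 multi-modal classes (class names as in recent releases; this is one possible approach, not the only one):

```python
import torch
from transformers import AutoProcessor, Gemma3ForConditionalGeneration

model_name = "google/gemma-3-4b-pt"

# The processor bundles the tokenizer and image preprocessing for multi-modal use.
processor = AutoProcessor.from_pretrained(model_name)

# The multi-modal class reads both text_config and vision_config directly,
# so no manual patching of the config is needed.
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
)
```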

Patching the config manually shows that the model loads fine once this is addressed:

```python
import torch
from transformers import (
    AutoConfig,
    AutoTokenizer,
    pipeline
)
from transformers.models.gemma3.configuration_gemma3 import Gemma3TextConfig
from transformers.models.gemma3.modeling_gemma3 import Gemma3ForCausalLM

# Name or local path of the Gemma3 model checkpoint
model_name = "google/gemma-3-4b-pt"

# Load the multi-modal config
multi_config = AutoConfig.from_pretrained(model_name)

# Extract the text-specific config to a dict
text_cfg_dict = multi_config.text_config.to_dict()

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Ensure the vocab size matches the checkpoint's embedding shape
#    (the checkpoint has embed_tokens.weight of size [262208, 2560], so we set 262208).
text_cfg_dict["vocab_size"] = 262208

# Add any special token IDs from the tokenizer
if tokenizer.pad_token_id is not None:
    text_cfg_dict["pad_token_id"] = tokenizer.pad_token_id
text_cfg_dict["bos_token_id"] = tokenizer.bos_token_id
text_cfg_dict["eos_token_id"] = tokenizer.eos_token_id

# Build a text-only config
text_config = Gemma3TextConfig(**text_cfg_dict)

# Load the model using that text config
model = Gemma3ForCausalLM.from_pretrained(
    model_name,
    config=text_config,
    torch_dtype=torch.bfloat16,
    device_map=None,
    low_cpu_mem_usage=False,
)

# Create a text-generation pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device=0 if torch.cuda.is_available() else -1
)

prompt = "Eiffel tower is located in"
output = pipe(prompt, max_new_tokens=50)
print("Generated text:", output[0]["generated_text"])
```

Same problem here

Thank you for providing the issue details. To help us investigate, could you please let us know which Transformers version you were using when you encountered this error?

We can confirm that this issue has been addressed and resolved in Transformers 4.53.0.

Please try again after installing the latest transformers version (4.53.0) using !pip install -U transformers, and you can load the gemma-3-4b-pt model using the following code:

```python
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("google/gemma-3-4b-pt")
```
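As a quick sanity check after loading, something along these lines should produce text (the prompt and generation settings here are only illustrative):

```python
from transformers import AutoTokenizer

# 'model' is the AutoModelForCausalLM instance loaded above.
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-4b-pt")
inputs = tokenizer("Eiffel tower is located in", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```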

Please let us know if the issue still persists. Thank you.
