LoRA/PEFT can’t target nn.Parameter vision→text projection in Gemma-3 VLM — why use nn.Parameter instead of nn.Linear(bias=False)?

#74 · opened by alexanderyj

Context
In Gemma-3 VLM (and similar VLMs), the vision→text projection that maps vision token embeddings to the language model's hidden size is implemented as a bare nn.Parameter (applied via torch.matmul) rather than as an nn.Linear(bias=False) module.
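
For concreteness, here is a minimal sketch of the two mathematically equivalent formulations. This is not the actual Gemma-3 source; the class names, attribute names, and dimensions are illustrative:

```python
import torch
import torch.nn as nn

class ProjectorWithParameter(nn.Module):
    """Projection stored as a bare nn.Parameter and applied with torch.matmul."""

    def __init__(self, vision_dim: int, text_dim: int):
        super().__init__()
        # Weight of shape (vision_dim, text_dim); there is no submodule here,
        # only a parameter attached directly to this module.
        self.projection_weight = nn.Parameter(torch.empty(vision_dim, text_dim))
        nn.init.normal_(self.projection_weight, std=0.02)

    def forward(self, vision_tokens: torch.Tensor) -> torch.Tensor:
        # (batch, seq, vision_dim) @ (vision_dim, text_dim) -> (batch, seq, text_dim)
        return torch.matmul(vision_tokens, self.projection_weight)


class ProjectorWithLinear(nn.Module):
    """The mathematically equivalent projection expressed as nn.Linear(bias=False)."""

    def __init__(self, vision_dim: int, text_dim: int):
        super().__init__()
        # nn.Linear stores its weight as (text_dim, vision_dim), i.e. the transpose.
        self.projection = nn.Linear(vision_dim, text_dim, bias=False)

    def forward(self, vision_tokens: torch.Tensor) -> torch.Tensor:
        return self.projection(vision_tokens)
```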

Issue
PEFT/LoRA can only inject adapters into nn.Module instances (e.g., nn.Linear, nn.Conv2d, nn.Embedding). When the projection is a raw nn.Parameter, there is no module for PEFT to hook into, so LoRA adapters cannot be attached to this key transformation.
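
To make the failure mode concrete, here is a small demonstration using the toy projectors sketched above; the module names and dimensions are illustrative, not Gemma-3's:

```python
from peft import LoraConfig, get_peft_model

# Works: "projection" names an nn.Linear submodule, which PEFT wraps with a LoRA layer.
lora_cfg = LoraConfig(r=16, lora_alpha=32, target_modules=["projection"])
peft_ok = get_peft_model(ProjectorWithLinear(1152, 2560), lora_cfg)

# Fails: "projection_weight" is an attribute (nn.Parameter) of its parent module,
# not a submodule, so no target_modules entry can ever match it.
bad_cfg = LoraConfig(r=16, lora_alpha=32, target_modules=["projection_weight"])
try:
    get_peft_model(ProjectorWithParameter(1152, 2560), bad_cfg)
except ValueError as err:
    print(f"PEFT cannot target a bare nn.Parameter: {err}")
```

(Recent PEFT releases appear to add a target_parameters option to LoraConfig for targeting nn.Parameter weights directly, but I have not verified whether it covers the Gemma-3 projection, so I mention it only as something to check.)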

Questions for the maintainers
1. Is there a specific reason the projection is modeled as a raw nn.Parameter rather than as nn.Linear(bias=False)?
2. Would you consider switching that projection to nn.Linear(bias=False)?
3. Do you have an official solution or recommended workaround for the current compatibility problem? (A candidate workaround is sketched below.)
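
Regarding question 3, one possible workaround would be to replace the bare parameter with an equivalent nn.Linear(bias=False) before calling get_peft_model. Below is a minimal sketch using the toy projectors from the context section above; on the real model you would additionally have to replace the multi-modal projector submodule and patch its forward, so please treat this as a starting point rather than an official fix:

```python
import torch
from peft import LoraConfig, get_peft_model

def to_linear_projector(old: ProjectorWithParameter) -> ProjectorWithLinear:
    """Build an equivalent Linear-based projector from the parameter-based one.

    nn.Linear computes x @ W.T, so it must store the transpose of the bare
    (vision_dim, text_dim) parameter to produce identical outputs.
    """
    vision_dim, text_dim = old.projection_weight.shape
    new = ProjectorWithLinear(vision_dim, text_dim)
    with torch.no_grad():
        new.projection.weight.copy_(old.projection_weight.T)
    return new


old = ProjectorWithParameter(1152, 2560)   # dims are illustrative
new = to_linear_projector(old)

# The swap is numerically a no-op...
x = torch.randn(2, 16, 1152)
assert torch.allclose(old(x), new(x), atol=1e-5)

# ...and LoRA can now target the projection by module name.
peft_projector = get_peft_model(
    new, LoraConfig(r=16, lora_alpha=32, target_modules=["projection"])
)
peft_projector.print_trainable_parameters()
```

Note that this changes the state_dict key (projection_weight → projection.weight), so checkpoint loading and saving would need remapping, which is part of why an official recommendation would be preferable.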

