custom_text_proj initialization
Hi all, I've been working with the model and noticed that the weights for `custom_text_proj` were not being loaded from the checkpoint. They were initialized randomly instead, which led to very different results at each model instantiation.
I worked around this by using the base model provided by the Vidore team (https://huggingface.co/vidore/colqwen2.5-base).
Do you recommend this solution, or is there a better alternative, or a forthcoming update, that ensures the projection layer in your model is initialized properly?
Thank you!
Hi @edesalve, thanks for the question.
We ran multiple evaluations previously, and the results were consistent across runs. I checked the initialization of the `custom_text_proj` layer, and across various runs the standard deviation, max, and min values remained the same. The only difference was in the mean, which varied very slightly around zero (e.g., 4.3869e-05, 6.5327e-05). This minor variation in the mean does not adversely affect model performance.
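For anyone who wants to repeat this kind of check, here is a minimal sketch of comparing the summary statistics of a layer's weights between two instantiations. The helper names `summarize_weights` and `compare_runs` and the tolerance are hypothetical, not part of the model's code, and the sketch assumes you have already flattened the layer's weights into a plain list of floats (with PyTorch, something like `layer.weight.flatten().tolist()`).

```python
import statistics


def summarize_weights(values):
    """Return mean, std, max, and min for a flat iterable of floats."""
    values = list(values)
    return {
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),  # population standard deviation
        "max": max(values),
        "min": min(values),
    }


def compare_runs(run_a, run_b, tol=1e-3):
    """Report, per statistic, whether two runs agree within `tol`.

    `tol` is an illustrative threshold, not a recommended value.
    """
    stats_a = summarize_weights(run_a)
    stats_b = summarize_weights(run_b)
    return {name: abs(stats_a[name] - stats_b[name]) <= tol for name in stats_a}
```

If every entry in the returned dictionary is `True`, the two initializations have matching summary statistics at that tolerance. This does not show that the individual weights are identical.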
Nevertheless, if you prefer fully deterministic scores, yes, you can use `vidore/colqwen2.5-base` by changing the `base_model_name_or_path` in `adapter_config.json`. I evaluated using that model, and the scores across the benchmark remained consistent.
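The config change described above can be sketched as a small script. It assumes you have a local copy of the adapter directory containing `adapter_config.json`; the function name `set_base_model` and the directory path are hypothetical and only for illustration.

```python
import json
from pathlib import Path


def set_base_model(adapter_dir, base_model="vidore/colqwen2.5-base"):
    """Point a local adapter_config.json at a different base model.

    Returns the previous value of base_model_name_or_path (or None).
    """
    config_path = Path(adapter_dir) / "adapter_config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))

    previous = config.get("base_model_name_or_path")
    config["base_model_name_or_path"] = base_model

    # Write the file back with all other keys left unchanged.
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return previous
```

Usage would look like `set_base_model("path/to/local/adapter")`, after which you load the adapter from that local directory instead of from the hub.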