custom_text_proj initialization

#4
by edesalve - opened

Hi all, I've been working with your model and noticed that the weights for custom_text_proj were not being loaded from the checkpoint and were instead initialized randomly, leading to very different results at each model instantiation.

To address this, I used the base model provided by the Vidore team (https://huggingface.co/vidore/colqwen2.5-base).

Do you recommend this solution, or is there a better alternative or a forthcoming update to ensure proper initialization for the projection layer within your model?

Thank you!

Hi @edesalve , thanks for the question.

We ran multiple evaluations previously, and the results were consistent across runs. I checked the initialization of the custom_text_proj layer, and for various runs, the standard deviation, max, and min values remained the same. The only difference was in the mean, which had a very small variation close to zero (e.g., 4.3869e-05, 6.5327e-05). This minor variation in the mean does not adversely affect model performance.
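For anyone who wants to repeat this kind of check, here is a minimal sketch rather than the exact script we used. The helper names `weight_stats` and `max_stat_drift` are hypothetical, and it assumes you have already flattened the layer's weights into a plain list of floats for each instantiation (for example via `.flatten().tolist()` on the weight tensor, wherever custom_text_proj sits in your loaded model).

```python
import statistics


def weight_stats(values):
    """Summarize one flattened weight matrix (hypothetical helper)."""
    values = list(values)
    return {
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),  # population standard deviation
        "min": min(values),
        "max": max(values),
    }


def max_stat_drift(runs):
    """For each statistic, return the spread (max - min) across runs.

    `runs` is a list of flattened weight lists, one per model instantiation.
    A spread of 0.0 means the statistic was identical in every run.
    """
    stats = [weight_stats(run) for run in runs]
    return {
        key: max(s[key] for s in stats) - min(s[key] for s in stats)
        for key in stats[0]
    }
```

A nonzero spread for the mean with zero spread for the other statistics would match what is described above. Note that matching summary statistics do not by themselves prove the weights are identical element by element.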

Nevertheless, if you prefer fully deterministic scores, you can use vidore/colqwen2.5-base by setting base_model_name_or_path in adapter_config.json to that model. I evaluated with that base model, and the scores across the benchmark remained consistent.
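If you would rather script the change than edit the file by hand, something like the following should work. It is only a sketch: the function name `set_base_model` is hypothetical, and it assumes you have a local copy of adapter_config.json that is plain JSON with a top-level base_model_name_or_path key.

```python
import json
from pathlib import Path


def set_base_model(config_path, base_model="vidore/colqwen2.5-base"):
    """Point a local adapter_config.json at a different base model.

    Returns the previous value of base_model_name_or_path (or None if
    the key was absent) so the change can be reverted if needed.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))

    previous = config.get("base_model_name_or_path")
    config["base_model_name_or_path"] = base_model

    # Rewrite the file in place, leaving all other keys untouched.
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return previous
```

Call it with the path to your downloaded adapter_config.json, then load the adapter from that local directory as usual.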
