
Pala Tej Deep

Tej3

AI & ML interests

None yet

Recent Activity

Organizations

Walled AI

Tej3's activity

replied to RishabhBhardwaj's post 10 months ago

The backbone refers to the pretrained model used as the base model for fine-tuning the expert model.

For example, in the case of the Wizard models:

  • WizardLM-13B and WizardMath-13B are both fine-tuned from the Llama-2-13B model. Because they share the same backbone, they can be merged effectively using DELLA, DARE, or TIES.

  • WizardCoder-13B, on the other hand, is fine-tuned from the CodeLlama-13B-Python model. Since WizardCoder-13B uses a different base model (backbone) than WizardLM-13B and WizardMath-13B, merging all three models effectively with DELLA, DARE, or TIES is not feasible.

To summarize, the backbone is the underlying pretrained model that serves as the starting point for fine-tuning. It is crucial in the merging process because models fine-tuned from different backbones may not merge effectively, owing to differences in their initial pretrained weights and configurations.
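
To make the role of the backbone concrete, below is a minimal sketch of a simplified DARE-style drop-and-rescale merge, assuming both experts share the Llama-2-13B backbone, architecture, and tokenizer. The repository IDs, drop rate, and output path are illustrative placeholders; real DARE/TIES/DELLA merges are normally run through a tool such as mergekit rather than written by hand.

```python
# Minimal sketch of why a shared backbone matters when merging: a simplified
# DARE-style drop-and-rescale merge of task vectors. Model IDs, the drop rate,
# and the output path are illustrative placeholders, not a definitive recipe.
import torch
from transformers import AutoModelForCausalLM

BACKBONE = "meta-llama/Llama-2-13b-hf"        # shared base model for both experts
EXPERTS = [
    "WizardLMTeam/WizardLM-13B-V1.2",         # placeholder repo IDs; both experts
    "WizardLMTeam/WizardMath-13B-V1.0",       # are assumed to share the Llama-2-13B backbone
]
DROP_RATE = 0.9                               # fraction of each task vector to drop

base = AutoModelForCausalLM.from_pretrained(BACKBONE, torch_dtype=torch.float16)
base_sd = base.state_dict()
merged_delta = {k: torch.zeros_like(v) for k, v in base_sd.items()}

for expert_id in EXPERTS:
    expert_sd = AutoModelForCausalLM.from_pretrained(
        expert_id, torch_dtype=torch.float16
    ).state_dict()
    for k, v in base_sd.items():
        if k not in expert_sd or expert_sd[k].shape != v.shape:
            continue  # skip tensors that do not line up (e.g., resized embeddings)
        # Task vector: expert weights minus backbone weights. This difference is
        # only meaningful because both experts start from the same pretrained weights.
        delta = expert_sd[k] - v
        # DARE: randomly drop most entries of the task vector and rescale the rest.
        keep = torch.rand_like(delta.float()) > DROP_RATE
        merged_delta[k] += (delta * keep) / (1.0 - DROP_RATE) / len(EXPERTS)

# Add the averaged, sparsified task vectors back onto the shared backbone.
base.load_state_dict({k: base_sd[k] + merged_delta[k] for k in base_sd})
base.save_pretrained("merged-wizard-13b")
```

The key step is `delta = expert_sd[k] - v`: if the experts were fine-tuned from different backbones, those deltas would be differences between unrelated weight sets, and combining them would not produce a coherent model.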

updated a Space over 1 year ago