This merge makes sense

#1
by sometimesanotion - opened

With more deepseek distills appearing, there are a lot of ways forward you could take with Lamarck v0.7. This is a very straightforward direction: it involves models that were already present in Lamarck's lineage, which bodes well for the result being well-integrated. It also doesn't have to tackle the complexities of any more CoT influences than were already present, so this is definitely worth trying.

Yeah, I expect the integration to go smoothly as it fits well with Lamarck's current structure. Also, avoiding the extra complexity of CoT influence is a major advantage. Thanks for your support!


To be clear, there is some CoT in Lamarck v0.7 - it has DRT and R1, and those are in Chocolatine, which is in Qwenvergence v12 as a merge member and as a small LoRA on other members. I think DRT will be muted in this merge, and R1's influence somewhat intact. My bet's on a mild drop in IFEVAL, a high MUSR, and the rest near the middle of its two ancestors.


Oh wow, I was wrong! Versus the average of its component models, IFEVAL got a boost, BBH climbed a little, and while MATH is getting a boost on the leaderboard for a lot of models, this one is especially strong. It suggests that Lamarck and Qwenvergence-v12-Prose-DS are very synergistic in the middle layers. The drop in GPQA and MUSR is even more of a surprise. I think the self_attn and mlp filters you've used are responsible for this result.
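For anyone curious what those filters look like in practice: mergekit lets you weight self_attn and mlp tensors on separate interpolation curves in a slerp config. Here's a rough Python sketch that writes such a config - the model paths, layer range, merge method, and interpolation values are placeholders I made up to illustrate the idea, not the actual recipe behind this merge.

```python
# Minimal sketch (not the actual recipe) of a mergekit slerp config that
# applies separate "self_attn" and "mlp" filters. Model paths, layer ranges,
# and interpolation values below are illustrative placeholders.
import yaml  # PyYAML

config = {
    "slices": [
        {
            "sources": [
                {"model": "sometimesanotion/Lamarck-14B-v0.7", "layer_range": [0, 48]},
                {"model": "sometimesanotion/Qwenvergence-14B-v12-Prose-DS", "layer_range": [0, 48]},
            ]
        }
    ],
    "merge_method": "slerp",
    "base_model": "sometimesanotion/Lamarck-14B-v0.7",
    "parameters": {
        # Filters weight attention and MLP tensors on different curves,
        # which is how a merge can move benchmarks in opposite directions.
        "t": [
            {"filter": "self_attn", "value": [0.0, 0.3, 0.5, 0.7, 1.0]},
            {"filter": "mlp", "value": [1.0, 0.7, 0.5, 0.3, 0.0]},
            {"value": 0.5},  # default for all other tensors
        ]
    },
    "dtype": "bfloat16",
}

# Write a YAML file in the shape mergekit expects.
with open("merge-config.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)
```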

