This merge makes sense

#1
by sometimesanotion - opened

With more deepseek distills appearing, there are a lot of ways forward you could take with Lamarck v0.7. This is a very straightforward direction: it involves models that were already present in Lamarck's lineage, which bodes well for the result being well-integrated. It also doesn't have to tackle the complexities of any more CoT influences than were already present, so this is definitely worth trying.

Yeah, I expect the integration to go smoothly as it fits well with Lamarck's current structure. Also, avoiding the extra complexity of CoT influence is a major advantage. Thanks for your support!


To be clear, there is some CoT in Lamarck v0.7 - it has DRT and R1, and those are in Chocolatine, which is in Qwenvergence v12 as a merge member and as a small LoRA on other members. I think DRT will be muted in this merge, and R1's influence somewhat intact. My bet's on a mild drop in IFEVAL, a high MUSR, and the rest near the middle of its two ancestors.


Oh wow, I was wrong! Versus the average of its component models, IFEVAL got a boost, BBH climbed a little, and while MATH is getting a boost on the leaderboard for a lot of models, this one is especially strong. It suggests that Lamarck and Qwenvergence-v12-Prose-DS are very synergistic in the middle layers. The drop in GPQA and MUSR is even more of a surprise. I think the self_attn and mlp filters you've used are responsible for this result.
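For anyone curious what those filters look like in practice: mergekit lets you weight self_attn and mlp tensors on separate interpolation curves in a slerp config. Here's a rough Python sketch that writes such a config - the model paths, layer range, merge method, and interpolation values are placeholders I made up to illustrate the idea, not the actual recipe behind this merge.

```python
# Minimal sketch (not the actual recipe) of a mergekit slerp config that
# applies separate "self_attn" and "mlp" filters. Model paths, layer ranges,
# and interpolation values below are illustrative placeholders.
import yaml  # PyYAML

config = {
    "slices": [
        {
            "sources": [
                {"model": "sometimesanotion/Lamarck-14B-v0.7", "layer_range": [0, 48]},
                {"model": "sometimesanotion/Qwenvergence-14B-v12-Prose-DS", "layer_range": [0, 48]},
            ]
        }
    ],
    "merge_method": "slerp",
    "base_model": "sometimesanotion/Lamarck-14B-v0.7",
    "parameters": {
        # Filters weight attention and MLP tensors on different curves,
        # which is how a merge can move benchmarks in opposite directions.
        "t": [
            {"filter": "self_attn", "value": [0.0, 0.3, 0.5, 0.7, 1.0]},
            {"filter": "mlp", "value": [1.0, 0.7, 0.5, 0.3, 0.0]},
            {"value": 0.5},  # default for all other tensors
        ]
    },
    "dtype": "bfloat16",
}

# Write a YAML file in the shape mergekit expects.
with open("merge-config.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)
```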

