
This is a dumb experiment - don't expect it to be good!

I merged a few Mixtral models together, then tuned only the routing parameters. There was a pretty steep drop in loss with only a bit of training - it went from ~0.99 to ~0.7 over about ten million tokens.
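A minimal sketch of what router-only tuning looks like, assuming the merged model loads as a standard `transformers` Mixtral checkpoint (the merge step itself isn't shown, and the training loop and hyperparameters are up to you):

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "chargoddard/mixtralmerge-8x7B-rebalanced-test",
    torch_dtype=torch.bfloat16,
)

# Freeze everything except the MoE routing gates; in the transformers Mixtral
# implementation, their parameter names contain "block_sparse_moe.gate".
trainable = 0
for name, param in model.named_parameters():
    param.requires_grad = "block_sparse_moe.gate" in name
    if param.requires_grad:
        trainable += param.numel()
print(f"Trainable router params: {trainable:,}")
```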

I'm hoping this after-the-fact balancing will have reduced some of the nasty behavior typical of current tunes. But maybe it just made it even dumber! We'll see.

Uses ChatML format.
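For reference, here's the standard ChatML layout (this is the general format, not something verified against this model's tokenizer config - check `tokenizer_config.json` for the exact template):

```python
# Each turn is wrapped in <|im_start|>/<|im_end|> markers; generation is
# prompted by an open assistant turn at the end.
prompt = (
    "<|im_start|>system\n"
    "You are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\n"
    "Why is the sky blue?<|im_end|>\n"
    "<|im_start|>assistant\n"
)
```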

I'll update with more details if it turns out to be promising.

Model size: 46.7B params (BF16, safetensors)
