This is a dumb experiment - don't expect it to be good!
I merged a few Mixtral models together, then tuned only the routing parameters. The loss dropped pretty steeply with only a bit of training, from ~0.99 to ~0.7 over about ten million tokens.
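For anyone curious what "tuning only the routing parameters" means in practice, here's a minimal sketch, assuming the merged model follows the standard Hugging Face Mixtral layout (per-layer `block_sparse_moe.gate` linears). The model path and dtype are placeholders, not the actual settings used for this checkpoint:

```python
# Sketch: freeze everything except the MoE router (gate) weights before training.
# Assumes the merged model uses the stock transformers Mixtral parameter naming.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "path/to/merged-mixtral",      # hypothetical path to the merged model
    torch_dtype=torch.bfloat16,
)

# Only parameters belonging to the per-layer routers stay trainable.
for name, param in model.named_parameters():
    param.requires_grad = ".block_sparse_moe.gate." in name

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable router params: {trainable:,}")
```

The router weights are a tiny fraction of the total parameter count, which is why a noticeable loss drop over only ~10M tokens is plausible here.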
I'm hoping this after-the-fact balancing will have reduced some of the nasty behavior typical of current tunes. But maybe it just made it even dumber! We'll see.
Uses ChatML format.
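For reference, a ChatML prompt looks like this (the system message is just an illustration, not a recommended one for this model):

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Hello!<|im_end|>
<|im_start|>assistant
```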
Will update with more details if it turns out promising.