number of experts +
#7, opened by Danioken
I have noticed that the model performs much better with 10 or even 12 active experts. The speed cost is modest: 57 t/s with 8 active experts vs 51 t/s with 12. With more experts the model copes much better in conversation and takes better care of syntax, style, and format. My impression, though, is that its precision in counting drops slightly.
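
To put the speed numbers in perspective, here is a quick sketch of the arithmetic in Python. The helper name `relative_slowdown` is made up for illustration and is not part of any library; the only inputs are the two throughput figures quoted above.

```python
def relative_slowdown(baseline_tps: float, new_tps: float) -> float:
    """Fractional drop in throughput when going from baseline_tps to new_tps."""
    return (baseline_tps - new_tps) / baseline_tps


# Figures from the post: 57 t/s with 8 active experts, 51 t/s with 12.
slowdown = relative_slowdown(57.0, 51.0)
print(f"Throughput drop going from 8 to 12 active experts: {slowdown:.1%}")
# prints "Throughput drop going from 8 to 12 active experts: 10.5%"
```

So the move from 8 to 12 active experts costs roughly a tenth of the generation speed on this setup, which fits the description of the cost as modest.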