Simply out of curiosity.

by 1TBGPU4EVR

Why did you wall this MoE model, considering you didn't wall the other 50+ you've abliterated?
Thanks. Great models. I wish you had kept the vision transformer in the VL models though :) Some of those would make killer inference agents.

This is just one attempt at an idea. We also tried other MoE models, but Qwen3MoE performed the best.

This model can activate either a single expert or multiple experts simultaneously, which is different from Qwen3MoE's activation method. The parameters for simultaneous activation can be adjusted; see huihui-ai/Huihui-MoE-23B-A4B-abliterated for reference. A minimal sketch of how that adjustment might look is shown below.
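For anyone who wants to try this, the sketch below loads the model with transformers and changes the number of simultaneously activated experts via the config before loading weights. The field name `num_experts_per_tok` is an assumption based on common MoE configs such as Qwen3MoE; check the actual config of huihui-ai/Huihui-MoE-23B-A4B-abliterated for the real parameter name and valid range.

```python
# Sketch only: the config field `num_experts_per_tok` is assumed, not confirmed
# by this thread. Inspect the model's config.json for the actual name.
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "huihui-ai/Huihui-MoE-23B-A4B-abliterated"

config = AutoConfig.from_pretrained(model_id)
# 1 = activate a single expert per token; >1 = activate several experts simultaneously.
config.num_experts_per_tok = 2

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    torch_dtype="auto",
    device_map="auto",
)

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```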
