Performance of the Fused Model
Hi, glad to see such a practical and wonderful repo. I got to know about this project on Reddit. Obviously, pruning or fusing some experts to make the model smaller is a direct and promising way to compress DeepSeek-V3.
I noticed that you have provided several techniques in the moe-pruner repo. Can you share performance numbers, such as perplexity (PPL) and downstream task results (e.g., MMLU, GSM8K)?
Hello there!
Thanks for the appreciation!
I haven't been able to run any benchmarks yet, as the fused architecture is not compatible with inference engines for now, and pure PyTorch inference is far too slow.
So the only thing I can say for sure is that even the unhealed models are capable of generating coherent English (not an achievement in itself, but not bad for a model that lost 96% of its parameters).
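For reference, that kind of sanity check can be done with plain Hugging Face transformers, which is the only path available while the fused architecture is unsupported by inference engines. This is just a minimal sketch: the model path is a placeholder, and `trust_remote_code=True` is assumed to be needed for the custom fused architecture.

```python
# Minimal sketch: coherence / perplexity sanity check with pure PyTorch.
# "path/to/fused-model" is a placeholder; trust_remote_code is assumed to be
# required for the custom fused architecture. Expect this to be very slow.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/fused-model"  # placeholder
tok = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Quick generation check: does the pruned model still produce coherent English?
inputs = tok("The capital of France is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))

# Rough perplexity on a short text (exp of the average cross-entropy loss)
text = "Deep learning models can be compressed by pruning redundant parameters."
enc = tok(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    loss = model(**enc, labels=enc["input_ids"]).loss
print("perplexity:", torch.exp(loss).item())
```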
I am currently running post-training on the 29B versions; I'll look into how to quantize and run inference with vLLM and/or GGUF after that (due to the size, it will require a good tensor parallelism implementation to run effectively).
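Once engine support lands, something along these lines is what I'd aim for with vLLM; the model path, dtype, and `tensor_parallel_size` here are assumptions, since the fused architecture isn't recognized by vLLM yet.

```python
# Sketch of the intended vLLM path once the fused architecture is supported.
# Model path and tensor_parallel_size are assumptions, not a working config.
from vllm import LLM, SamplingParams

llm = LLM(
    model="path/to/fused-29B",   # placeholder path
    tensor_parallel_size=4,      # split the model across 4 GPUs
    dtype="bfloat16",
)
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain mixture-of-experts pruning in one sentence."], params)
print(outputs[0].outputs[0].text)
```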