BasedBase/WEBGEN-4B-Preview-480B-Distill

distilled-model


BasedBase committed 9 days ago

Commit 97a531b (verified), 1 parent: c0da93f

Update README.md

Files changed (1)

README.md +1 -1

README.md CHANGED

@@ -14,7 +14,7 @@ tags:
 ## Model Description
-This model was created by distilling the knowledge Qwen3-Coder-480B Mixture-of-Experts (MoE) teacher model into the compact and efficient **Tesslate/WEBGEN-4B-Preview** base.
+This model was created by distilling the Qwen3-Coder-480B Mixture-of-Experts (MoE) teacher model into the compact and efficient **Tesslate/WEBGEN-4B-Preview** base.
 The purpose of this distill is to make the Webgen-4B-Preview model gain some of the knowledge of a large MoE model to improve its overall performance. This model should perform better for web design but it is still a 4B model
 **It is reccomended to use bf16 as its still only 8gb and because small models are very sensitive to quantization. For optimal results be specific in your prompting and avoid vaugue ambiguous prompts like "Create a website for a taco restaurant". Instead use prompts like "Make a single-file landing page for "RasterFlow" (GPU video pipeline).
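The card's "only 8gb" figure for bf16 follows from storing each parameter in 2 bytes. A minimal sketch of that arithmetic is below. It assumes a nominal 4e9 parameters (the exact count is not stated in the card) and counts weights only, ignoring activations and KV cache, so real inference memory will be higher. The helper `weight_memory_gb` is hypothetical and not part of any library.

```python
# Rough weight-memory estimate for a dense model at different precisions.
# Assumption: ~4e9 parameters ("4B" is a nominal size, not an exact count).
# Only weights are counted; activations and KV cache are ignored.

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    params = 4e9  # hypothetical nominal parameter count for a "4B" model
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype:>5}: {weight_memory_gb(params, dtype):5.1f} GB")
```

Under these assumptions bf16 comes to 8.0 GB of weights, matching the card. Lower-bit formats shrink that figure further, which is the trade-off the card advises against for small models.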