Update README.md
README.md
CHANGED
@@ -14,7 +14,7 @@ tags:
 
 ## Model Description
 
-This model was created by distilling the
+This model was created by distilling the Qwen3-Coder-480B Mixture-of-Experts (MoE) teacher model into the compact and efficient **Tesslate/WEBGEN-4B-Preview** base.
 
 The purpose of this distillation is to let the WEBGEN-4B-Preview model absorb some of the knowledge of a large MoE model and improve its overall performance. This model should perform better at web design, but it is still a 4B model.
 **It is recommended to use bf16, as the model is still only 8 GB and small models are very sensitive to quantization. For optimal results, be specific in your prompting and avoid vague, ambiguous prompts like "Create a website for a taco restaurant". Instead, use prompts like "Make a single-file landing page for "RasterFlow" (GPU video pipeline).
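
As a rough check on the 8 GB figure in the bf16 recommendation, weight memory can be estimated as parameter count times bytes per parameter. The sketch below is illustrative and not part of the commit: the helper name `weight_memory_gb`, the round 4e9 parameter count, and the bytes-per-parameter table are assumptions, and the estimate ignores activations, KV cache, and framework overhead.

```python
# Hypothetical helper: estimates weight storage only
# (no activations, KV cache, or quantization metadata).
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str = "bf16") -> float:
    """Return approximate weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    n = 4e9  # assumed round figure for a "4B" model
    for dtype in ("bf16", "int8", "int4"):
        print(f"{dtype}: ~{weight_memory_gb(n, dtype):.1f} GB")
```

Under these assumptions bf16 comes out at about 8 GB, in line with the README note; lower-precision formats shrink the footprint but carry the quantization sensitivity the README warns about.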