Update README.md
## Model Description
This model is a distilled version of **`Qwen/Qwen3-30B-A3B-Thinking`**, designed to inherit the reasoning and behavioral characteristics of its much larger teacher model, **`deepseek-ai/DeepSeek-V3.1`**.
It is the result of applying a LoRA created via an SVD-based distillation pipeline, and then merging those weights into the base model. The core of this process was to transfer the nuanced knowledge from a **62-layer, 256-expert teacher model** into the more efficient **48-layer, 128-expert architecture** of the student model.
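The card does not include the distillation code itself, so the snippet below is only a minimal sketch of the SVD step under simplifying assumptions: it factors the difference between two *already aligned* weight matrices into LoRA `A`/`B` factors via truncated SVD. The helper name `extract_lora_pair` and the shapes are illustrative; the actual pipeline additionally has to map 62 teacher layers / 256 experts onto the student's 48 layers / 128 experts before any such factorization.

```python
import torch

def extract_lora_pair(w_teacher: torch.Tensor, w_student: torch.Tensor, rank: int = 64):
    """Factor the difference between two aligned weight matrices into
    low-rank LoRA factors so that B @ A approximates (w_teacher - w_student)."""
    delta = (w_teacher - w_student).float()           # (out_features, in_features)
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    U, S, Vh = U[:, :rank], S[:rank], Vh[:rank, :]    # keep the top-`rank` components
    # Split the singular values across the two factors.
    B = U * S.sqrt()                                  # (out_features, rank)
    A = S.sqrt().unsqueeze(1) * Vh                    # (rank, in_features)
    return A, B

# Toy example with random matrices standing in for one aligned projection weight.
w_t = torch.randn(2048, 1024)
w_s = torch.randn(2048, 1024)
A, B = extract_lora_pair(w_t, w_s, rank=64)
print((w_t - (w_s + B @ A)).norm() / (w_t - w_s).norm())  # relative residual of the rank-64 fit
```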
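The merge step described above would typically look like the following with `peft`; this is a generic sketch, not the exact script used here, and the adapter path is a placeholder.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen3-30B-A3B-Thinking"       # student/base model named in this card
adapter_id = "path/to/distillation-lora"      # placeholder path to the SVD-distilled LoRA

base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)

merged = model.merge_and_unload()             # fold the LoRA deltas into the base weights
merged.save_pretrained("qwen3-30b-a3b-distilled-merged")
AutoTokenizer.from_pretrained(base_id).save_pretrained("qwen3-30b-a3b-distilled-merged")
```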