Update README.md
## Model Description
This model is a distilled version of **`Qwen/Qwen3-30B-A3B-Thinking`**, designed to inherit the reasoning and behavioral characteristics of its much larger teacher model, **`deepseek-ai/DeepSeek-V3.1`**.
It is the result of applying a LoRA created via an SVD-based distillation pipeline, and then merging those weights into the base model. The core of this process was to transfer the nuanced knowledge from a **62-layer, 256-expert teacher model** into the more efficient **48-layer, 128-expert architecture** of the student model.
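The card does not include the distillation code itself, so the snippet below is only a minimal sketch of the SVD step under simplifying assumptions: it factors the difference between two *already aligned* weight matrices into LoRA `A`/`B` factors via truncated SVD. The helper name `extract_lora_pair` and the shapes are illustrative; the actual pipeline additionally has to map 62 teacher layers / 256 experts onto the student's 48 layers / 128 experts before any such factorization.

```python
import torch

def extract_lora_pair(w_teacher: torch.Tensor, w_student: torch.Tensor, rank: int = 64):
    """Factor the difference between two aligned weight matrices into
    low-rank LoRA factors so that B @ A approximates (w_teacher - w_student)."""
    delta = (w_teacher - w_student).float()           # (out_features, in_features)
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    U, S, Vh = U[:, :rank], S[:rank], Vh[:rank, :]    # keep the top-`rank` components
    # Split the singular values across the two factors.
    B = U * S.sqrt()                                  # (out_features, rank)
    A = S.sqrt().unsqueeze(1) * Vh                    # (rank, in_features)
    return A, B

# Toy example with random matrices standing in for one aligned projection weight.
w_t = torch.randn(2048, 1024)
w_s = torch.randn(2048, 1024)
A, B = extract_lora_pair(w_t, w_s, rank=64)
print((w_t - (w_s + B @ A)).norm() / (w_t - w_s).norm())  # relative residual of the rank-64 fit
```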
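The merge step described above would typically look like the following with `peft`; this is a generic sketch, not the exact script used here, and the adapter path is a placeholder.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen3-30B-A3B-Thinking"       # student/base model named in this card
adapter_id = "path/to/distillation-lora"      # placeholder path to the SVD-distilled LoRA

base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)

merged = model.merge_and_unload()             # fold the LoRA deltas into the base weights
merged.save_pretrained("qwen3-30b-a3b-distilled-merged")
AutoTokenizer.from_pretrained(base_id).save_pretrained("qwen3-30b-a3b-distilled-merged")
```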