Update README.md
@@ -11,6 +11,8 @@ This is a SIGNIFICANTLY cool outcome. I widened Qwen3-32B. And it's still perf
This is an intermediate checkpoint in the process of expanding Qwen3-32B to match Qwen3-72B architecture dimensions. This model represents Stage 1 of a two-stage upscaling process, where the hidden dimensions and attention heads have been expanded, but the model still maintains 64 layers.

+ The code is here: [stage1_v2.py]

**⚠️ Note: This is an intermediate checkpoint not intended for direct use. For the complete model, use Qwen3-72B-Embiggened.**

## Architecture Changes
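The referenced `stage1_v2.py` is not shown here, but the Stage 1 idea (widen hidden dimensions while keeping all 64 layers) can be sketched generically. The snippet below is a minimal NumPy illustration of one common widening scheme, zero-padding a weight matrix so the enlarged layer reproduces the original outputs on the old coordinates; the actual script may instead duplicate or interpolate weights, and the dimensions here are toy values, not Qwen3's real ones.

```python
import numpy as np

def widen_linear(W: np.ndarray, new_out: int, new_in: int) -> np.ndarray:
    """Embed an (out, in) weight matrix into a larger (new_out, new_in)
    matrix, zero-initializing the new rows and columns. With inputs
    zero-padded the same way, the widened layer computes the same values
    as the original on the first `out` output coordinates."""
    out_dim, in_dim = W.shape
    W_wide = np.zeros((new_out, new_in), dtype=W.dtype)
    W_wide[:out_dim, :in_dim] = W  # original weights in the top-left block
    return W_wide

# Toy check: widening a 4x4 layer to 6x6 preserves the original outputs.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))
x = rng.standard_normal(4)
x_wide = np.zeros(6)
x_wide[:4] = x
y_wide = widen_linear(W, 6, 6) @ x_wide
assert np.allclose(y_wide[:4], W @ x)   # old coordinates unchanged
assert np.allclose(y_wide[4:], 0.0)     # new coordinates start inert
```

The same embedding would be applied per projection (attention Q/K/V/O and MLP matrices), with the head count expanded alongside the hidden size; a function-preserving start like this is why the checkpoint can remain usable before Stage 2 adds the remaining capacity.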