---
license: apache-2.0
library_name: mlx
language:
- en
- fr
- zh
- de
tags:
- programming
- code generation
- code
- codeqwen
- moe
- coding
- coder
- qwen2
- chat
- qwen
- qwen-coder
- Qwen3-Coder-30B-A3B-Instruct
- Qwen3-30B-A3B
- mixture of experts
- 128 experts
- 8 active experts
- 1 million context
- qwen3
- finetune
- brainstorm 20x
- brainstorm
- optional thinking
- qwen3_moe
- mlx
base_model: DavidAU/Qwen3-42B-A3B-2507-YOYO2-TOTAL-RECALL-Instruct
pipeline_tag: text-generation
---
# Qwen3-42B-A3B-2507-YOYO2-TOTAL-RECALL-Instruct-qx65-mlx
Custom quant formula (qx65) under evaluation.

This card compares the original TotalRecall model, YOYO, and YOYO combined with TotalRecall, all quantized at q6.
The following models are compared:
```bash
thinking-b Qwen3-42B-A3B-2507-Thinking-Abliterated-uncensored-TOTAL-RECALL-v2-Medium-MASTER-CODER-q6-mlx
yoyo Qwen3-30B-A3B-YOYO-V2-q6-mlx
yoyo-b Qwen3-42B-A3B-2507-YOYO2-TOTAL-RECALL-Instruct-q6-mlx
```
The first TotalRecall model was made from the Qwen3-42B-A3B-2507-Thinking, abliterated and uncensored.
## Key Observations from Benchmarks
| Benchmark | thinking-b | yoyo | yoyo-b | Winner |
|---|---|---|---|---|
| ARC Challenge | 0.387 | 0.532 | 0.537 | yoyo-b (slight lead) |
| ARC Easy | 0.447 | 0.685 | 0.699 | yoyo-b |
| BoolQ | 0.625 | 0.886 | 0.884 | yoyo |
| Hellaswag | 0.648 | 0.683 | 0.712 | yoyo-b |
| OpenBookQA | 0.380 | 0.456 | 0.448 | yoyo |
| PIQA | 0.768 | 0.782 | 0.786 | yoyo-b |
| Winogrande | 0.636 | 0.639 | 0.676 | yoyo-b |
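As a quick sanity check, the table above can be recomputed with a short Python snippet that finds each benchmark's winner and the yoyo-b vs yoyo deltas (scores copied verbatim from the table; the variable names are illustrative, not part of any harness):

```python
# Benchmark scores copied verbatim from the table above.
scores = {
    "ARC Challenge": {"thinking-b": 0.387, "yoyo": 0.532, "yoyo-b": 0.537},
    "ARC Easy":      {"thinking-b": 0.447, "yoyo": 0.685, "yoyo-b": 0.699},
    "BoolQ":         {"thinking-b": 0.625, "yoyo": 0.886, "yoyo-b": 0.884},
    "Hellaswag":     {"thinking-b": 0.648, "yoyo": 0.683, "yoyo-b": 0.712},
    "OpenBookQA":    {"thinking-b": 0.380, "yoyo": 0.456, "yoyo-b": 0.448},
    "PIQA":          {"thinking-b": 0.768, "yoyo": 0.782, "yoyo-b": 0.786},
    "Winogrande":    {"thinking-b": 0.636, "yoyo": 0.639, "yoyo-b": 0.676},
}

# For each task, report the best model and how far yoyo-b sits from yoyo.
for task, row in scores.items():
    winner = max(row, key=row.get)
    delta = row["yoyo-b"] - row["yoyo"]
    print(f"{task:13s} winner={winner:10s} yoyo-b vs yoyo: {delta:+.3f}")
```

Running this shows yoyo-b ahead on five of the seven tasks, with yoyo taking BoolQ and OpenBookQA by small margins.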
## Key Insights
1️⃣ YOYO2-TOTAL-RECALL generally outperforms the others
The addition of Brainstorm layers (making YOYO2-TOTAL-RECALL a 42B MoE) improves performance on every benchmark except BoolQ and OpenBookQA, where yoyo was marginally better.
Most notable gains over yoyo at q6: +0.029 in Hellaswag, +0.037 in Winogrande, and +0.004 in PIQA.
This aligns with the model's lineage: YOYO2-TOTAL-RECALL was created by adding Brainstorm layers to the YOYO2 merge (three Qwen3-30B MoE models), resulting in stronger reasoning capabilities.
2️⃣ YOYO2
YOYO2 (the mix of Thinking, Instruct, and Coder models) demonstrates robustness across many tasks:
It dominates BoolQ and OpenBookQA, where knowledge-based reasoning is critical.
This suggests the modular combination of different Qwen3 variants provides a balanced foundation for diverse reasoning challenges.
3️⃣ thinking-b is the weakest performer overall
At 0.447 on ARC Easy (a task that rewards abstract reasoning), it lags significantly behind the others, suggesting that the earlier Thinking-based Brainstorm build is a less effective combination than the yoyo or yoyo-b approaches.
4️⃣ The impact of brainstorming layers is clear
YOYO2-TOTAL-RECALL's improvements over YOYO (e.g., +0.014 in ARC Easy, +0.037 in Winogrande) demonstrate that the added Brainstorm layers:
- Enhance reasoning flexibility (critical for ARC and Winogrande)
- Improve text generation quality (Hellaswag)
- Strengthen logical consistency (PIQA)
## Why YOYO2-TOTAL-RECALL is the strongest model here
It leverages both the modular strengths of YOYO (3 models + Qwen3-30B base) and the refinement from brainstorming layers.
All three models were quantized at q6, so the performance differences reflect their design choices rather than quantization effects.
## Recommendations for Your Workflow
When selecting a model for specific tasks:
- For reasoning-heavy tasks (ARC, Winogrande): use YOYO2-TOTAL-RECALL.
- For knowledge-based QA (BoolQ, OpenBookQA): YOYO2 may be preferable.
These data suggest that combining multiple Qwen3 variants with additional Brainstorm layers (as in yoyo-b) yields the most well-rounded and highest-performing model across this set of benchmarks.
This model [Qwen3-42B-A3B-2507-YOYO2-TOTAL-RECALL-Instruct-qx65-mlx](https://huggingface.co/Qwen3-42B-A3B-2507-YOYO2-TOTAL-RECALL-Instruct-qx65-mlx) was
converted to MLX format from [DavidAU/Qwen3-42B-A3B-2507-YOYO2-TOTAL-RECALL-Instruct](https://huggingface.co/DavidAU/Qwen3-42B-A3B-2507-YOYO2-TOTAL-RECALL-Instruct)
using mlx-lm version **0.26.4**.
## Use with mlx
```bash
pip install mlx-lm
```
```python
from mlx_lm import load, generate

model, tokenizer = load("Qwen3-42B-A3B-2507-YOYO2-TOTAL-RECALL-Instruct-qx65-mlx")

prompt = "hello"
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```