How to create something similar for DeepSeek V3-0324?

#1
by bibproj - opened

@mmbela
Good evening Béla

An impressive result. I'm interested in how you created this model. I want to translate studies into ~40 languages using DeepSeek V3-0324. Translations seem to be very sensitive to quantization: quants quickly lose other languages and their fluency. Your method of creating a model that maximally uses the available 512 GB therefore makes complete sense.

Could you give me some pointers on how to do something like this, please?

Hi!

The idea came from tngtech/DeepSeek-R1T-Chimera. Some code was shared in this closed discussion: https://huggingface.co/tngtech/DeepSeek-R1T-Chimera/discussions/1. My code is based on that, after a lot of modifications, and it also works on different quants of the same model. I made mixed quants of V3-0324 as well, and they were better for translation. I can share my Python code, but not right now, as I have no more time today.

Thank you. Appreciate the information!

@mmbela

The code snippets they provided in that discussion work for GGUF. As this optimization targets a Mac Studio with 512 GB, it could make sense to also try your brilliant idea on MLX models. In general, MLX seems to perform a bit better on a Mac than GGUF. I assume the same merge process would work for MLX models.

Thinking about it, working with MLX might even make the process easier. Nowadays MLX can directly read the original FP8 DeepSeek models: it dequantizes the inputs to bfloat16 in memory and then quantizes them again. There is no need to go via a separate BF16 model as with GGUF. Instead of reading from three quantized models (Q8, Q5, and Q4), it seems possible to read each layer from the original FP8 model and decide how much to quantize it.

Looking through the MLX code with my limited knowledge, I see that convert.py calls quantize_model() in utils.py. One of its parameters is quant_predicate, described as "A callable that decides how to quantize each layer based on the path." So all that seems to be required is a callable that returns the quantization level for each layer; a minimal sketch follows.
With this, your idea can hopefully be implemented in MLX as well.
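
As a rough illustration, such a predicate might look like this (the (path, module, config) signature and the returned dict mirror the snippet later in this thread; treat both as assumptions and check them against your installed mlx-lm version):

# Sketch of a quant_predicate for mlx-lm's quantize_model().
# Assumption: the callable receives (path, module, config) and may
# return a dict of per-layer quantization settings.
def my_quant_predicate(path, module, config):
    bits = 8                  # default: keep layers near Q8
    if "switch_mlp" in path:  # routed experts tolerate fewer bits
        bits = 5
    return {"group_size": 64, "bits": bits}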

@mmbela
Quick question, please: is there a special reason for all the ffn_up/down/gate_exps layers in blocks 10-29 being Q8_0 instead of Q4_K, Q5_K, or Q6_K?

Because of the sources: I used the quants from Unsloth there as well. Those quants are very good for finding the optimal quant for each part of the model.

Tomorrow I will start making mixed quants from the new V3.1 Unsloth quants again. My first choice for a new mix will be the Unsloth Q6 and Q5 XL models. I'll check my code with this and share it afterwards.

@mmbela
Hoping the following information will be useful to you:

Like you, I also looked at ways to find the optimal quants for different parts of the model. One good source is a recent research paper at https://arxiv.org/abs/2505.02390 . They used DeepSeek R1-0528 and V3-0324 to build similar mixed quants, creating Q3 GGUF models which they call DQ3. You can download the latest ones from ModelScope.cn, as they are not yet on Hugging Face; the links are at https://github.com/UnicomAI/DeepSeek-Eval . I tested their DeepSeek-V3-0324-DQ3_K_M (GGUF) model on translations, and I'm impressed: their Q3 is much better than the Q4 quants of DeepSeek-V3-0324 that I tested.

I wrote a small program to extract the tensor (layer) metadata from their model and pulled it into Microsoft Excel; a sketch of such a dump follows. This allows me to calculate many variations before building anything. I then matched it against the layers of your DeepSeek-R1-0528-optimized-for-512Gb-GGUF model to make comparison easy. They used Q3, Q4, and Q6 for their model. Changing this to Q5, Q6, and Q8 lands at a nice size (441 GiB; Mac will show 473 GB on disk).
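
For reference, a minimal sketch of such a dump, using the gguf-py package that ships with llama.cpp (the file names are placeholders):

import csv
import gguf

# Read a GGUF file and write one CSV row per tensor:
# name, quant type, shape, and size in bytes.
reader = gguf.GGUFReader("DeepSeek-V3-0324-DQ3_K_M.gguf")
with open("tensors.csv", "w", newline="") as f:
    out = csv.writer(f)
    out.writerow(["name", "type", "shape", "bytes"])
    for t in reader.tensors:
        out.writerow([t.name, t.tensor_type.name, list(t.data.shape), t.data.nbytes])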

I made one additional refinement. A research paper at https://arxiv.org/abs/2411.07191 describes "super weights", which occur in the ffn_down_exps layers. If you "corrupt" these so-called super weights, the quality of any model degrades to that of an idiot. Because they only occur in the ffn_down_exps layers, I upgraded any such layer from Q5 to Q6 in the Excel sheet (the simple rule is sketched below). This increases the size to 460 GiB (Mac will show 494 GB on disk). Your current model is 485 GiB (Mac shows 521 GB on disk), so the design is slightly smaller than your current model, which allows for a bit more context, and it avoids running at the edge of memory, which macOS does not always like.
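
As a toy illustration of that rule (the quant labels are illustrative; the real per-tensor map lives in the Excel sheet):

# Hypothetical helper: choose the quant for an expert FFN tensor.
def pick_expert_quant(tensor_name: str) -> str:
    if "ffn_down_exps" in tensor_name:
        return "Q6_K"  # protect the layers that can hold super weights
    return "Q5_K"      # gate/up expert tensors stay one step smaller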

Up to now this is mostly an Excel exercise. I now need to actually build the designed model.

This Hugging Face page does not allow me to upload the Excel sheet so that you can use it too. If you wish to look at it, please let me know how I can send it to you. The resulting per-tensor quant map:

output.weight,Q8_0
output_norm.weight,F32
token_embd.weight,Q6_K
blk.0.attn_kv_a_mqa.weight,Q8_0
blk.0.attn_kv_a_norm.weight,F32
blk.0.attn_k_b.weight,Q8_0
blk.0.attn_v_b.weight,Q8_0
blk.0.attn_norm.weight,F32
blk.0.attn_output.weight,Q6_K
blk.0.attn_q_a.weight,Q6_K
blk.0.attn_q_a_norm.weight,F32
blk.0.attn_q_b.weight,Q6_K
blk.0.ffn_down.weight,Q8_0
blk.0.ffn_gate.weight,Q6_K
blk.0.ffn_norm.weight,F32
blk.0.ffn_up.weight,Q6_K
blk.1.attn_kv_a_mqa.weight,Q8_0
blk.1.attn_kv_a_norm.weight,F32
blk.1.attn_k_b.weight,Q8_0
blk.1.attn_v_b.weight,Q8_0
blk.1.attn_norm.weight,F32
blk.1.attn_output.weight,Q6_K
blk.1.attn_q_a.weight,Q6_K
blk.1.attn_q_a_norm.weight,F32
blk.1.attn_q_b.weight,Q6_K
blk.1.ffn_down.weight,Q8_0
blk.1.ffn_gate.weight,Q6_K
blk.1.ffn_norm.weight,F32
blk.1.ffn_up.weight,Q6_K
blk.2.attn_kv_a_mqa.weight,Q8_0
blk.2.attn_kv_a_norm.weight,F32
blk.2.attn_k_b.weight,Q8_0
blk.2.attn_v_b.weight,Q8_0
blk.2.attn_norm.weight,F32
blk.2.attn_output.weight,Q6_K
blk.2.attn_q_a.weight,Q6_K
blk.2.attn_q_a_norm.weight,F32
blk.2.attn_q_b.weight,Q6_K
blk.2.ffn_down.weight,Q8_0
blk.2.ffn_gate.weight,Q6_K
blk.2.ffn_norm.weight,F32
blk.2.ffn_up.weight,Q6_K
blk.3.attn_kv_a_mqa.weight,Q8_0
blk.3.attn_kv_a_norm.weight,F32
blk.3.attn_k_b.weight,Q8_0
blk.3.attn_v_b.weight,Q8_0
blk.3.attn_norm.weight,F32
blk.3.attn_output.weight,Q6_K
blk.3.attn_q_a.weight,Q6_K
blk.3.attn_q_a_norm.weight,F32
blk.3.attn_q_b.weight,Q6_K
blk.3.exp_probs_b.bias,F32
blk.3.ffn_down_exps.weight,Q8_0
blk.3.ffn_down_shexp.weight,Q8_0
blk.3.ffn_gate_exps.weight,Q5_K
blk.3.ffn_gate_inp.weight,F32
blk.3.ffn_gate_shexp.weight,Q6_K
blk.3.ffn_norm.weight,F32
blk.3.ffn_up_exps.weight,Q5_K
blk.3.ffn_up_shexp.weight,Q6_K
blk.4.attn_kv_a_mqa.weight,Q8_0
blk.4.attn_kv_a_norm.weight,F32
blk.4.attn_k_b.weight,Q8_0
blk.4.attn_v_b.weight,Q8_0
blk.4.attn_norm.weight,F32
blk.4.attn_output.weight,Q6_K
blk.4.attn_q_a.weight,Q6_K
blk.4.attn_q_a_norm.weight,F32
blk.4.attn_q_b.weight,Q6_K
blk.4.exp_probs_b.bias,F32
blk.4.ffn_down_exps.weight,Q8_0
blk.4.ffn_down_shexp.weight,Q8_0
blk.4.ffn_gate_exps.weight,Q5_K
blk.4.ffn_gate_inp.weight,F32
blk.4.ffn_gate_shexp.weight,Q6_K
blk.4.ffn_norm.weight,F32
blk.4.ffn_up_exps.weight,Q5_K
blk.4.ffn_up_shexp.weight,Q6_K
blk.5.attn_kv_a_mqa.weight,Q8_0
blk.5.attn_kv_a_norm.weight,F32
blk.5.attn_k_b.weight,Q8_0
blk.5.attn_v_b.weight,Q8_0
blk.5.attn_norm.weight,F32
blk.5.attn_output.weight,Q6_K
blk.5.attn_q_a.weight,Q6_K
blk.5.attn_q_a_norm.weight,F32
blk.5.attn_q_b.weight,Q6_K
blk.5.exp_probs_b.bias,F32
blk.5.ffn_down_exps.weight,Q6_K
blk.5.ffn_down_shexp.weight,Q8_0
blk.5.ffn_gate_exps.weight,Q5_K
blk.5.ffn_gate_inp.weight,F32
blk.5.ffn_gate_shexp.weight,Q6_K
blk.5.ffn_norm.weight,F32
blk.5.ffn_up_exps.weight,Q5_K
blk.5.ffn_up_shexp.weight,Q6_K
blk.6.attn_kv_a_mqa.weight,Q8_0
blk.6.attn_kv_a_norm.weight,F32
blk.6.attn_k_b.weight,Q8_0
blk.6.attn_v_b.weight,Q8_0
blk.6.attn_norm.weight,F32
blk.6.attn_output.weight,Q6_K
blk.6.attn_q_a.weight,Q6_K
blk.6.attn_q_a_norm.weight,F32
blk.6.attn_q_b.weight,Q6_K
blk.6.exp_probs_b.bias,F32
blk.6.ffn_down_exps.weight,Q6_K
blk.6.ffn_down_shexp.weight,Q8_0
blk.6.ffn_gate_exps.weight,Q5_K
blk.6.ffn_gate_inp.weight,F32
blk.6.ffn_gate_shexp.weight,Q6_K
blk.6.ffn_norm.weight,F32
blk.6.ffn_up_exps.weight,Q5_K
blk.6.ffn_up_shexp.weight,Q6_K
blk.7.attn_kv_a_mqa.weight,Q8_0
blk.7.attn_kv_a_norm.weight,F32
blk.7.attn_k_b.weight,Q8_0
blk.7.attn_v_b.weight,Q8_0
blk.7.attn_norm.weight,F32
blk.7.attn_output.weight,Q6_K
blk.7.attn_q_a.weight,Q6_K
blk.7.attn_q_a_norm.weight,F32
blk.7.attn_q_b.weight,Q6_K
blk.7.exp_probs_b.bias,F32
blk.7.ffn_down_exps.weight,Q6_K
blk.7.ffn_down_shexp.weight,Q8_0
blk.7.ffn_gate_exps.weight,Q5_K
blk.7.ffn_gate_inp.weight,F32
blk.7.ffn_gate_shexp.weight,Q6_K
blk.7.ffn_norm.weight,F32
blk.7.ffn_up_exps.weight,Q5_K
blk.7.ffn_up_shexp.weight,Q6_K
blk.8.attn_kv_a_mqa.weight,Q8_0
blk.8.attn_kv_a_norm.weight,F32
blk.8.attn_k_b.weight,Q8_0
blk.8.attn_v_b.weight,Q8_0
blk.8.attn_norm.weight,F32
blk.8.attn_output.weight,Q6_K
blk.8.attn_q_a.weight,Q6_K
blk.8.attn_q_a_norm.weight,F32
blk.8.attn_q_b.weight,Q6_K
blk.8.exp_probs_b.bias,F32
blk.8.ffn_down_exps.weight,Q6_K
blk.8.ffn_down_shexp.weight,Q8_0
blk.8.ffn_gate_exps.weight,Q5_K
blk.8.ffn_gate_inp.weight,F32
blk.8.ffn_gate_shexp.weight,Q6_K
blk.8.ffn_norm.weight,F32
blk.8.ffn_up_exps.weight,Q5_K
blk.8.ffn_up_shexp.weight,Q6_K
blk.9.attn_kv_a_mqa.weight,Q8_0
blk.9.attn_kv_a_norm.weight,F32
blk.9.attn_k_b.weight,Q8_0
blk.9.attn_v_b.weight,Q8_0
blk.9.attn_norm.weight,F32
blk.9.attn_output.weight,Q6_K
blk.9.attn_q_a.weight,Q6_K
blk.9.attn_q_a_norm.weight,F32
blk.9.attn_q_b.weight,Q6_K
blk.9.exp_probs_b.bias,F32
blk.9.ffn_down_exps.weight,Q6_K
blk.9.ffn_down_shexp.weight,Q8_0
blk.9.ffn_gate_exps.weight,Q5_K
blk.9.ffn_gate_inp.weight,F32
blk.9.ffn_gate_shexp.weight,Q6_K
blk.9.ffn_norm.weight,F32
blk.9.ffn_up_exps.weight,Q5_K
blk.9.ffn_up_shexp.weight,Q6_K
blk.10.attn_kv_a_mqa.weight,Q8_0
blk.10.attn_kv_a_norm.weight,F32
blk.10.attn_k_b.weight,Q8_0
blk.10.attn_v_b.weight,Q8_0
blk.10.attn_norm.weight,F32
blk.10.attn_output.weight,Q6_K
blk.10.attn_q_a.weight,Q6_K
blk.10.attn_q_a_norm.weight,F32
blk.10.attn_q_b.weight,Q6_K
blk.10.exp_probs_b.bias,F32
blk.10.ffn_down_exps.weight,Q6_K
blk.10.ffn_down_shexp.weight,Q8_0
blk.10.ffn_gate_exps.weight,Q5_K
blk.10.ffn_gate_inp.weight,F32
blk.10.ffn_gate_shexp.weight,Q6_K
blk.10.ffn_norm.weight,F32
blk.10.ffn_up_exps.weight,Q5_K
blk.10.ffn_up_shexp.weight,Q6_K
blk.11.attn_kv_a_mqa.weight,Q8_0
blk.11.attn_kv_a_norm.weight,F32
blk.11.attn_k_b.weight,Q8_0
blk.11.attn_v_b.weight,Q8_0
blk.11.attn_norm.weight,F32
blk.11.attn_output.weight,Q6_K
blk.11.attn_q_a.weight,Q6_K
blk.11.attn_q_a_norm.weight,F32
blk.11.attn_q_b.weight,Q6_K
blk.11.exp_probs_b.bias,F32
blk.11.ffn_down_exps.weight,Q6_K
blk.11.ffn_down_shexp.weight,Q8_0
blk.11.ffn_gate_exps.weight,Q5_K
blk.11.ffn_gate_inp.weight,F32
blk.11.ffn_gate_shexp.weight,Q6_K
blk.11.ffn_norm.weight,F32
blk.11.ffn_up_exps.weight,Q5_K
blk.11.ffn_up_shexp.weight,Q6_K
blk.12.attn_kv_a_mqa.weight,Q8_0
blk.12.attn_kv_a_norm.weight,F32
blk.12.attn_k_b.weight,Q8_0
blk.12.attn_v_b.weight,Q8_0
blk.12.attn_norm.weight,F32
blk.12.attn_output.weight,Q6_K
blk.12.attn_q_a.weight,Q6_K
blk.12.attn_q_a_norm.weight,F32
blk.12.attn_q_b.weight,Q6_K
blk.12.exp_probs_b.bias,F32
blk.12.ffn_down_exps.weight,Q6_K
blk.12.ffn_down_shexp.weight,Q8_0
blk.12.ffn_gate_exps.weight,Q5_K
blk.12.ffn_gate_inp.weight,F32
blk.12.ffn_gate_shexp.weight,Q6_K
blk.12.ffn_norm.weight,F32
blk.12.ffn_up_exps.weight,Q5_K
blk.12.ffn_up_shexp.weight,Q6_K
blk.13.attn_kv_a_mqa.weight,Q8_0
blk.13.attn_kv_a_norm.weight,F32
blk.13.attn_k_b.weight,Q8_0
blk.13.attn_v_b.weight,Q8_0
blk.13.attn_norm.weight,F32
blk.13.attn_output.weight,Q6_K
blk.13.attn_q_a.weight,Q6_K
blk.13.attn_q_a_norm.weight,F32
blk.13.attn_q_b.weight,Q6_K
blk.13.exp_probs_b.bias,F32
blk.13.ffn_down_exps.weight,Q6_K
blk.13.ffn_down_shexp.weight,Q8_0
blk.13.ffn_gate_exps.weight,Q5_K
blk.13.ffn_gate_inp.weight,F32
blk.13.ffn_gate_shexp.weight,Q6_K
blk.13.ffn_norm.weight,F32
blk.13.ffn_up_exps.weight,Q5_K
blk.13.ffn_up_shexp.weight,Q6_K
blk.14.attn_kv_a_mqa.weight,Q8_0
blk.14.attn_kv_a_norm.weight,F32
blk.14.attn_k_b.weight,Q8_0
blk.14.attn_v_b.weight,Q8_0
blk.14.attn_norm.weight,F32
blk.14.attn_output.weight,Q6_K
blk.14.attn_q_a.weight,Q6_K
blk.14.attn_q_a_norm.weight,F32
blk.14.attn_q_b.weight,Q6_K
blk.14.exp_probs_b.bias,F32
blk.14.ffn_down_exps.weight,Q6_K
blk.14.ffn_down_shexp.weight,Q8_0
blk.14.ffn_gate_exps.weight,Q5_K
blk.14.ffn_gate_inp.weight,F32
blk.14.ffn_gate_shexp.weight,Q6_K
blk.14.ffn_norm.weight,F32
blk.14.ffn_up_exps.weight,Q5_K
blk.14.ffn_up_shexp.weight,Q6_K
blk.15.attn_kv_a_mqa.weight,Q8_0
blk.15.attn_kv_a_norm.weight,F32
blk.15.attn_k_b.weight,Q8_0
blk.15.attn_v_b.weight,Q8_0
blk.15.attn_norm.weight,F32
blk.15.attn_output.weight,Q6_K
blk.15.attn_q_a.weight,Q6_K
blk.15.attn_q_a_norm.weight,F32
blk.15.attn_q_b.weight,Q6_K
blk.15.exp_probs_b.bias,F32
blk.15.ffn_down_exps.weight,Q6_K
blk.15.ffn_down_shexp.weight,Q8_0
blk.15.ffn_gate_exps.weight,Q5_K
blk.15.ffn_gate_inp.weight,F32
blk.15.ffn_gate_shexp.weight,Q6_K
blk.15.ffn_norm.weight,F32
blk.15.ffn_up_exps.weight,Q5_K
blk.15.ffn_up_shexp.weight,Q6_K
blk.16.attn_kv_a_mqa.weight,Q8_0
blk.16.attn_kv_a_norm.weight,F32
blk.16.attn_k_b.weight,Q8_0
blk.16.attn_v_b.weight,Q8_0
blk.16.attn_norm.weight,F32
blk.16.attn_output.weight,Q6_K
blk.16.attn_q_a.weight,Q6_K
blk.16.attn_q_a_norm.weight,F32
blk.16.attn_q_b.weight,Q6_K
blk.16.exp_probs_b.bias,F32
blk.16.ffn_down_exps.weight,Q6_K
blk.16.ffn_down_shexp.weight,Q8_0
blk.16.ffn_gate_exps.weight,Q5_K
blk.16.ffn_gate_inp.weight,F32
blk.16.ffn_gate_shexp.weight,Q6_K
blk.16.ffn_norm.weight,F32
blk.16.ffn_up_exps.weight,Q5_K
blk.16.ffn_up_shexp.weight,Q6_K
blk.17.attn_kv_a_mqa.weight,Q8_0
blk.17.attn_kv_a_norm.weight,F32
blk.17.attn_k_b.weight,Q8_0
blk.17.attn_v_b.weight,Q8_0
blk.17.attn_norm.weight,F32
blk.17.attn_output.weight,Q6_K
blk.17.attn_q_a.weight,Q6_K
blk.17.attn_q_a_norm.weight,F32
blk.17.attn_q_b.weight,Q6_K
blk.17.exp_probs_b.bias,F32
blk.17.ffn_down_exps.weight,Q6_K
blk.17.ffn_down_shexp.weight,Q8_0
blk.17.ffn_gate_exps.weight,Q5_K
blk.17.ffn_gate_inp.weight,F32
blk.17.ffn_gate_shexp.weight,Q6_K
blk.17.ffn_norm.weight,F32
blk.17.ffn_up_exps.weight,Q5_K
blk.17.ffn_up_shexp.weight,Q6_K
blk.18.attn_kv_a_mqa.weight,Q8_0
blk.18.attn_kv_a_norm.weight,F32
blk.18.attn_k_b.weight,Q8_0
blk.18.attn_v_b.weight,Q8_0
blk.18.attn_norm.weight,F32
blk.18.attn_output.weight,Q6_K
blk.18.attn_q_a.weight,Q6_K
blk.18.attn_q_a_norm.weight,F32
blk.18.attn_q_b.weight,Q6_K
blk.18.exp_probs_b.bias,F32
blk.18.ffn_down_exps.weight,Q6_K
blk.18.ffn_down_shexp.weight,Q8_0
blk.18.ffn_gate_exps.weight,Q5_K
blk.18.ffn_gate_inp.weight,F32
blk.18.ffn_gate_shexp.weight,Q6_K
blk.18.ffn_norm.weight,F32
blk.18.ffn_up_exps.weight,Q5_K
blk.18.ffn_up_shexp.weight,Q6_K
blk.19.attn_kv_a_mqa.weight,Q8_0
blk.19.attn_kv_a_norm.weight,F32
blk.19.attn_k_b.weight,Q8_0
blk.19.attn_v_b.weight,Q8_0
blk.19.attn_norm.weight,F32
blk.19.attn_output.weight,Q6_K
blk.19.attn_q_a.weight,Q6_K
blk.19.attn_q_a_norm.weight,F32
blk.19.attn_q_b.weight,Q6_K
blk.19.exp_probs_b.bias,F32
blk.19.ffn_down_exps.weight,Q6_K
blk.19.ffn_down_shexp.weight,Q8_0
blk.19.ffn_gate_exps.weight,Q5_K
blk.19.ffn_gate_inp.weight,F32
blk.19.ffn_gate_shexp.weight,Q6_K
blk.19.ffn_norm.weight,F32
blk.19.ffn_up_exps.weight,Q5_K
blk.19.ffn_up_shexp.weight,Q6_K
blk.20.attn_kv_a_mqa.weight,Q8_0
blk.20.attn_kv_a_norm.weight,F32
blk.20.attn_k_b.weight,Q8_0
blk.20.attn_v_b.weight,Q8_0
blk.20.attn_norm.weight,F32
blk.20.attn_output.weight,Q6_K
blk.20.attn_q_a.weight,Q6_K
blk.20.attn_q_a_norm.weight,F32
blk.20.attn_q_b.weight,Q6_K
blk.20.exp_probs_b.bias,F32
blk.20.ffn_down_exps.weight,Q6_K
blk.20.ffn_down_shexp.weight,Q8_0
blk.20.ffn_gate_exps.weight,Q5_K
blk.20.ffn_gate_inp.weight,F32
blk.20.ffn_gate_shexp.weight,Q6_K
blk.20.ffn_norm.weight,F32
blk.20.ffn_up_exps.weight,Q5_K
blk.20.ffn_up_shexp.weight,Q6_K
blk.21.attn_kv_a_mqa.weight,Q8_0
blk.21.attn_kv_a_norm.weight,F32
blk.21.attn_k_b.weight,Q8_0
blk.21.attn_v_b.weight,Q8_0
blk.21.attn_norm.weight,F32
blk.21.attn_output.weight,Q6_K
blk.21.attn_q_a.weight,Q6_K
blk.21.attn_q_a_norm.weight,F32
blk.21.attn_q_b.weight,Q6_K
blk.21.exp_probs_b.bias,F32
blk.21.ffn_down_exps.weight,Q6_K
blk.21.ffn_down_shexp.weight,Q8_0
blk.21.ffn_gate_exps.weight,Q5_K
blk.21.ffn_gate_inp.weight,F32
blk.21.ffn_gate_shexp.weight,Q6_K
blk.21.ffn_norm.weight,F32
blk.21.ffn_up_exps.weight,Q5_K
blk.21.ffn_up_shexp.weight,Q6_K
blk.22.attn_kv_a_mqa.weight,Q8_0
blk.22.attn_kv_a_norm.weight,F32
blk.22.attn_k_b.weight,Q8_0
blk.22.attn_v_b.weight,Q8_0
blk.22.attn_norm.weight,F32
blk.22.attn_output.weight,Q6_K
blk.22.attn_q_a.weight,Q6_K
blk.22.attn_q_a_norm.weight,F32
blk.22.attn_q_b.weight,Q6_K
blk.22.exp_probs_b.bias,F32
blk.22.ffn_down_exps.weight,Q6_K
blk.22.ffn_down_shexp.weight,Q8_0
blk.22.ffn_gate_exps.weight,Q5_K
blk.22.ffn_gate_inp.weight,F32
blk.22.ffn_gate_shexp.weight,Q6_K
blk.22.ffn_norm.weight,F32
blk.22.ffn_up_exps.weight,Q5_K
blk.22.ffn_up_shexp.weight,Q6_K
blk.23.attn_kv_a_mqa.weight,Q8_0
blk.23.attn_kv_a_norm.weight,F32
blk.23.attn_k_b.weight,Q8_0
blk.23.attn_v_b.weight,Q8_0
blk.23.attn_norm.weight,F32
blk.23.attn_output.weight,Q6_K
blk.23.attn_q_a.weight,Q6_K
blk.23.attn_q_a_norm.weight,F32
blk.23.attn_q_b.weight,Q6_K
blk.23.exp_probs_b.bias,F32
blk.23.ffn_down_exps.weight,Q6_K
blk.23.ffn_down_shexp.weight,Q8_0
blk.23.ffn_gate_exps.weight,Q5_K
blk.23.ffn_gate_inp.weight,F32
blk.23.ffn_gate_shexp.weight,Q6_K
blk.23.ffn_norm.weight,F32
blk.23.ffn_up_exps.weight,Q5_K
blk.23.ffn_up_shexp.weight,Q6_K
blk.24.attn_kv_a_mqa.weight,Q8_0
blk.24.attn_kv_a_norm.weight,F32
blk.24.attn_k_b.weight,Q8_0
blk.24.attn_v_b.weight,Q8_0
blk.24.attn_norm.weight,F32
blk.24.attn_output.weight,Q6_K
blk.24.attn_q_a.weight,Q6_K
blk.24.attn_q_a_norm.weight,F32
blk.24.attn_q_b.weight,Q6_K
blk.24.exp_probs_b.bias,F32
blk.24.ffn_down_exps.weight,Q6_K
blk.24.ffn_down_shexp.weight,Q8_0
blk.24.ffn_gate_exps.weight,Q5_K
blk.24.ffn_gate_inp.weight,F32
blk.24.ffn_gate_shexp.weight,Q6_K
blk.24.ffn_norm.weight,F32
blk.24.ffn_up_exps.weight,Q5_K
blk.24.ffn_up_shexp.weight,Q6_K
blk.25.attn_kv_a_mqa.weight,Q8_0
blk.25.attn_kv_a_norm.weight,F32
blk.25.attn_k_b.weight,Q8_0
blk.25.attn_v_b.weight,Q8_0
blk.25.attn_norm.weight,F32
blk.25.attn_output.weight,Q6_K
blk.25.attn_q_a.weight,Q6_K
blk.25.attn_q_a_norm.weight,F32
blk.25.attn_q_b.weight,Q6_K
blk.25.exp_probs_b.bias,F32
blk.25.ffn_down_exps.weight,Q6_K
blk.25.ffn_down_shexp.weight,Q8_0
blk.25.ffn_gate_exps.weight,Q5_K
blk.25.ffn_gate_inp.weight,F32
blk.25.ffn_gate_shexp.weight,Q6_K
blk.25.ffn_norm.weight,F32
blk.25.ffn_up_exps.weight,Q5_K
blk.25.ffn_up_shexp.weight,Q6_K
blk.26.attn_kv_a_mqa.weight,Q8_0
blk.26.attn_kv_a_norm.weight,F32
blk.26.attn_k_b.weight,Q8_0
blk.26.attn_v_b.weight,Q8_0
blk.26.attn_norm.weight,F32
blk.26.attn_output.weight,Q6_K
blk.26.attn_q_a.weight,Q6_K
blk.26.attn_q_a_norm.weight,F32
blk.26.attn_q_b.weight,Q6_K
blk.26.exp_probs_b.bias,F32
blk.26.ffn_down_exps.weight,Q6_K
blk.26.ffn_down_shexp.weight,Q8_0
blk.26.ffn_gate_exps.weight,Q5_K
blk.26.ffn_gate_inp.weight,F32
blk.26.ffn_gate_shexp.weight,Q6_K
blk.26.ffn_norm.weight,F32
blk.26.ffn_up_exps.weight,Q5_K
blk.26.ffn_up_shexp.weight,Q6_K
blk.27.attn_kv_a_mqa.weight,Q8_0
blk.27.attn_kv_a_norm.weight,F32
blk.27.attn_k_b.weight,Q8_0
blk.27.attn_v_b.weight,Q8_0
blk.27.attn_norm.weight,F32
blk.27.attn_output.weight,Q6_K
blk.27.attn_q_a.weight,Q6_K
blk.27.attn_q_a_norm.weight,F32
blk.27.attn_q_b.weight,Q6_K
blk.27.exp_probs_b.bias,F32
blk.27.ffn_down_exps.weight,Q6_K
blk.27.ffn_down_shexp.weight,Q8_0
blk.27.ffn_gate_exps.weight,Q5_K
blk.27.ffn_gate_inp.weight,F32
blk.27.ffn_gate_shexp.weight,Q6_K
blk.27.ffn_norm.weight,F32
blk.27.ffn_up_exps.weight,Q5_K
blk.27.ffn_up_shexp.weight,Q6_K
blk.28.attn_kv_a_mqa.weight,Q8_0
blk.28.attn_kv_a_norm.weight,F32
blk.28.attn_k_b.weight,Q8_0
blk.28.attn_v_b.weight,Q8_0
blk.28.attn_norm.weight,F32
blk.28.attn_output.weight,Q6_K
blk.28.attn_q_a.weight,Q6_K
blk.28.attn_q_a_norm.weight,F32
blk.28.attn_q_b.weight,Q6_K
blk.28.exp_probs_b.bias,F32
blk.28.ffn_down_exps.weight,Q6_K
blk.28.ffn_down_shexp.weight,Q8_0
blk.28.ffn_gate_exps.weight,Q5_K
blk.28.ffn_gate_inp.weight,F32
blk.28.ffn_gate_shexp.weight,Q6_K
blk.28.ffn_norm.weight,F32
blk.28.ffn_up_exps.weight,Q5_K
blk.28.ffn_up_shexp.weight,Q6_K
blk.29.attn_kv_a_mqa.weight,Q8_0
blk.29.attn_kv_a_norm.weight,F32
blk.29.attn_k_b.weight,Q8_0
blk.29.attn_v_b.weight,Q8_0
blk.29.attn_norm.weight,F32
blk.29.attn_output.weight,Q6_K
blk.29.attn_q_a.weight,Q6_K
blk.29.attn_q_a_norm.weight,F32
blk.29.attn_q_b.weight,Q6_K
blk.29.exp_probs_b.bias,F32
blk.29.ffn_down_exps.weight,Q6_K
blk.29.ffn_down_shexp.weight,Q8_0
blk.29.ffn_gate_exps.weight,Q5_K
blk.29.ffn_gate_inp.weight,F32
blk.29.ffn_gate_shexp.weight,Q6_K
blk.29.ffn_norm.weight,F32
blk.29.ffn_up_exps.weight,Q5_K
blk.29.ffn_up_shexp.weight,Q6_K
blk.30.attn_kv_a_mqa.weight,Q8_0
blk.30.attn_kv_a_norm.weight,F32
blk.30.attn_k_b.weight,Q8_0
blk.30.attn_v_b.weight,Q8_0
blk.30.attn_norm.weight,F32
blk.30.attn_output.weight,Q6_K
blk.30.attn_q_a.weight,Q6_K
blk.30.attn_q_a_norm.weight,F32
blk.30.attn_q_b.weight,Q6_K
blk.30.exp_probs_b.bias,F32
blk.30.ffn_down_exps.weight,Q6_K
blk.30.ffn_down_shexp.weight,Q8_0
blk.30.ffn_gate_exps.weight,Q5_K
blk.30.ffn_gate_inp.weight,F32
blk.30.ffn_gate_shexp.weight,Q6_K
blk.30.ffn_norm.weight,F32
blk.30.ffn_up_exps.weight,Q5_K
blk.30.ffn_up_shexp.weight,Q6_K
blk.31.attn_kv_a_mqa.weight,Q8_0
blk.31.attn_kv_a_norm.weight,F32
blk.31.attn_k_b.weight,Q8_0
blk.31.attn_v_b.weight,Q8_0
blk.31.attn_norm.weight,F32
blk.31.attn_output.weight,Q6_K
blk.31.attn_q_a.weight,Q6_K
blk.31.attn_q_a_norm.weight,F32
blk.31.attn_q_b.weight,Q6_K
blk.31.exp_probs_b.bias,F32
blk.31.ffn_down_exps.weight,Q6_K
blk.31.ffn_down_shexp.weight,Q8_0
blk.31.ffn_gate_exps.weight,Q5_K
blk.31.ffn_gate_inp.weight,F32
blk.31.ffn_gate_shexp.weight,Q6_K
blk.31.ffn_norm.weight,F32
blk.31.ffn_up_exps.weight,Q5_K
blk.31.ffn_up_shexp.weight,Q6_K
blk.32.attn_kv_a_mqa.weight,Q8_0
blk.32.attn_kv_a_norm.weight,F32
blk.32.attn_k_b.weight,Q8_0
blk.32.attn_v_b.weight,Q8_0
blk.32.attn_norm.weight,F32
blk.32.attn_output.weight,Q6_K
blk.32.attn_q_a.weight,Q6_K
blk.32.attn_q_a_norm.weight,F32
blk.32.attn_q_b.weight,Q6_K
blk.32.exp_probs_b.bias,F32
blk.32.ffn_down_exps.weight,Q6_K
blk.32.ffn_down_shexp.weight,Q8_0
blk.32.ffn_gate_exps.weight,Q5_K
blk.32.ffn_gate_inp.weight,F32
blk.32.ffn_gate_shexp.weight,Q6_K
blk.32.ffn_norm.weight,F32
blk.32.ffn_up_exps.weight,Q5_K
blk.32.ffn_up_shexp.weight,Q6_K
blk.33.attn_kv_a_mqa.weight,Q8_0
blk.33.attn_kv_a_norm.weight,F32
blk.33.attn_k_b.weight,Q8_0
blk.33.attn_v_b.weight,Q8_0
blk.33.attn_norm.weight,F32
blk.33.attn_output.weight,Q6_K
blk.33.attn_q_a.weight,Q6_K
blk.33.attn_q_a_norm.weight,F32
blk.33.attn_q_b.weight,Q6_K
blk.33.exp_probs_b.bias,F32
blk.33.ffn_down_exps.weight,Q6_K
blk.33.ffn_down_shexp.weight,Q8_0
blk.33.ffn_gate_exps.weight,Q5_K
blk.33.ffn_gate_inp.weight,F32
blk.33.ffn_gate_shexp.weight,Q6_K
blk.33.ffn_norm.weight,F32
blk.33.ffn_up_exps.weight,Q5_K
blk.33.ffn_up_shexp.weight,Q6_K
blk.34.attn_kv_a_mqa.weight,Q8_0
blk.34.attn_kv_a_norm.weight,F32
blk.34.attn_k_b.weight,Q8_0
blk.34.attn_v_b.weight,Q8_0
blk.34.attn_norm.weight,F32
blk.34.attn_output.weight,Q6_K
blk.34.attn_q_a.weight,Q6_K
blk.34.attn_q_a_norm.weight,F32
blk.34.attn_q_b.weight,Q6_K
blk.34.exp_probs_b.bias,F32
blk.34.ffn_down_exps.weight,Q6_K
blk.34.ffn_down_shexp.weight,Q8_0
blk.34.ffn_gate_exps.weight,Q5_K
blk.34.ffn_gate_inp.weight,F32
blk.34.ffn_gate_shexp.weight,Q6_K
blk.34.ffn_norm.weight,F32
blk.34.ffn_up_exps.weight,Q5_K
blk.34.ffn_up_shexp.weight,Q6_K
blk.35.attn_kv_a_mqa.weight,Q8_0
blk.35.attn_kv_a_norm.weight,F32
blk.35.attn_k_b.weight,Q8_0
blk.35.attn_v_b.weight,Q8_0
blk.35.attn_norm.weight,F32
blk.35.attn_output.weight,Q6_K
blk.35.attn_q_a.weight,Q6_K
blk.35.attn_q_a_norm.weight,F32
blk.35.attn_q_b.weight,Q6_K
blk.35.exp_probs_b.bias,F32
blk.35.ffn_down_exps.weight,Q6_K
blk.35.ffn_down_shexp.weight,Q8_0
blk.35.ffn_gate_exps.weight,Q5_K
blk.35.ffn_gate_inp.weight,F32
blk.35.ffn_gate_shexp.weight,Q6_K
blk.35.ffn_norm.weight,F32
blk.35.ffn_up_exps.weight,Q5_K
blk.35.ffn_up_shexp.weight,Q6_K
blk.36.attn_kv_a_mqa.weight,Q8_0
blk.36.attn_kv_a_norm.weight,F32
blk.36.attn_k_b.weight,Q8_0
blk.36.attn_v_b.weight,Q8_0
blk.36.attn_norm.weight,F32
blk.36.attn_output.weight,Q6_K
blk.36.attn_q_a.weight,Q6_K
blk.36.attn_q_a_norm.weight,F32
blk.36.attn_q_b.weight,Q6_K
blk.36.exp_probs_b.bias,F32
blk.36.ffn_down_exps.weight,Q6_K
blk.36.ffn_down_shexp.weight,Q8_0
blk.36.ffn_gate_exps.weight,Q5_K
blk.36.ffn_gate_inp.weight,F32
blk.36.ffn_gate_shexp.weight,Q6_K
blk.36.ffn_norm.weight,F32
blk.36.ffn_up_exps.weight,Q5_K
blk.36.ffn_up_shexp.weight,Q6_K
blk.37.attn_kv_a_mqa.weight,Q8_0
blk.37.attn_kv_a_norm.weight,F32
blk.37.attn_k_b.weight,Q8_0
blk.37.attn_v_b.weight,Q8_0
blk.37.attn_norm.weight,F32
blk.37.attn_output.weight,Q6_K
blk.37.attn_q_a.weight,Q6_K
blk.37.attn_q_a_norm.weight,F32
blk.37.attn_q_b.weight,Q6_K
blk.37.exp_probs_b.bias,F32
blk.37.ffn_down_exps.weight,Q6_K
blk.37.ffn_down_shexp.weight,Q8_0
blk.37.ffn_gate_exps.weight,Q5_K
blk.37.ffn_gate_inp.weight,F32
blk.37.ffn_gate_shexp.weight,Q6_K
blk.37.ffn_norm.weight,F32
blk.37.ffn_up_exps.weight,Q5_K
blk.37.ffn_up_shexp.weight,Q6_K
blk.38.attn_kv_a_mqa.weight,Q8_0
blk.38.attn_kv_a_norm.weight,F32
blk.38.attn_k_b.weight,Q8_0
blk.38.attn_v_b.weight,Q8_0
blk.38.attn_norm.weight,F32
blk.38.attn_output.weight,Q6_K
blk.38.attn_q_a.weight,Q6_K
blk.38.attn_q_a_norm.weight,F32
blk.38.attn_q_b.weight,Q6_K
blk.38.exp_probs_b.bias,F32
blk.38.ffn_down_exps.weight,Q6_K
blk.38.ffn_down_shexp.weight,Q8_0
blk.38.ffn_gate_exps.weight,Q5_K
blk.38.ffn_gate_inp.weight,F32
blk.38.ffn_gate_shexp.weight,Q6_K
blk.38.ffn_norm.weight,F32
blk.38.ffn_up_exps.weight,Q5_K
blk.38.ffn_up_shexp.weight,Q6_K
blk.39.attn_kv_a_mqa.weight,Q8_0
blk.39.attn_kv_a_norm.weight,F32
blk.39.attn_k_b.weight,Q8_0
blk.39.attn_v_b.weight,Q8_0
blk.39.attn_norm.weight,F32
blk.39.attn_output.weight,Q6_K
blk.39.attn_q_a.weight,Q6_K
blk.39.attn_q_a_norm.weight,F32
blk.39.attn_q_b.weight,Q6_K
blk.39.exp_probs_b.bias,F32
blk.39.ffn_down_exps.weight,Q6_K
blk.39.ffn_down_shexp.weight,Q8_0
blk.39.ffn_gate_exps.weight,Q5_K
blk.39.ffn_gate_inp.weight,F32
blk.39.ffn_gate_shexp.weight,Q6_K
blk.39.ffn_norm.weight,F32
blk.39.ffn_up_exps.weight,Q5_K
blk.39.ffn_up_shexp.weight,Q6_K
blk.40.attn_kv_a_mqa.weight,Q8_0
blk.40.attn_kv_a_norm.weight,F32
blk.40.attn_k_b.weight,Q8_0
blk.40.attn_v_b.weight,Q8_0
blk.40.attn_norm.weight,F32
blk.40.attn_output.weight,Q6_K
blk.40.attn_q_a.weight,Q6_K
blk.40.attn_q_a_norm.weight,F32
blk.40.attn_q_b.weight,Q6_K
blk.40.exp_probs_b.bias,F32
blk.40.ffn_down_exps.weight,Q6_K
blk.40.ffn_down_shexp.weight,Q8_0
blk.40.ffn_gate_exps.weight,Q5_K
blk.40.ffn_gate_inp.weight,F32
blk.40.ffn_gate_shexp.weight,Q6_K
blk.40.ffn_norm.weight,F32
blk.40.ffn_up_exps.weight,Q5_K
blk.40.ffn_up_shexp.weight,Q6_K
blk.41.attn_kv_a_mqa.weight,Q8_0
blk.41.attn_kv_a_norm.weight,F32
blk.41.attn_k_b.weight,Q8_0
blk.41.attn_v_b.weight,Q8_0
blk.41.attn_norm.weight,F32
blk.41.attn_output.weight,Q6_K
blk.41.attn_q_a.weight,Q6_K
blk.41.attn_q_a_norm.weight,F32
blk.41.attn_q_b.weight,Q6_K
blk.41.exp_probs_b.bias,F32
blk.41.ffn_down_exps.weight,Q6_K
blk.41.ffn_down_shexp.weight,Q8_0
blk.41.ffn_gate_exps.weight,Q5_K
blk.41.ffn_gate_inp.weight,F32
blk.41.ffn_gate_shexp.weight,Q6_K
blk.41.ffn_norm.weight,F32
blk.41.ffn_up_exps.weight,Q5_K
blk.41.ffn_up_shexp.weight,Q6_K
blk.42.attn_kv_a_mqa.weight,Q8_0
blk.42.attn_kv_a_norm.weight,F32
blk.42.attn_k_b.weight,Q8_0
blk.42.attn_v_b.weight,Q8_0
blk.42.attn_norm.weight,F32
blk.42.attn_output.weight,Q6_K
blk.42.attn_q_a.weight,Q6_K
blk.42.attn_q_a_norm.weight,F32
blk.42.attn_q_b.weight,Q6_K
blk.42.exp_probs_b.bias,F32
blk.42.ffn_down_exps.weight,Q6_K
blk.42.ffn_down_shexp.weight,Q8_0
blk.42.ffn_gate_exps.weight,Q5_K
blk.42.ffn_gate_inp.weight,F32
blk.42.ffn_gate_shexp.weight,Q6_K
blk.42.ffn_norm.weight,F32
blk.42.ffn_up_exps.weight,Q5_K
blk.42.ffn_up_shexp.weight,Q6_K
blk.43.attn_kv_a_mqa.weight,Q8_0
blk.43.attn_kv_a_norm.weight,F32
blk.43.attn_k_b.weight,Q8_0
blk.43.attn_v_b.weight,Q8_0
blk.43.attn_norm.weight,F32
blk.43.attn_output.weight,Q6_K
blk.43.attn_q_a.weight,Q6_K
blk.43.attn_q_a_norm.weight,F32
blk.43.attn_q_b.weight,Q6_K
blk.43.exp_probs_b.bias,F32
blk.43.ffn_down_exps.weight,Q6_K
blk.43.ffn_down_shexp.weight,Q8_0
blk.43.ffn_gate_exps.weight,Q5_K
blk.43.ffn_gate_inp.weight,F32
blk.43.ffn_gate_shexp.weight,Q6_K
blk.43.ffn_norm.weight,F32
blk.43.ffn_up_exps.weight,Q5_K
blk.43.ffn_up_shexp.weight,Q6_K
blk.44.attn_kv_a_mqa.weight,Q8_0
blk.44.attn_kv_a_norm.weight,F32
blk.44.attn_k_b.weight,Q8_0
blk.44.attn_v_b.weight,Q8_0
blk.44.attn_norm.weight,F32
blk.44.attn_output.weight,Q6_K
blk.44.attn_q_a.weight,Q6_K
blk.44.attn_q_a_norm.weight,F32
blk.44.attn_q_b.weight,Q6_K
blk.44.exp_probs_b.bias,F32
blk.44.ffn_down_exps.weight,Q6_K
blk.44.ffn_down_shexp.weight,Q8_0
blk.44.ffn_gate_exps.weight,Q5_K
blk.44.ffn_gate_inp.weight,F32
blk.44.ffn_gate_shexp.weight,Q6_K
blk.44.ffn_norm.weight,F32
blk.44.ffn_up_exps.weight,Q5_K
blk.44.ffn_up_shexp.weight,Q6_K
blk.45.attn_kv_a_mqa.weight,Q8_0
blk.45.attn_kv_a_norm.weight,F32
blk.45.attn_k_b.weight,Q8_0
blk.45.attn_v_b.weight,Q8_0
blk.45.attn_norm.weight,F32
blk.45.attn_output.weight,Q6_K
blk.45.attn_q_a.weight,Q6_K
blk.45.attn_q_a_norm.weight,F32
blk.45.attn_q_b.weight,Q6_K
blk.45.exp_probs_b.bias,F32
blk.45.ffn_down_exps.weight,Q6_K
blk.45.ffn_down_shexp.weight,Q8_0
blk.45.ffn_gate_exps.weight,Q5_K
blk.45.ffn_gate_inp.weight,F32
blk.45.ffn_gate_shexp.weight,Q6_K
blk.45.ffn_norm.weight,F32
blk.45.ffn_up_exps.weight,Q5_K
blk.45.ffn_up_shexp.weight,Q6_K
blk.46.attn_kv_a_mqa.weight,Q8_0
blk.46.attn_kv_a_norm.weight,F32
blk.46.attn_k_b.weight,Q8_0
blk.46.attn_v_b.weight,Q8_0
blk.46.attn_norm.weight,F32
blk.46.attn_output.weight,Q6_K
blk.46.attn_q_a.weight,Q6_K
blk.46.attn_q_a_norm.weight,F32
blk.46.attn_q_b.weight,Q6_K
blk.46.exp_probs_b.bias,F32
blk.46.ffn_down_exps.weight,Q6_K
blk.46.ffn_down_shexp.weight,Q8_0
blk.46.ffn_gate_exps.weight,Q5_K
blk.46.ffn_gate_inp.weight,F32
blk.46.ffn_gate_shexp.weight,Q6_K
blk.46.ffn_norm.weight,F32
blk.46.ffn_up_exps.weight,Q5_K
blk.46.ffn_up_shexp.weight,Q6_K
blk.47.attn_kv_a_mqa.weight,Q8_0
blk.47.attn_kv_a_norm.weight,F32
blk.47.attn_k_b.weight,Q8_0
blk.47.attn_v_b.weight,Q8_0
blk.47.attn_norm.weight,F32
blk.47.attn_output.weight,Q6_K
blk.47.attn_q_a.weight,Q6_K
blk.47.attn_q_a_norm.weight,F32
blk.47.attn_q_b.weight,Q6_K
blk.47.exp_probs_b.bias,F32
blk.47.ffn_down_exps.weight,Q6_K
blk.47.ffn_down_shexp.weight,Q8_0
blk.47.ffn_gate_exps.weight,Q5_K
blk.47.ffn_gate_inp.weight,F32
blk.47.ffn_gate_shexp.weight,Q6_K
blk.47.ffn_norm.weight,F32
blk.47.ffn_up_exps.weight,Q5_K
blk.47.ffn_up_shexp.weight,Q6_K
blk.48.attn_kv_a_mqa.weight,Q8_0
blk.48.attn_kv_a_norm.weight,F32
blk.48.attn_k_b.weight,Q8_0
blk.48.attn_v_b.weight,Q8_0
blk.48.attn_norm.weight,F32
blk.48.attn_output.weight,Q6_K
blk.48.attn_q_a.weight,Q6_K
blk.48.attn_q_a_norm.weight,F32
blk.48.attn_q_b.weight,Q6_K
blk.48.exp_probs_b.bias,F32
blk.48.ffn_down_exps.weight,Q6_K
blk.48.ffn_down_shexp.weight,Q8_0
blk.48.ffn_gate_exps.weight,Q5_K
blk.48.ffn_gate_inp.weight,F32
blk.48.ffn_gate_shexp.weight,Q6_K
blk.48.ffn_norm.weight,F32
blk.48.ffn_up_exps.weight,Q5_K
blk.48.ffn_up_shexp.weight,Q6_K
blk.49.attn_kv_a_mqa.weight,Q8_0
blk.49.attn_kv_a_norm.weight,F32
blk.49.attn_k_b.weight,Q8_0
blk.49.attn_v_b.weight,Q8_0
blk.49.attn_norm.weight,F32
blk.49.attn_output.weight,Q6_K
blk.49.attn_q_a.weight,Q6_K
blk.49.attn_q_a_norm.weight,F32
blk.49.attn_q_b.weight,Q6_K
blk.49.exp_probs_b.bias,F32
blk.49.ffn_down_exps.weight,Q6_K
blk.49.ffn_down_shexp.weight,Q8_0
blk.49.ffn_gate_exps.weight,Q5_K
blk.49.ffn_gate_inp.weight,F32
blk.49.ffn_gate_shexp.weight,Q6_K
blk.49.ffn_norm.weight,F32
blk.49.ffn_up_exps.weight,Q5_K
blk.49.ffn_up_shexp.weight,Q6_K
blk.50.attn_kv_a_mqa.weight,Q8_0
blk.50.attn_kv_a_norm.weight,F32
blk.50.attn_k_b.weight,Q8_0
blk.50.attn_v_b.weight,Q8_0
blk.50.attn_norm.weight,F32
blk.50.attn_output.weight,Q6_K
blk.50.attn_q_a.weight,Q6_K
blk.50.attn_q_a_norm.weight,F32
blk.50.attn_q_b.weight,Q6_K
blk.50.exp_probs_b.bias,F32
blk.50.ffn_down_exps.weight,Q6_K
blk.50.ffn_down_shexp.weight,Q8_0
blk.50.ffn_gate_exps.weight,Q5_K
blk.50.ffn_gate_inp.weight,F32
blk.50.ffn_gate_shexp.weight,Q6_K
blk.50.ffn_norm.weight,F32
blk.50.ffn_up_exps.weight,Q5_K
blk.50.ffn_up_shexp.weight,Q6_K
blk.51.attn_kv_a_mqa.weight,Q8_0
blk.51.attn_kv_a_norm.weight,F32
blk.51.attn_k_b.weight,Q8_0
blk.51.attn_v_b.weight,Q8_0
blk.51.attn_norm.weight,F32
blk.51.attn_output.weight,Q6_K
blk.51.attn_q_a.weight,Q6_K
blk.51.attn_q_a_norm.weight,F32
blk.51.attn_q_b.weight,Q6_K
blk.51.exp_probs_b.bias,F32
blk.51.ffn_down_exps.weight,Q6_K
blk.51.ffn_down_shexp.weight,Q8_0
blk.51.ffn_gate_exps.weight,Q5_K
blk.51.ffn_gate_inp.weight,F32
blk.51.ffn_gate_shexp.weight,Q6_K
blk.51.ffn_norm.weight,F32
blk.51.ffn_up_exps.weight,Q5_K
blk.51.ffn_up_shexp.weight,Q6_K
blk.52.attn_kv_a_mqa.weight,Q8_0
blk.52.attn_kv_a_norm.weight,F32
blk.52.attn_k_b.weight,Q8_0
blk.52.attn_v_b.weight,Q8_0
blk.52.attn_norm.weight,F32
blk.52.attn_output.weight,Q6_K
blk.52.attn_q_a.weight,Q6_K
blk.52.attn_q_a_norm.weight,F32
blk.52.attn_q_b.weight,Q6_K
blk.52.exp_probs_b.bias,F32
blk.52.ffn_down_exps.weight,Q6_K
blk.52.ffn_down_shexp.weight,Q8_0
blk.52.ffn_gate_exps.weight,Q5_K
blk.52.ffn_gate_inp.weight,F32
blk.52.ffn_gate_shexp.weight,Q6_K
blk.52.ffn_norm.weight,F32
blk.52.ffn_up_exps.weight,Q5_K
blk.52.ffn_up_shexp.weight,Q6_K
blk.53.attn_kv_a_mqa.weight,Q8_0
blk.53.attn_kv_a_norm.weight,F32
blk.53.attn_k_b.weight,Q8_0
blk.53.attn_v_b.weight,Q8_0
blk.53.attn_norm.weight,F32
blk.53.attn_output.weight,Q6_K
blk.53.attn_q_a.weight,Q6_K
blk.53.attn_q_a_norm.weight,F32
blk.53.attn_q_b.weight,Q6_K
blk.53.exp_probs_b.bias,F32
blk.53.ffn_down_exps.weight,Q6_K
blk.53.ffn_down_shexp.weight,Q8_0
blk.53.ffn_gate_exps.weight,Q5_K
blk.53.ffn_gate_inp.weight,F32
blk.53.ffn_gate_shexp.weight,Q6_K
blk.53.ffn_norm.weight,F32
blk.53.ffn_up_exps.weight,Q5_K
blk.53.ffn_up_shexp.weight,Q6_K
blk.54.attn_kv_a_mqa.weight,Q8_0
blk.54.attn_kv_a_norm.weight,F32
blk.54.attn_k_b.weight,Q8_0
blk.54.attn_v_b.weight,Q8_0
blk.54.attn_norm.weight,F32
blk.54.attn_output.weight,Q6_K
blk.54.attn_q_a.weight,Q6_K
blk.54.attn_q_a_norm.weight,F32
blk.54.attn_q_b.weight,Q6_K
blk.54.exp_probs_b.bias,F32
blk.54.ffn_down_exps.weight,Q6_K
blk.54.ffn_down_shexp.weight,Q8_0
blk.54.ffn_gate_exps.weight,Q5_K
blk.54.ffn_gate_inp.weight,F32
blk.54.ffn_gate_shexp.weight,Q6_K
blk.54.ffn_norm.weight,F32
blk.54.ffn_up_exps.weight,Q5_K
blk.54.ffn_up_shexp.weight,Q6_K
blk.55.attn_kv_a_mqa.weight,Q8_0
blk.55.attn_kv_a_norm.weight,F32
blk.55.attn_k_b.weight,Q8_0
blk.55.attn_v_b.weight,Q8_0
blk.55.attn_norm.weight,F32
blk.55.attn_output.weight,Q6_K
blk.55.attn_q_a.weight,Q6_K
blk.55.attn_q_a_norm.weight,F32
blk.55.attn_q_b.weight,Q6_K
blk.55.exp_probs_b.bias,F32
blk.55.ffn_down_exps.weight,Q6_K
blk.55.ffn_down_shexp.weight,Q8_0
blk.55.ffn_gate_exps.weight,Q5_K
blk.55.ffn_gate_inp.weight,F32
blk.55.ffn_gate_shexp.weight,Q6_K
blk.55.ffn_norm.weight,F32
blk.55.ffn_up_exps.weight,Q5_K
blk.55.ffn_up_shexp.weight,Q6_K
blk.56.attn_kv_a_mqa.weight,Q8_0
blk.56.attn_kv_a_norm.weight,F32
blk.56.attn_k_b.weight,Q8_0
blk.56.attn_v_b.weight,Q8_0
blk.56.attn_norm.weight,F32
blk.56.attn_output.weight,Q6_K
blk.56.attn_q_a.weight,Q6_K
blk.56.attn_q_a_norm.weight,F32
blk.56.attn_q_b.weight,Q6_K
blk.56.exp_probs_b.bias,F32
blk.56.ffn_down_exps.weight,Q6_K
blk.56.ffn_down_shexp.weight,Q8_0
blk.56.ffn_gate_exps.weight,Q5_K
blk.56.ffn_gate_inp.weight,F32
blk.56.ffn_gate_shexp.weight,Q6_K
blk.56.ffn_norm.weight,F32
blk.56.ffn_up_exps.weight,Q5_K
blk.56.ffn_up_shexp.weight,Q6_K
blk.57.attn_kv_a_mqa.weight,Q8_0
blk.57.attn_kv_a_norm.weight,F32
blk.57.attn_k_b.weight,Q8_0
blk.57.attn_v_b.weight,Q8_0
blk.57.attn_norm.weight,F32
blk.57.attn_output.weight,Q6_K
blk.57.attn_q_a.weight,Q6_K
blk.57.attn_q_a_norm.weight,F32
blk.57.attn_q_b.weight,Q6_K
blk.57.exp_probs_b.bias,F32
blk.57.ffn_down_exps.weight,Q6_K
blk.57.ffn_down_shexp.weight,Q8_0
blk.57.ffn_gate_exps.weight,Q5_K
blk.57.ffn_gate_inp.weight,F32
blk.57.ffn_gate_shexp.weight,Q6_K
blk.57.ffn_norm.weight,F32
blk.57.ffn_up_exps.weight,Q5_K
blk.57.ffn_up_shexp.weight,Q6_K
blk.58.attn_kv_a_mqa.weight,Q8_0
blk.58.attn_kv_a_norm.weight,F32
blk.58.attn_k_b.weight,Q8_0
blk.58.attn_v_b.weight,Q8_0
blk.58.attn_norm.weight,F32
blk.58.attn_output.weight,Q6_K
blk.58.attn_q_a.weight,Q6_K
blk.58.attn_q_a_norm.weight,F32
blk.58.attn_q_b.weight,Q6_K
blk.58.exp_probs_b.bias,F32
blk.58.ffn_down_exps.weight,Q6_K
blk.58.ffn_down_shexp.weight,Q8_0
blk.58.ffn_gate_exps.weight,Q5_K
blk.58.ffn_gate_inp.weight,F32
blk.58.ffn_gate_shexp.weight,Q6_K
blk.58.ffn_norm.weight,F32
blk.58.ffn_up_exps.weight,Q5_K
blk.58.ffn_up_shexp.weight,Q6_K
blk.59.attn_kv_a_mqa.weight,Q8_0
blk.59.attn_kv_a_norm.weight,F32
blk.59.attn_k_b.weight,Q8_0
blk.59.attn_v_b.weight,Q8_0
blk.59.attn_norm.weight,F32
blk.59.attn_output.weight,Q6_K
blk.59.attn_q_a.weight,Q6_K
blk.59.attn_q_a_norm.weight,F32
blk.59.attn_q_b.weight,Q6_K
blk.59.exp_probs_b.bias,F32
blk.59.ffn_down_exps.weight,Q6_K
blk.59.ffn_down_shexp.weight,Q8_0
blk.59.ffn_gate_exps.weight,Q5_K
blk.59.ffn_gate_inp.weight,F32
blk.59.ffn_gate_shexp.weight,Q6_K
blk.59.ffn_norm.weight,F32
blk.59.ffn_up_exps.weight,Q5_K
blk.59.ffn_up_shexp.weight,Q6_K
blk.60.attn_kv_a_mqa.weight,Q8_0
blk.60.attn_kv_a_norm.weight,F32
blk.60.attn_k_b.weight,Q8_0
blk.60.attn_v_b.weight,Q8_0
blk.60.attn_norm.weight,F32
blk.60.attn_output.weight,Q6_K
blk.60.attn_q_a.weight,Q6_K
blk.60.attn_q_a_norm.weight,F32
blk.60.attn_q_b.weight,Q6_K
blk.60.exp_probs_b.bias,F32
blk.60.ffn_down_exps.weight,Q6_K
blk.60.ffn_down_shexp.weight,Q8_0
blk.60.ffn_gate_exps.weight,Q5_K
blk.60.ffn_gate_inp.weight,F32
blk.60.ffn_gate_shexp.weight,Q6_K
blk.60.ffn_norm.weight,F32
blk.60.ffn_up_exps.weight,Q5_K
blk.60.ffn_up_shexp.weight,Q6_K

@bibproj
You can inspect these tensor parameters easily with gguf_editor_gui.py from llama.cpp.
I'm sharing my code here. You can set the paths of the two inputs (for example, different quants) and the file name of the output at the beginning of the code. For historical reasons they are named V3 and R1. In the KEY_MAPPING part you define which tensors come from which model: for example, "blk.0" means all the tensors in block 0, and "shexp" means all the shared experts across all blocks. Tensors not matched by any key come from V3.

import gguf
import os
import time
import sys
from tqdm import tqdm
import numpy as np
import gc

# Paths to model files

PATH_R1 = "/Volumes/lacie/models-lacie/unsloth/DeepSeek-V3.1-GGUF/DeepSeek-V3.1-UD-Q5_K_XL.gguf"
PATH_V3 = "/Volumes/lacie/models-lacie/unsloth/DeepSeek-V3.1-GGUF/DeepSeek-V3.1-UD-Q6_K_XL.gguf"
PATH_OUT = "DeepSeek-V3.1-UD-Q6Q5_K_XL.gguf"

# Define key mapping priority

KEY_MAPPING = {
    "token_embd": "v3",
    "blk.0": "v3",
    "blk.1": "v3",
    "blk.2": "v3",
    "shexp": "v3",
    "exps": "r1",
    "attn": "v3",
    "ffn_gate_inp": "v3",
    #"blk.4.ffn_up_exps": "r1",
    #"blk.4.ffn_gate_exps": "r1",
    #"blk.4.ffn_down_exps": "r1",
    #"blk.6.ffn_up_exps": "r1",
    #"blk.6.ffn_gate_exps": "r1",
    #"blk.6.ffn_down_exps": "r1",
    #"blk.8.ffn_up_exps": "r1",
    #"blk.8.ffn_gate_exps": "r1",
    #"blk.8.ffn_down_exps": "r1",
    #"blk.33.ffn_up_exps": "r1",
    #"blk.33.ffn_gate_exps": "r1",
    #"blk.33.ffn_down_exps": "r1",
    #"blk.59.ffn_up_exps": "r1",
    #"blk.59.ffn_gate_exps": "r1",
    #"blk.59.ffn_down_exps": "r1",
    #"blk.60.ffn_up_exps": "r1",
    #"blk.60.ffn_gate_exps": "r1",
    #"blk.60.ffn_down_exps": "r1"
}

def get_field_data(reader, key):
    """Extract data from a GGUF field with correct type interpretation."""
    try:
        field = reader.fields.get(key)
        if not field:
            return None

        if not hasattr(field, 'types') or not field.types:
            return None

        main_type = field.types[0]

        # Scalar types
        if main_type in {
            gguf.GGUFValueType.UINT8, gguf.GGUFValueType.INT8,
            gguf.GGUFValueType.UINT16, gguf.GGUFValueType.INT16,
            gguf.GGUFValueType.UINT32, gguf.GGUFValueType.INT32,
            gguf.GGUFValueType.UINT64, gguf.GGUFValueType.INT64,
            gguf.GGUFValueType.FLOAT32, gguf.GGUFValueType.FLOAT64,
            gguf.GGUFValueType.BOOL
        }:
            if hasattr(field, 'parts') and len(field.parts) > 0:
                val = field.parts[-1]
                if hasattr(val, 'item'):
                    return val.item()
                else:
                    return val
            return None

        # String type
        if main_type == gguf.GGUFValueType.STRING:
            if hasattr(field, 'parts') and len(field.parts) > 0:
                val = field.parts[-1]
                return val.tobytes().decode('utf-8', errors='replace')
            return None

        # Array type
        if main_type == gguf.GGUFValueType.ARRAY:
            if len(field.types) < 2:
                return None

            arr_type = field.types[1]
            arr_data = []

            # The array data is stored in parts after the first two (type and length)
            for part in getattr(field, 'parts', [])[2:]:
                if arr_type == gguf.GGUFValueType.STRING:
                    arr_data.append(part.tobytes().decode('utf-8', errors='replace'))
                elif hasattr(part, 'item'):
                    arr_data.append(part.item())
                else:
                    arr_data.append(part)

            return (arr_type, arr_data)

        # Unsupported types
        return None
    except Exception as e:
        print(f"Error extracting data for {key}: {e}")
        return None

def ensure_token_dictionaries_consistent(writer):
    """Ensure id_to_token and token_to_id dictionaries have matching sizes."""
    tokens = get_field_data(writer, "tokenizer.ggml.tokens")
    if not tokens or not isinstance(tokens, tuple) or len(tokens) < 2:
        return False

    token_type, token_data = tokens

    # Check for duplicates
    seen = set()
    duplicates = []

    for idx, token in enumerate(token_data):
        if token in seen:
            duplicates.append((idx, token))
        seen.add(token)

    if duplicates:
        print(f"Warning: Found {len(duplicates)} duplicate tokens")
        # Potentially fix the duplicates here

    return len(duplicates) == 0

def get_field_data(reader, key):
    """Get field data from GGUF reader.

    NOTE: this redefinition overrides the longer helper above; it relies
    on the get_field()/contents() API of newer gguf-py versions.
    """
    field = reader.get_field(key)
    return field.contents() if field else None

def copy_metadata(reader, writer):
    """Copy metadata from reader to writer with proper array type handling."""
    added = 0
    skipped = 0

    for key in reader.fields:
        # Only skip GGUF internal fields, NOT tokenizer fields
        if key.startswith('GGUF.') and key not in {"GGUF.version", "GGUF.tensor_count", "GGUF.kv_count"}:
            print(f"Skipping internal GGUF field: {key}")
            skipped += 1
            continue

        try:
            field = reader.fields[key]
            value = get_field_data(reader, key)

            if value is None:
                print(f"Skipping {key}: no data")
                skipped += 1
                continue

            main_type = field.types[0]

            # Handle arrays with proper subtype detection
            if main_type == gguf.GGUFValueType.ARRAY:
                # Get the array subtype from the field metadata
                if len(field.types) > 1:
                    subtype = field.types[1]
                else:
                    # Fallback: determine subtype from data
                    if isinstance(value, list) and len(value) > 0:
                        first_item = value[0]
                        if isinstance(first_item, str):
                            subtype = gguf.GGUFValueType.STRING
                        elif isinstance(first_item, int):
                            subtype = gguf.GGUFValueType.INT32
                        elif isinstance(first_item, float):
                            subtype = gguf.GGUFValueType.FLOAT32
                        else:
                            print(f"Cannot determine subtype for array {key}")
                            skipped += 1
                            continue
                    else:
                        print(f"Empty or invalid array data for {key}")
                        skipped += 1
                        continue

                # Handle different array subtypes
                if subtype == gguf.GGUFValueType.STRING:
                    # String arrays: general.tags, general.languages, tokenizer fields
                    if isinstance(value, list):
                        # Clean string array data
                        cleaned_array = []
                        for i, item in enumerate(value):
                            if item is None or item == "":
                                # Handle empty strings in tokenizer data
                                if key == "tokenizer.ggml.tokens":
                                    cleaned_array.append(f"<empty_{i}>")
                                else:
                                    cleaned_array.append("")
                            else:
                                cleaned_array.append(str(item))
                        writer.add_array(key, cleaned_array)
                        print(f"Copied string array {key} with {len(cleaned_array)} items")
                    else:
                        print(f"Invalid string array format for {key}: {type(value)}")
                        skipped += 1
                        continue

                elif subtype in [gguf.GGUFValueType.INT32, gguf.GGUFValueType.UINT32,
                                 gguf.GGUFValueType.INT8, gguf.GGUFValueType.UINT8]:
                    # Integer arrays: tokenizer.ggml.token_type
                    if isinstance(value, list):
                        int_array = [int(item) for item in value]
                        writer.add_array(key, int_array)
                        print(f"Copied integer array {key} with {len(int_array)} items")
                    else:
                        print(f"Invalid integer array format for {key}: {type(value)}")
                        skipped += 1
                        continue

                elif subtype in [gguf.GGUFValueType.FLOAT32, gguf.GGUFValueType.FLOAT64]:
                    # Float arrays: tokenizer.ggml.scores
                    if isinstance(value, list):
                        float_array = [float(item) for item in value]
                        writer.add_array(key, float_array)
                        print(f"Copied float array {key} with {len(float_array)} items")
                    else:
                        print(f"Invalid float array format for {key}: {type(value)}")
                        skipped += 1
                        continue

                else:
                    print(f"Unsupported array subtype {subtype} for {key}")
                    skipped += 1
                    continue

        # Handle scalar types
            elif main_type == gguf.GGUFValueType.STRING:
                writer.add_string(key, str(value))
            elif main_type == gguf.GGUFValueType.UINT16:
                writer.add_uint16(key, int(value))
            elif main_type == gguf.GGUFValueType.INT16:
                writer.add_int16(key, int(value))
            elif main_type == gguf.GGUFValueType.UINT32:
                writer.add_uint32(key, int(value))
            elif main_type == gguf.GGUFValueType.INT32:
                writer.add_int32(key, int(value))
            elif main_type == gguf.GGUFValueType.FLOAT32:
                writer.add_float32(key, float(value))
            elif main_type == gguf.GGUFValueType.BOOL:
                writer.add_bool(key, bool(value))
            elif main_type == gguf.GGUFValueType.UINT64:
                writer.add_uint64(key, int(value))
            elif main_type == gguf.GGUFValueType.INT64:
                writer.add_int64(key, int(value))
            elif main_type == gguf.GGUFValueType.FLOAT64:
                writer.add_float64(key, float(value))
            else:
                print(f"Skipping unsupported type {main_type} for {key}")
                skipped += 1
                continue

            added += 1

        except Exception as e:
            print(f"Error copying {key}: {e}")
            import traceback
            traceback.print_exc()
            skipped += 1

    print(f"Copied {added} metadata fields, skipped {skipped}")

def copy_tokenizer_metadata(reader, writer):
    """Legacy function - now handled by copy_metadata."""
    print("Tokenizer metadata is now handled by the main copy_metadata function")
    pass

def compare_tensor_sets(tensors_r1, tensors_v3):
    """Compare tensor keys between models."""
    r1_keys = set(tensors_r1.keys())
    v3_keys = set(tensors_v3.keys())
    common = r1_keys & v3_keys
    only_in_r1 = r1_keys - v3_keys
    only_in_v3 = v3_keys - r1_keys

    print(f"\nTensor comparison:")
    print(f" Common tensors: {len(common)}")
    print(f" Only in R1: {len(only_in_r1)}")
    print(f" Only in V3: {len(only_in_v3)}")

    return common, only_in_r1, only_in_v3

def dump_metadata(reader, prefix="", sample_count=10):
    """Print a summary of the model's metadata."""
    print(f"\n{prefix} Metadata Summary:")
    all_keys = sorted(reader.fields.keys())
    type_counts = {}
    for key in all_keys:
        field = reader.fields[key]
        tname = str(field.types) if hasattr(field, 'types') else 'unknown'
        type_counts[tname] = type_counts.get(tname, 0) + 1

    print(f" Total fields: {len(all_keys)}")
    for tname, count in type_counts.items():
        print(f" {tname}: {count}")

    print(f"\nSample fields ({min(sample_count, len(all_keys))}):")
    for key in all_keys[:sample_count]:
        try:
            value = get_field_data(reader, key)
            if isinstance(value, (list, tuple)) and len(value) > 5:
                val_str = f"{value[:5]}... (len={len(value)})"
            else:
                val_str = str(value)
            if len(val_str) > 50:
                val_str = val_str[:50] + "..."
            print(f" {key}: {val_str}")
        except Exception as e:
            print(f" {key}: (Error: {e})")

def merged_model_merger():
    """Main function combining best metadata and tensor handling."""
    start_time = time.time()

    print(f"Starting merge of {PATH_R1} and {PATH_V3}...")
    print("Loading model readers...")

    reader_r1 = gguf.GGUFReader(PATH_R1)
    reader_v3 = gguf.GGUFReader(PATH_V3)

    # Display metadata summary
    dump_metadata(reader_r1, "R1 Model")
    dump_metadata(reader_v3, "V3 Model")

    # Index tensors
    print("\nIndexing tensors...")
    tensors_r1 = {t.name: t for t in reader_r1.tensors}
    tensors_v3 = {t.name: t for t in reader_v3.tensors}

    print(f"Found {len(tensors_r1)} tensors in R1")
    print(f"Found {len(tensors_v3)} tensors in V3")

    # Compare tensors
    common, r1_only, v3_only = compare_tensor_sets(tensors_r1, tensors_v3)

    all_names = sorted(set(tensors_r1.keys()) | set(tensors_v3.keys()))
    print(f"\nTotal unique tensors: {len(all_names)}")

    # Determine source for each tensor
    tensor_sources = {}
    for name in all_names:
        if name in r1_only:
            tensor_sources[name] = "r1"
        elif name in v3_only:
            tensor_sources[name] = "v3"
        else:
            # Default to v3 unless specified in mapping
            tensor_sources[name] = "v3"

    # Apply mapping rules.
    # NOTE: keys are matched as plain substrings, so "blk.1" also matches
    # blk.10-blk.19; use keys like "blk.1." to pin a single block.
    for name in all_names:
        for k, v in KEY_MAPPING.items():
            if k in name:
                tensor_sources[name] = v
                break

    # Count tensors from each source
    r1_count = sum(1 for s in tensor_sources.values() if s == "r1")
    v3_count = sum(1 for s in tensor_sources.values() if s == "v3")

    print(f"\nUsing {r1_count} tensors from R1 and {v3_count} from V3.")

    # Display file sizes
    r1_size = os.path.getsize(PATH_R1) if os.path.exists(PATH_R1) else 0
    v3_size = os.path.getsize(PATH_V3) if os.path.exists(PATH_V3) else 0

    print(f"R1 size: {r1_size / (1024**3):.2f} GB")
    print(f"V3 size: {v3_size / (1024**3):.2f} GB")

    # Create output writer
    writer = gguf.GGUFWriter(PATH_OUT, arch="deepseek2")

    # Copy metadata with the fixed function that properly handles all types
    print("\nCopying V3 metadata...")
    copy_metadata(reader_v3, writer)

    print("\nSpecifically handling tokenizer metadata...")
    # Choose which model's tokenizer to use - typically use V3 as your base
    copy_tokenizer_metadata(reader_v3, writer)

    # Add tensor info
    print("\nAdding tensor info...")

    r1_tensor_total_size = 0  # Track R1 tensor size
    v3_tensor_total_size = 0  # Track V3 tensor size

    tensor_list = []

    for name in all_names:
        src = tensor_sources[name]
        tensor = None

        if src == "r1" and name in tensors_r1:
            tensor = tensors_r1[name]
            r1_tensor_total_size += tensor.data.nbytes
        elif src == "v3" and name in tensors_v3:
            tensor = tensors_v3[name]
            v3_tensor_total_size += tensor.data.nbytes
        else:
            print(f"Warning: tensor {name} not found in source {src}")
            continue

        # Store tensor for writing
        tensor_list.append((name, tensor))

        # Add tensor info with detailed attributes
        writer.add_tensor_info(
            name,
            tensor.data.shape,
            tensor.data.dtype,
            tensor.data.nbytes,
            tensor.tensor_type
        )

    # Write header and tensor info to initialize the file
    print(f"\nWriting header with {len(tensor_list)} tensors...")
    writer.write_header_to_file()
    writer.write_kv_data_to_file()
    writer.write_ti_data_to_file()

    # Calculate total tensor size for progress bar
    total_tensor_bytes = r1_tensor_total_size + v3_tensor_total_size

    print(f"\nWriting tensor data ({total_tensor_bytes / (1024**3):.2f} GB total)...")
    print(f" - From R1: {r1_tensor_total_size / (1024**3):.2f} GB")
    print(f" - From V3: {v3_tensor_total_size / (1024**3):.2f} GB")

    # Write tensor data with progress bar
    bar = tqdm(total=total_tensor_bytes, unit='B', unit_scale=True)

    for name, tensor in tensor_list:
        writer.write_tensor_data(tensor.data)
        bar.update(tensor.data.nbytes)

        # Memory management
        del tensor
        gc.collect()

    # Clean up the tensor list to free memory
    del tensor_list
    gc.collect()

    print("\nValidating tokenizer consistency...")
    # NOTE: validate_tokens() is not defined in this script; comment these
    # two checks out if you do not have it (see the discussion below).
    tokens_valid = validate_tokens(writer)
    dictionaries_consistent = ensure_token_dictionaries_consistent(writer)

    if not tokens_valid or not dictionaries_consistent:
        print("Warning: Tokenizer validation failed - model may not load correctly")

    bar.close()
    writer.close()

    print(f"\nMerge completed in {time.time() - start_time:.2f} seconds")
    print(f"Output file: {PATH_OUT}")

    # Validation step
    print("\nValidating output file...")
    try:
        out_reader = gguf.GGUFReader(PATH_OUT)
        print(f"Output model has {len(out_reader.fields)} metadata fields and {len(out_reader.tensors)} tensors")
        print("Validation successful!")
        return True
    except Exception as e:
        print(f"Validation error: {e}")
        return False

# Main execution

if __name__ == "__main__":
    try:
        merged_model_merger()
    except Exception as e:
        print(f"Error during merge: {e}")
        import traceback
        traceback.print_exc()
        sys.exit(1)

@mmbela
Thank you!

  • I did not know about gguf_editor_gui.py from llama.cpp. The right tools help a lot.

  • You are very kind to share your code. It works very well for merging the quants. I see you fill the 512 GB of memory almost to the max with the new merged model!

  • The only bit that does not work for me yet is

print("\nValidating tokenizer consistency...")
tokens_valid = validate_tokens(writer)
dictionaries_consistent = ensure_token_dictionaries_consistent(writer)

Python does not find the validate_tokens(writer) and ensure_token_dictionaries_consistent(writer) code. I just commented it out for now, as it is only used to check the model after it has been created.

You have been very helpful and kind in sharing your knowledge and code. I learned several new things from you.

I mentioned that it seems possible to create similar mixed quants using MLX.

In mlx-lm's convert.py you can replace the body of def mixed_quant_predicate with the following code:

use_bits = 8
#if "lm_head" in path:
#    use_bits = 8
if "tokens" in path:
    use_bits = 6
#if "attn.kv" in path:
#    use_bits = 8
if "o_proj" in path:
    use_bits = 6
if "attn.q" in path:
    use_bits = 6
# For all mlp (blocks 0-2) and shared experts
#if "down_proj" in path:
#    use_bits = 8
if "up_proj" in path:
    use_bits = 6
if "gate_proj" in path:
    use_bits = 6
# Exceptions for switch experts
if "switch_mlp.down_proj" in path:
    use_bits = 6
    # blocks 3 and 4 have these as Q8; note these substring tests also
    # match blocks 13, 23, ... - use ".3.mlp." style keys to pin one block
    if "3.mlp.switch_mlp.down_proj" in path:
        use_bits = 8
    if "4.mlp.switch_mlp.down_proj" in path:
        use_bits = 8
if "switch_mlp.up_proj" in path:
    use_bits = 5
if "switch_mlp.gate_proj" in path:
    use_bits = 5

return {"group_size": group_size, "bits": use_bits}

This works with DeepSeek V3-0324, R1-0528, and V3.1.

By looking at the layers in your DeepSeek-V3.1-UD-Q6Q5_K_XL.gguf model, you can create a similar model in MLX. Perhaps this will help, as MLX is sometimes a bit faster on the Mac? A sketch of driving the conversion from Python follows.
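
For completeness, a sketch of running the conversion from the Python API instead of the CLI (the convert() parameters follow recent mlx-lm versions; treat the exact names, and the quant_predicate keyword in particular, as assumptions to verify against your installed version):

from mlx_lm import convert

# Hypothetical invocation: quantize the FP8 checkpoint, with the edited
# predicate deciding the per-layer bit width.
convert(
    "deepseek-ai/DeepSeek-V3-0324",
    mlx_path="DeepSeek-V3-0324-mixed-mlx",
    quantize=True,
    q_group_size=64,
    q_bits=8,  # default for layers the predicate leaves untouched
    quant_predicate=mixed_quant_predicate,
)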

bibproj changed discussion status to closed

I am uploading an MLX DQ3_K_M version of Kimi K2 Instruct 0905 at https://huggingface.co/mlx-community/Kimi-K2-Instruct-0905-mlx-DQ3_K_M. In the README.md (model card) I also shared how to build it. It is specifically for the 512 GB Mac Studio.

Kimi K2 is built almost the same way as DeepSeek, so your Python code should also work for this model.
