---
license: apache-2.0
language:
- en
- zh
base_model:
- Qwen/Qwen2.5-14B
- Qwen/Qwen2.5-14B-Instruct
- Qwen/Qwen2.5-14B-Instruct-1M
- Qwen/Qwen2.5-Coder-14B
- Qwen/Qwen2.5-Coder-14B-Instruct
- Azure99/Blossom-V6-14B
- arcee-ai/SuperNova-Medius
- arcee-ai/Virtuoso-Small-v2
- deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
- huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated-v2
pipeline_tag: text-generation
tags:
- merge
model-index:
- name: ZYH-LLM-Qwen2.5-14B-V4
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 83.65
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=YOYO-AI/ZYH-LLM-Qwen2.5-14B-V4
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 50.27
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=YOYO-AI/ZYH-LLM-Qwen2.5-14B-V4
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 53.93
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=YOYO-AI/ZYH-LLM-Qwen2.5-14B-V4
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 8.61
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=YOYO-AI/ZYH-LLM-Qwen2.5-14B-V4
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 15.66
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=YOYO-AI/ZYH-LLM-Qwen2.5-14B-V4
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 46.71
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=YOYO-AI/ZYH-LLM-Qwen2.5-14B-V4
      name: Open LLM Leaderboard
---
![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/64e174e202fa032de4143324/CpkVlkXWV0_9Qnz0nDIP4.jpeg)

# ZYH-LLM-Qwen2.5-14B-V4

*The fourth-generation model of ZYH-LLM-Qwen2.5 has been released!*

*It increases the proportion of **R1 distillation models** in the merge recipe while maintaining the model's **instruction-following ability** and **general capabilities**.*

## Merge Template

```yaml
merge_method: model_stock
base_model: Instruction Model
models:
  - model: Instruction Fine-tuning Model 1
  - model: Instruction Fine-tuning Model 2
  - model: Inference Fine-tuning Model 1
  - model: Inference Fine-tuning Model 2
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
```

Merging with the template above can improve the model's **calculation accuracy** and **reasoning ability** without reducing the instruction model's **general capabilities**. **ZYH-LLM-Qwen2.5-V4** used this template throughout its merging process.

## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/YOYO-AI__ZYH-LLM-Qwen2.5-14B-V4-details).

| Metric |Value|
|-------------------|----:|
|Avg. |43.14|
|IFEval (0-Shot) |83.65|
|BBH (3-Shot) |50.27|
|MATH Lvl 5 (4-Shot)|53.93|
|GPQA (0-shot) |8.61|
|MuSR (0-shot) |15.66|
|MMLU-PRO (5-shot) |46.71|

## First stage:

*Create four different instruction models and a code model.*

```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen/Qwen2.5-14B
parameters:
  density: 1
  weight: 1
  lambda: 0.9
normalize: true
int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-base
```

```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: arcee-ai/Virtuoso-Small-v2
parameters:
  density: 1
  weight: 1
  lambda: 0.9
normalize: true
int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-v2
```

```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: arcee-ai/SuperNova-Medius
parameters:
  density: 1
  weight: 1
  lambda: 0.9
normalize: true
int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-Nova
```

```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Azure99/Blossom-V6-14B
parameters:
  density: 1
  weight: 1
  lambda: 0.9
normalize: true
int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-V6
```

```yaml
models:
  - model: Qwen/Qwen2.5-Coder-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen/Qwen2.5-Coder-14B
parameters:
  density: 1
  weight: 1
  lambda: 0.9
normalize: true
int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-Coder-14B-della
```

## Second stage:

### Step 1:

*Create three instruction models with a bias towards reasoning, using the template.*

```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-della-base
models:
  - model: Qwen2.5-Coder-14B-della
  - model: Qwen2.5-14B-della-v2
  - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
  - model: huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated-v2
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
name: Qwen2.5-14B-mst-Coder
```

```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-della-base
models:
  - model: Qwen2.5-14B-della-V6
  - model: Qwen2.5-14B-della-v2
  - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
  - model: huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated-v2
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
name: Qwen2.5-14B-mst-V6
```

```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-della-base
models:
  - model: Qwen2.5-14B-della-Nova
  - model: Qwen2.5-14B-della-v2
  - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
  - model: huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated-v2
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
name: Qwen2.5-14B-mst-Nova
```

### Step 2:

*Create a pure instruction model to restore the generality of the final model.*

```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-della-base
models:
  - model: Qwen2.5-14B-della-Nova
  - model: Qwen2.5-14B-della-v2
  - model: Qwen2.5-14B-della-V6
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
name: Qwen2.5-14B-mst-it
```

## Third stage:

*Create a base model with a context window of 1 million tokens.*

```yaml
merge_method: sce
models:
  # Pivot model
  - model: Qwen/Qwen2.5-14B-Instruct-1M
  # Target models
  - model: Qwen/Qwen2.5-14B
base_model: Qwen/Qwen2.5-14B-Instruct-1M
parameters:
  select_topk: 1
dtype: bfloat16
tokenizer_source: base
normalize: true
int8_mask: true
name: Qwen2.5-14B-1M
```

```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen2.5-14B-1M
parameters:
  density: 1
  weight: 1
  lambda: 0.9
normalize: true
int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-1M
```

## Final stage:

```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-della-1M
models:
  - model: Qwen2.5-14B-mst-Coder
  - model: Qwen2.5-14B-mst-V6
  - model: Qwen2.5-14B-mst-Nova
  - model: Qwen2.5-14B-mst-it
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
name: ZYH-LLM-Qwen2.5-14B-V4
```
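The `model_stock` recipes above average several fine-tuned checkpoints around a shared base. As a rough illustration of the underlying idea, here is a toy sketch: this is not mergekit's actual implementation (which derives the interpolation weight from the geometry of the task vectors rather than using a fixed `alpha`), and it assumes weights flattened to plain Python lists.

```python
# Toy sketch of the idea behind model_stock: each fine-tuned checkpoint
# contributes a "task vector" (its delta from the shared base), and the
# merged model is the base nudged toward the average of those deltas.
# The fixed `alpha` is a simplifying assumption for illustration only.

def model_stock_toy(base, finetuned, alpha=0.5):
    """Merge flat weight lists: base + alpha * mean(finetuned_i - base)."""
    merged = []
    for i, b in enumerate(base):
        avg_delta = sum(ft[i] - b for ft in finetuned) / len(finetuned)
        merged.append(b + alpha * avg_delta)
    return merged

base = [1.0, 2.0, 3.0]  # stand-in for the base model's weights
ft_a = [1.2, 2.0, 3.4]  # stand-in for a reasoning-biased fine-tune
ft_b = [0.8, 2.4, 3.0]  # stand-in for an instruction-biased fine-tune
print(model_stock_toy(base, [ft_a, ft_b]))  # close to [1.0, 2.1, 3.1]
```

Where the fine-tunes disagree in opposite directions (the first weight), their deltas cancel and the base is kept; where they agree, the merge moves partway toward them, which is why stacking several complementary fine-tunes around one base tends to preserve the base's general capabilities.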