---
license: apache-2.0
language:
  - en
  - zh
base_model:
  - Qwen/Qwen2.5-14B
  - Qwen/Qwen2.5-14B-Instruct
  - Qwen/Qwen2.5-14B-Instruct-1M
  - Qwen/Qwen2.5-Coder-14B
  - Qwen/Qwen2.5-Coder-14B-Instruct
  - Azure99/Blossom-V6-14B
  - arcee-ai/SuperNova-Medius
  - arcee-ai/Virtuoso-Small-v2
  - deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
  - huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated-v2
pipeline_tag: text-generation
tags:
  - merge
model-index:
  - name: ZYH-LLM-Qwen2.5-14B-V4
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: IFEval (0-Shot)
          type: HuggingFaceH4/ifeval
          args:
            num_few_shot: 0
        metrics:
          - type: inst_level_strict_acc and prompt_level_strict_acc
            value: 83.65
            name: strict accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=YOYO-AI/ZYH-LLM-Qwen2.5-14B-V4
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: BBH (3-Shot)
          type: BBH
          args:
            num_few_shot: 3
        metrics:
          - type: acc_norm
            value: 50.27
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=YOYO-AI/ZYH-LLM-Qwen2.5-14B-V4
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MATH Lvl 5 (4-Shot)
          type: hendrycks/competition_math
          args:
            num_few_shot: 4
        metrics:
          - type: exact_match
            value: 53.93
            name: exact match
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=YOYO-AI/ZYH-LLM-Qwen2.5-14B-V4
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GPQA (0-shot)
          type: Idavidrein/gpqa
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 8.61
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=YOYO-AI/ZYH-LLM-Qwen2.5-14B-V4
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MuSR (0-shot)
          type: TAUR-Lab/MuSR
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 15.66
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=YOYO-AI/ZYH-LLM-Qwen2.5-14B-V4
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU-PRO (5-shot)
          type: TIGER-Lab/MMLU-Pro
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 46.71
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=YOYO-AI/ZYH-LLM-Qwen2.5-14B-V4
          name: Open LLM Leaderboard
---


# ZYH-LLM-Qwen2.5-14B-V4

The fourth-generation model of ZYH-LLM-Qwen2.5 has been released!

This version increases the proportion of R1 distillation models in the merge recipe while maintaining the model's instruction-following ability and general capabilities.

## Merge Template

```yaml
merge_method: model_stock
base_model: Instruction Model  
models:  
  - model: Instruction Fine-tuning Model 1  
  - model: Instruction Fine-tuning Model 2  
  - model: Inference Fine-tuning Model 1  
  - model: Inference Fine-tuning Model 2  
dtype: bfloat16  
tokenizer_source: base  
int8_mask: true  
normalize: true
```

Merging with this template improves a model's computational accuracy and reasoning ability without reducing the instruction model's general capabilities.

ZYH-LLM-Qwen2.5-14B-V4 used this template throughout its merging process.
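
For reference, a config written from this template can be executed with [mergekit](https://github.com/arcee-ai/mergekit). Below is a minimal sketch using mergekit's documented Python entry point; the file name `template.yaml` and the output path are placeholders, and the `mergekit-yaml` CLI works equally well:

```python
# Minimal sketch: run a mergekit config from Python.
# Assumes `pip install mergekit`; "template.yaml" and "./merged-model"
# are placeholder paths, not files shipped with this repo.
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("template.yaml", "r", encoding="utf-8") as f:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(f))

run_merge(
    merge_config,
    out_path="./merged-model",
    options=MergeOptions(cuda=True, copy_tokenizer=True),
)
```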

## Open LLM Leaderboard Evaluation Results

Detailed results can be found [here](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=YOYO-AI/ZYH-LLM-Qwen2.5-14B-V4).

| Metric              | Value |
|---------------------|------:|
| Avg.                | 43.14 |
| IFEval (0-Shot)     | 83.65 |
| BBH (3-Shot)        | 50.27 |
| MATH Lvl 5 (4-Shot) | 53.93 |
| GPQA (0-shot)       |  8.61 |
| MuSR (0-shot)       | 15.66 |
| MMLU-PRO (5-shot)   | 46.71 |
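
The reported average is simply the arithmetic mean of the six benchmark scores, which is easy to sanity-check:

```python
# Sanity check: "Avg." is the mean of the six benchmark scores.
scores = [83.65, 50.27, 53.93, 8.61, 15.66, 46.71]
print(round(sum(scores) / len(scores), 2))  # -> 43.14
```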

## First stage

Create four different instruction models and a code model (a scripted way to run all five merges is sketched after the configs):

```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9  
merge_method: della  
base_model: Qwen/Qwen2.5-14B  
parameters:  
  density: 1  
  weight: 1  
  lambda: 0.9  
  normalize: true  
  int8_mask: true  
dtype: bfloat16  
tokenizer_source: base  
name: Qwen2.5-14B-della-base
```

```yaml
models:  
  - model: Qwen/Qwen2.5-14B-Instruct  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9  
merge_method: della  
base_model: arcee-ai/Virtuoso-Small-v2  
parameters:  
  density: 1  
  weight: 1  
  lambda: 0.9  
  normalize: true  
  int8_mask: true  
dtype: bfloat16  
tokenizer_source: base  
name: Qwen2.5-14B-della-v2
```

```yaml
models:  
  - model: Qwen/Qwen2.5-14B-Instruct  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9  
merge_method: della  
base_model: arcee-ai/SuperNova-Medius  
parameters:  
  density: 1  
  weight: 1  
  lambda: 0.9  
  normalize: true  
  int8_mask: true  
dtype: bfloat16  
tokenizer_source: base  
name: Qwen2.5-14B-della-Nova
```

```yaml
models:  
  - model: Qwen/Qwen2.5-14B-Instruct  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9  
merge_method: della  
base_model: Azure99/Blossom-V6-14B  
parameters:  
  density: 1  
  weight: 1  
  lambda: 0.9  
  normalize: true  
  int8_mask: true  
dtype: bfloat16  
tokenizer_source: base  
name: Qwen2.5-14B-della-V6
```

```yaml
models:  
  - model: Qwen/Qwen2.5-Coder-14B-Instruct  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9  
merge_method: della  
base_model: Qwen/Qwen2.5-Coder-14B  
parameters:  
  density: 1  
  weight: 1  
  lambda: 0.9  
  normalize: true  
  int8_mask: true  
dtype: bfloat16  
tokenizer_source: base  
name: Qwen2.5-Coder-14B-della
```
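
As referenced above, the five merges in this stage can be scripted. A minimal sketch, assuming each config block is saved to its own YAML file (the file names below are hypothetical; the output directories match the `name:` fields in the configs):

```python
# Run each first-stage della merge in sequence.
# The YAML file names are hypothetical; save each config block
# above into its own file before running this.
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

CONFIGS = {
    "della-base.yaml": "Qwen2.5-14B-della-base",
    "della-v2.yaml": "Qwen2.5-14B-della-v2",
    "della-Nova.yaml": "Qwen2.5-14B-della-Nova",
    "della-V6.yaml": "Qwen2.5-14B-della-V6",
    "coder-della.yaml": "Qwen2.5-Coder-14B-della",
}

for config_path, out_dir in CONFIGS.items():
    with open(config_path, "r", encoding="utf-8") as f:
        cfg = MergeConfiguration.model_validate(yaml.safe_load(f))
    run_merge(cfg, out_path=out_dir, options=MergeOptions(cuda=True, copy_tokenizer=True))
```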

## Second stage

### Step 1

Use the merge template to create three instruction models with a bias toward reasoning:

```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-della-base  
models:  
  - model: Qwen2.5-Coder-14B-della  
  - model: Qwen2.5-14B-della-v2  
  - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B  
  - model: huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated-v2  
dtype: bfloat16  
tokenizer_source: base  
int8_mask: true  
normalize: true  
name: Qwen2.5-14B-mst-Coder
```

```yaml
merge_method: model_stock  
base_model: Qwen2.5-14B-della-base  
models:  
  - model: Qwen2.5-14B-della-V6  
  - model: Qwen2.5-14B-della-v2  
  - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B  
  - model: huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated-v2  
dtype: bfloat16  
tokenizer_source: base  
int8_mask: true  
normalize: true  
name: Qwen2.5-14B-mst-V6
```

```yaml
merge_method: model_stock  
base_model: Qwen2.5-14B-della-base  
models:  
  - model: Qwen2.5-14B-della-Nova  
  - model: Qwen2.5-14B-della-v2  
  - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B  
  - model: huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated-v2  
dtype: bfloat16  
tokenizer_source: base  
int8_mask: true  
normalize: true  
name: Qwen2.5-14B-mst-Nova
```

### Step 2

Create a pure instruction model to restore the final model's general capabilities:

```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-della-base  
models:  
  - model: Qwen2.5-14B-della-Nova  
  - model: Qwen2.5-14B-della-v2  
  - model: Qwen2.5-14B-della-V6   
dtype: bfloat16  
tokenizer_source: base  
int8_mask: true  
normalize: true  
name: Qwen2.5-14B-mst-it
```

## Third stage

Create a base model with a 1-million-token context window:

```yaml
merge_method: sce
models:
  # Pivot model
  - model: Qwen/Qwen2.5-14B-Instruct-1M
  # Target models  
  - model: Qwen/Qwen2.5-14B  
base_model: Qwen/Qwen2.5-14B-Instruct-1M  
parameters:  
  select_topk: 1  
dtype: bfloat16  
tokenizer_source: base  
normalize: true  
int8_mask: true  
name: Qwen2.5-14B-1M
```

```yaml
models:  
  - model: Qwen/Qwen2.5-14B-Instruct  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9  
merge_method: della  
base_model: Qwen2.5-14B-1M  
parameters:  
  density: 1  
  weight: 1  
  lambda: 0.9  
  normalize: true  
  int8_mask: true  
dtype: bfloat16  
tokenizer_source: base  
name: Qwen2.5-14B-della-1M
```

## Final stage

```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-della-1M  
models:  
  - model: Qwen2.5-14B-mst-Coder  
  - model: Qwen2.5-14B-mst-V6  
  - model: Qwen2.5-14B-mst-Nova  
  - model: Qwen2.5-14B-mst-it  
dtype: bfloat16  
tokenizer_source: base  
int8_mask: true  
normalize: true  
name: ZYH-LLM-Qwen2.5-14B-V4
```
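
The merged model can be used like any other Qwen2.5 instruct model. A minimal inference sketch with transformers; the prompt and generation settings are illustrative only:

```python
# Minimal inference sketch with transformers; sampling settings
# are illustrative defaults, not tuned recommendations.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "YOYO-AI/ZYH-LLM-Qwen2.5-14B-V4"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Briefly explain model merging."}]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```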