|
---
license: apache-2.0
language:
- en
- zh
base_model:
- Qwen/Qwen2.5-14B
- Qwen/Qwen2.5-14B-Instruct
- Qwen/Qwen2.5-14B-Instruct-1M
- Qwen/Qwen2.5-Coder-14B
- Qwen/Qwen2.5-Coder-14B-Instruct
- Azure99/Blossom-V6-14B
- arcee-ai/SuperNova-Medius
- arcee-ai/Virtuoso-Small-v2
- deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
- huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated-v2
pipeline_tag: text-generation
tags:
- merge
model-index:
- name: ZYH-LLM-Qwen2.5-14B-V4
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 83.65
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=YOYO-AI/ZYH-LLM-Qwen2.5-14B-V4
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 50.27
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=YOYO-AI/ZYH-LLM-Qwen2.5-14B-V4
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 53.93
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=YOYO-AI/ZYH-LLM-Qwen2.5-14B-V4
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 8.61
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=YOYO-AI/ZYH-LLM-Qwen2.5-14B-V4
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 15.66
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=YOYO-AI/ZYH-LLM-Qwen2.5-14B-V4
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 46.71
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=YOYO-AI/ZYH-LLM-Qwen2.5-14B-V4
      name: Open LLM Leaderboard
---
|
 |
|
# ZYH-LLM-Qwen2.5-14B-V4 |
|
*The fourth generation of the ZYH-LLM-Qwen2.5 series has been released!*
|
|
|
*This version increases the proportion of **R1 distillation models** in the merge recipe while maintaining the model's **instruction-following ability** and **general capabilities**.*
|
|
|
## Merge Template |
|
|
|
```yaml
merge_method: model_stock
base_model: Instruction Model
models:
  - model: Instruction Fine-tuned Model 1
  - model: Instruction Fine-tuned Model 2
  - model: Reasoning Fine-tuned Model 1
  - model: Reasoning Fine-tuned Model 2
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
```
|
Merging with the template above can improve the model's **computational accuracy** and **reasoning ability** without degrading the instruction model's **general capabilities**.
|
|
|
**ZYH-LLM-Qwen2.5-14B-V4** used this template throughout its merging process.
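
Configs in this shape can be executed with [mergekit](https://github.com/arcee-ai/mergekit). Below is a minimal Python sketch following the usage shown in mergekit's README; `model_stock.yml` is a hypothetical file holding the filled-in template, and `./merged` is an arbitrary output folder.

```python
# Minimal sketch: run a mergekit config like the template above,
# following mergekit's documented Python usage.
import torch
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# "model_stock.yml" is a hypothetical filename for the filled-in template.
with open("model_stock.yml", "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    out_path="./merged",                 # folder to store the merged model in
    options=MergeOptions(
        cuda=torch.cuda.is_available(),  # use the GPU if one is available
        copy_tokenizer=True,             # copy the base model's tokenizer
    ),
)
```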
|
|
|
## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) |
|
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/YOYO-AI__ZYH-LLM-Qwen2.5-14B-V4-details) |
|
|
|
| Metric              | Value |
|---------------------|------:|
| Avg.                | 43.14 |
| IFEval (0-Shot)     | 83.65 |
| BBH (3-Shot)        | 50.27 |
| MATH Lvl 5 (4-Shot) | 53.93 |
| GPQA (0-shot)       |  8.61 |
| MuSR (0-shot)       | 15.66 |
| MMLU-PRO (5-shot)   | 46.71 |
|
|
|
## First stage: |
|
*Create four different instruction models and one code model.*
|
```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen/Qwen2.5-14B
parameters:
  density: 1
  weight: 1
  lambda: 0.9
normalize: true
int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-base
```
|
```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: arcee-ai/Virtuoso-Small-v2
parameters:
  density: 1
  weight: 1
  lambda: 0.9
normalize: true
int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-v2
```
|
```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: arcee-ai/SuperNova-Medius
parameters:
  density: 1
  weight: 1
  lambda: 0.9
normalize: true
int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-Nova
```
|
```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Azure99/Blossom-V6-14B
parameters:
  density: 1
  weight: 1
  lambda: 0.9
normalize: true
int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-V6
```
|
```yaml
models:
  - model: Qwen/Qwen2.5-Coder-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen/Qwen2.5-Coder-14B
parameters:
  density: 1
  weight: 1
  lambda: 0.9
normalize: true
int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-Coder-14B-della
```
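
The `name:` fields label intermediate merges so that later stages can reference them. One way to realize that, as a sketch under the same mergekit-API assumptions as above, is to write each stage-1 merge to a folder named after its `name:` value, which the stage-2 configs then resolve as local paths. (Recent mergekit releases also ship a `mergekit-multi` entry point aimed at multi-stage recipes like this one, if you prefer a CLI.)

```python
# Sketch: run each stage-1 config and write the result to a folder named
# after its `name:` field, so stage-2 configs can reference paths such as
# "Qwen2.5-14B-della-base". The config filenames below are hypothetical.
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

STAGE1_CONFIGS = [
    "della-base.yml",
    "della-v2.yml",
    "della-nova.yml",
    "della-v6.yml",
    "coder-della.yml",
]

for cfg_path in STAGE1_CONFIGS:
    with open(cfg_path, "r", encoding="utf-8") as fp:
        raw = yaml.safe_load(fp)
    # Strip the `name:` label and reuse it as the output directory
    # (an assumption of this sketch, not a mergekit convention).
    out_dir = raw.pop("name")  # e.g. "Qwen2.5-14B-della-base"
    run_merge(
        MergeConfiguration.model_validate(raw),
        out_path=out_dir,
        options=MergeOptions(copy_tokenizer=True),
    )
```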
|
## Second stage: |
|
|
|
### Step 1: |
|
*Create three reasoning-biased instruction models using the merge template above.*
|
```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-della-base
models:
  - model: Qwen2.5-Coder-14B-della
  - model: Qwen2.5-14B-della-v2
  - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
  - model: huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated-v2
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
name: Qwen2.5-14B-mst-Coder
```
|
```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-della-base
models:
  - model: Qwen2.5-14B-della-V6
  - model: Qwen2.5-14B-della-v2
  - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
  - model: huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated-v2
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
name: Qwen2.5-14B-mst-V6
```
|
```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-della-base
models:
  - model: Qwen2.5-14B-della-Nova
  - model: Qwen2.5-14B-della-v2
  - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
  - model: huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated-v2
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
name: Qwen2.5-14B-mst-Nova
```
|
### Step 2: |
|
*Create a pure instruction model to restore the final model's general capabilities.*
|
```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-della-base
models:
  - model: Qwen2.5-14B-della-Nova
  - model: Qwen2.5-14B-della-v2
  - model: Qwen2.5-14B-della-V6
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
name: Qwen2.5-14B-mst-it
```
|
## Third stage: |
|
*Create a base model with a 1-million-token context window.*
|
```yaml
merge_method: sce
models:
  # Pivot model
  - model: Qwen/Qwen2.5-14B-Instruct-1M
  # Target models
  - model: Qwen/Qwen2.5-14B
base_model: Qwen/Qwen2.5-14B-Instruct-1M
parameters:
  select_topk: 1
dtype: bfloat16
tokenizer_source: base
normalize: true
int8_mask: true
name: Qwen2.5-14B-1M
```
|
```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen2.5-14B-1M
parameters:
  density: 1
  weight: 1
  lambda: 0.9
normalize: true
int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-1M
```
|
## Final stage: |
|
|
|
```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-della-1M
models:
  - model: Qwen2.5-14B-mst-Coder
  - model: Qwen2.5-14B-mst-V6
  - model: Qwen2.5-14B-mst-Nova
  - model: Qwen2.5-14B-mst-it
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
name: ZYH-LLM-Qwen2.5-14B-V4
```
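
Once merged (or when using the published checkpoint), the model loads with the standard `transformers` chat API. A minimal usage sketch, assuming only the `YOYO-AI/ZYH-LLM-Qwen2.5-14B-V4` repo id from the leaderboard links above:

```python
# Standard transformers usage; nothing model-specific is assumed beyond
# the repo id used in the leaderboard links above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "YOYO-AI/ZYH-LLM-Qwen2.5-14B-V4"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Solve 48 * 27 step by step."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```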