|
---
license: apache-2.0
language:
- en
- zh
base_model:
- Qwen/Qwen2.5-14B
- Qwen/Qwen2.5-14B-Instruct
- Qwen/Qwen2.5-14B-Instruct-1M
- Qwen/Qwen2.5-Coder-14B
- Qwen/Qwen2.5-Coder-14B-Instruct
- Azure99/Blossom-V6-14B
- arcee-ai/SuperNova-Medius
- arcee-ai/Virtuoso-Small-v2
- deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
- huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated-v2
pipeline_tag: text-generation
tags:
- merge
model-index:
- name: ZYH-LLM-Qwen2.5-14B-V4
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 83.65
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=YOYO-AI/ZYH-LLM-Qwen2.5-14B-V4
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 50.27
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=YOYO-AI/ZYH-LLM-Qwen2.5-14B-V4
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 53.93
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=YOYO-AI/ZYH-LLM-Qwen2.5-14B-V4
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 8.61
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=YOYO-AI/ZYH-LLM-Qwen2.5-14B-V4
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 15.66
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=YOYO-AI/ZYH-LLM-Qwen2.5-14B-V4
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 46.71
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=YOYO-AI/ZYH-LLM-Qwen2.5-14B-V4
      name: Open LLM Leaderboard
---
|
 |
|
# ZYH-LLM-Qwen2.5-14B-V4 |
|
*The fourth generation of the ZYH-LLM-Qwen2.5 series has been released!*
|
|
|
*This version increases the proportion of **R1 distillation models** in the merge recipe while maintaining the model's **instruction-following ability** and **general capabilities**.*
|
|
|
## Merge Template |
|
|
|
```yaml
merge_method: model_stock
base_model: Instruction Model
models:
  - model: Instruction Fine-tuned Model 1
  - model: Instruction Fine-tuned Model 2
  - model: Reasoning Fine-tuned Model 1
  - model: Reasoning Fine-tuned Model 2
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
```
|
Merging with the template above can improve the model's **computational accuracy** and **reasoning ability** without degrading the instruction model's **general capabilities**.
|
|
|
**ZYH-LLM-Qwen2.5-14B-V4** used this template throughout its merging process.
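
Configs in this shape can be executed with [mergekit](https://github.com/arcee-ai/mergekit). Below is a minimal Python sketch following the usage shown in mergekit's README; `model_stock.yml` is a hypothetical file holding the filled-in template, and `./merged` is an arbitrary output folder.

```python
# Minimal sketch: run a mergekit config like the template above,
# following mergekit's documented Python usage.
import torch
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# "model_stock.yml" is a hypothetical filename for the filled-in template.
with open("model_stock.yml", "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    out_path="./merged",                 # folder to store the merged model in
    options=MergeOptions(
        cuda=torch.cuda.is_available(),  # use the GPU if one is available
        copy_tokenizer=True,             # copy the base model's tokenizer
    ),
)
```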
|
|
|
## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) |
|
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/YOYO-AI__ZYH-LLM-Qwen2.5-14B-V4-details) |
|
|
|
| Metric              | Value |
|---------------------|------:|
| Avg.                | 43.14 |
| IFEval (0-Shot)     | 83.65 |
| BBH (3-Shot)        | 50.27 |
| MATH Lvl 5 (4-Shot) | 53.93 |
| GPQA (0-shot)       |  8.61 |
| MuSR (0-shot)       | 15.66 |
| MMLU-PRO (5-shot)   | 46.71 |
|
|
|
## First stage: |
|
*Create four different instruction models and one code model.*
|
```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen/Qwen2.5-14B
parameters:
  density: 1
  weight: 1
  lambda: 0.9
normalize: true
int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-base
```
|
```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: arcee-ai/Virtuoso-Small-v2
parameters:
  density: 1
  weight: 1
  lambda: 0.9
normalize: true
int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-v2
```
|
```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: arcee-ai/SuperNova-Medius
parameters:
  density: 1
  weight: 1
  lambda: 0.9
normalize: true
int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-Nova
```
|
```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Azure99/Blossom-V6-14B
parameters:
  density: 1
  weight: 1
  lambda: 0.9
normalize: true
int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-V6
```
|
```yaml
models:
  - model: Qwen/Qwen2.5-Coder-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen/Qwen2.5-Coder-14B
parameters:
  density: 1
  weight: 1
  lambda: 0.9
normalize: true
int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-Coder-14B-della
```
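
The `name:` fields label intermediate merges so that later stages can reference them. One way to realize that, as a sketch under the same mergekit-API assumptions as above, is to write each stage-1 merge to a folder named after its `name:` value, which the stage-2 configs then resolve as local paths. (Recent mergekit releases also ship a `mergekit-multi` entry point aimed at multi-stage recipes like this one, if you prefer a CLI.)

```python
# Sketch: run each stage-1 config and write the result to a folder named
# after its `name:` field, so stage-2 configs can reference paths such as
# "Qwen2.5-14B-della-base". The config filenames below are hypothetical.
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

STAGE1_CONFIGS = [
    "della-base.yml",
    "della-v2.yml",
    "della-nova.yml",
    "della-v6.yml",
    "coder-della.yml",
]

for cfg_path in STAGE1_CONFIGS:
    with open(cfg_path, "r", encoding="utf-8") as fp:
        raw = yaml.safe_load(fp)
    # Strip the `name:` label and reuse it as the output directory
    # (an assumption of this sketch, not a mergekit convention).
    out_dir = raw.pop("name")  # e.g. "Qwen2.5-14B-della-base"
    run_merge(
        MergeConfiguration.model_validate(raw),
        out_path=out_dir,
        options=MergeOptions(copy_tokenizer=True),
    )
```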
|
## Second stage: |
|
|
|
### Step 1: |
|
*Create three reasoning-biased instruction models using the merge template above.*
|
```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-della-base
models:
  - model: Qwen2.5-Coder-14B-della
  - model: Qwen2.5-14B-della-v2
  - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
  - model: huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated-v2
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
name: Qwen2.5-14B-mst-Coder
```
|
```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-della-base
models:
  - model: Qwen2.5-14B-della-V6
  - model: Qwen2.5-14B-della-v2
  - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
  - model: huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated-v2
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
name: Qwen2.5-14B-mst-V6
```
|
```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-della-base
models:
  - model: Qwen2.5-14B-della-Nova
  - model: Qwen2.5-14B-della-v2
  - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
  - model: huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated-v2
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
name: Qwen2.5-14B-mst-Nova
```
|
### Step 2: |
|
*Create a pure instruction model to restore the final model's general capabilities.*
|
```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-della-base
models:
  - model: Qwen2.5-14B-della-Nova
  - model: Qwen2.5-14B-della-v2
  - model: Qwen2.5-14B-della-V6
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
name: Qwen2.5-14B-mst-it
```
|
## Third stage: |
|
*Create a base model with a 1-million-token context window.*
|
```yaml
merge_method: sce
models:
  # Pivot model
  - model: Qwen/Qwen2.5-14B-Instruct-1M
  # Target models
  - model: Qwen/Qwen2.5-14B
base_model: Qwen/Qwen2.5-14B-Instruct-1M
parameters:
  select_topk: 1
dtype: bfloat16
tokenizer_source: base
normalize: true
int8_mask: true
name: Qwen2.5-14B-1M
```
|
```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen2.5-14B-1M
parameters:
  density: 1
  weight: 1
  lambda: 0.9
normalize: true
int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-1M
```
|
## Final stage: |
|
|
|
```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-della-1M
models:
  - model: Qwen2.5-14B-mst-Coder
  - model: Qwen2.5-14B-mst-V6
  - model: Qwen2.5-14B-mst-Nova
  - model: Qwen2.5-14B-mst-it
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
name: ZYH-LLM-Qwen2.5-14B-V4
```
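
Once merged (or when using the published checkpoint), the model loads with the standard `transformers` chat API. A minimal usage sketch, assuming only the `YOYO-AI/ZYH-LLM-Qwen2.5-14B-V4` repo id from the leaderboard links above:

```python
# Standard transformers usage; nothing model-specific is assumed beyond
# the repo id used in the leaderboard links above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "YOYO-AI/ZYH-LLM-Qwen2.5-14B-V4"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Solve 48 * 27 step by step."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```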