	---
	base_model:
	- grimjim/HuatuoSkywork-o1-Llama-3.1-8B
	- VAGOsolutions/Llama-3.1-SauerkrautLM-8b-Instruct
	library_name: transformers
	pipeline_tag: text-generation
	tags:
	- mergekit
	- merge
	license: llama3.1
	model-index:
	- name: SauerHuatuoSkywork-o1-Llama-3.1-8B
	  results:
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: IFEval (0-Shot)
	      type: wis-k/instruction-following-eval
	      split: train
	      args:
	        num_few_shot: 0
	    metrics:
	    - type: inst_level_strict_acc and prompt_level_strict_acc
	      value: 52.19
	      name: averaged accuracy
	    source:
	      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=grimjim%2FSauerHuatuoSkywork-o1-Llama-3.1-8B
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: BBH (3-Shot)
	      type: SaylorTwift/bbh
	      split: test
	      args:
	        num_few_shot: 3
	    metrics:
	    - type: acc_norm
	      value: 32.09
	      name: normalized accuracy
	    source:
	      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=grimjim%2FSauerHuatuoSkywork-o1-Llama-3.1-8B
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: MATH Lvl 5 (4-Shot)
	      type: lighteval/MATH-Hard
	      split: test
	      args:
	        num_few_shot: 4
	    metrics:
	    - type: exact_match
	      value: 16.99
	      name: exact match
	    source:
	      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=grimjim%2FSauerHuatuoSkywork-o1-Llama-3.1-8B
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: GPQA (0-shot)
	      type: Idavidrein/gpqa
	      split: train
	      args:
	        num_few_shot: 0
	    metrics:
	    - type: acc_norm
	      value: 9.51
	      name: acc_norm
	    source:
	      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=grimjim%2FSauerHuatuoSkywork-o1-Llama-3.1-8B
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: MuSR (0-shot)
	      type: TAUR-Lab/MuSR
	      args:
	        num_few_shot: 0
	    metrics:
	    - type: acc_norm
	      value: 15.79
	      name: acc_norm
	    source:
	      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=grimjim%2FSauerHuatuoSkywork-o1-Llama-3.1-8B
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: MMLU-PRO (5-shot)
	      type: TIGER-Lab/MMLU-Pro
	      config: main
	      split: test
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 33.23
	      name: accuracy
	    source:
	      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=grimjim%2FSauerHuatuoSkywork-o1-Llama-3.1-8B
	      name: Open LLM Leaderboard
	---
	# SauerHuatuoSkywork-o1-Llama-3.1-8B

	This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

	An experiment to hybridize a relatively high-scoring Llama 3.1 8B model with o1 reasoning capabilities.

	Although the IFEval score is lower than that of the SauerkrautLM model, every other benchmark improved after adding the o1 merge at low weight.

	Made with Llama.

	## Merge Details
	### Merge Method

	This model was merged using the SLERP merge method.

	### Models Merged

	The following models were included in the merge:
	* [grimjim/HuatuoSkywork-o1-Llama-3.1-8B](https://huggingface.co/grimjim/HuatuoSkywork-o1-Llama-3.1-8B)
	* [VAGOsolutions/Llama-3.1-SauerkrautLM-8b-Instruct](https://huggingface.co/VAGOsolutions/Llama-3.1-SauerkrautLM-8b-Instruct)

	### Configuration

	The following YAML configuration was used to produce this model:

	```yaml
	models:
	  - model: grimjim/HuatuoSkywork-o1-Llama-3.1-8B
	  - model: VAGOsolutions/Llama-3.1-SauerkrautLM-8b-Instruct
	merge_method: slerp
	base_model: grimjim/HuatuoSkywork-o1-Llama-3.1-8B
	parameters:
	  t:
	    - value: 0.96
	dtype: bfloat16
	```
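
For intuition about what the `t` value in the configuration controls, here is a minimal sketch of spherical linear interpolation (SLERP) on two flat weight vectors. It is not mergekit's implementation: the `slerp` function, the `eps` threshold, and the two-element toy vectors are illustrative assumptions, and mergekit works tensor by tensor with its own edge-case handling. The sketch also assumes mergekit's documented convention that `t = 0` returns `base_model` and `t = 1` returns the other model, under which `t: 0.96` lands close to SauerkrautLM with a small contribution from the o1 model.

```python
import math


def slerp(t, v0, v1, eps=1e-6):
    """Spherically interpolate between v0 (t=0) and v1 (t=1)."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))

    # A zero vector has no direction: fall back to plain linear interpolation.
    if norm0 < eps or norm1 < eps:
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    # Angle between the two directions, with the cosine clamped to [-1, 1].
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)

    # Nearly parallel vectors: SLERP degenerates, so use linear interpolation.
    if sin_theta < eps:
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


# Toy stand-ins for one flattened tensor from each model (hypothetical values).
base = [1.0, 0.0]   # plays the role of base_model (the o1 merge)
other = [0.0, 1.0]  # plays the role of the other model (SauerkrautLM)

merged = slerp(0.96, base, other)
print(merged)
```

With these toy vectors the result sits almost entirely along `other`, which matches the description of the o1 merge being added at low weight.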

	# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
	Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/grimjim__SauerHuatuoSkywork-o1-Llama-3.1-8B-details)!
	Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=grimjim%2FSauerHuatuoSkywork-o1-Llama-3.1-8B&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!

	|      Metric       |Value (%)|
	|-------------------|--------:|
	|Average            |    26.63|
	|IFEval (0-Shot)    |    52.19|
	|BBH (3-Shot)       |    32.09|
	|MATH Lvl 5 (4-Shot)|    16.99|
	|GPQA (0-shot)      |     9.51|
	|MuSR (0-shot)      |    15.79|
	|MMLU-PRO (5-shot)  |    33.23|
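
The Average row is consistent with an unweighted mean of the six benchmark scores. A quick check, assuming that is how the leaderboard computes it (the `scores` dictionary simply restates the table values):

```python
# Benchmark scores copied from the table above, in percent.
scores = {
    "IFEval (0-Shot)": 52.19,
    "BBH (3-Shot)": 32.09,
    "MATH Lvl 5 (4-Shot)": 16.99,
    "GPQA (0-shot)": 9.51,
    "MuSR (0-shot)": 15.79,
    "MMLU-PRO (5-shot)": 33.23,
}

# Unweighted mean, rounded to two decimals like the table.
average = round(sum(scores.values()) / len(scores), 2)
print(average)  # 26.63
```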