vllm (pretrained=/root/autodl-tmp/Homunculus,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true,gpu_memory_utilization=0.5), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter           | n-shot | Metric        | Value | Stderr   |
|-------|---------|------------------|--------|---------------|-------|----------|
| gsm8k | 3       | flexible-extract | 5      | exact_match ↑ | 0.796 | ± 0.0255 |
|       |         | strict-match     | 5      | exact_match ↑ | 0.796 | ± 0.0255 |
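The header line above is the configuration echo printed by lm-evaluation-harness when run with the vLLM backend. A run with these settings can presumably be reproduced with an invocation along these lines (a sketch; the exact harness version and flag spellings may differ from the one used here):

```shell
# Sketch of the lm-eval invocation matching the logged config above.
# Assumes lm-evaluation-harness and vllm are installed and the model
# checkpoint exists at the logged local path.
lm_eval --model vllm \
  --model_args pretrained=/root/autodl-tmp/Homunculus,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true,gpu_memory_utilization=0.5 \
  --tasks gsm8k \
  --num_fewshot 5 \
  --limit 250 \
  --batch_size auto
```

The later runs below vary only `pretrained`, `limit`, `gpu_memory_utilization`, and the task (`mmlu` with no few-shot examples instead of `gsm8k` with 5).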

vllm (pretrained=/root/autodl-tmp/Homunculus,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true,gpu_memory_utilization=0.5), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter           | n-shot | Metric        | Value | Stderr   |
|-------|---------|------------------|--------|---------------|-------|----------|
| gsm8k | 3       | flexible-extract | 5      | exact_match ↑ | 0.796 | ± 0.0180 |
|       |         | strict-match     | 5      | exact_match ↑ | 0.792 | ± 0.0182 |

vllm (pretrained=/root/autodl-tmp/Homunculus,add_bos_token=true,max_model_len=3048,dtype=bfloat16,trust_remote_code=true,gpu_memory_utilization=0.5), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: auto

| Groups            | Version | Filter | n-shot | Metric | Value  | Stderr   |
|-------------------|---------|--------|--------|--------|--------|----------|
| mmlu              | 2       | none   |        | acc ↑  | 0.6480 | ± 0.0153 |
| - humanities      | 2       | none   |        | acc ↑  | 0.6769 | ± 0.0306 |
| - other           | 2       | none   |        | acc ↑  | 0.6718 | ± 0.0330 |
| - social sciences | 2       | none   |        | acc ↑  | 0.7444 | ± 0.0315 |
| - stem            | 2       | none   |        | acc ↑  | 0.5509 | ± 0.0275 |

vllm (pretrained=/root/autodl-tmp/Homunculus-90-128-4096-9.9999,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true,gpu_memory_utilization=0.4), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter           | n-shot | Metric        | Value | Stderr   |
|-------|---------|------------------|--------|---------------|-------|----------|
| gsm8k | 3       | flexible-extract | 5      | exact_match ↑ | 0.796 | ± 0.0255 |
|       |         | strict-match     | 5      | exact_match ↑ | 0.796 | ± 0.0255 |

vllm (pretrained=/root/autodl-tmp/Homunculus-90-128-4096-9.9999,add_bos_token=true,max_model_len=3048,dtype=bfloat16,trust_remote_code=true,gpu_memory_utilization=0.4), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: auto

| Groups            | Version | Filter | n-shot | Metric | Value  | Stderr   |
|-------------------|---------|--------|--------|--------|--------|----------|
| mmlu              | 2       | none   |        | acc ↑  | 0.6538 | ± 0.0152 |
| - humanities      | 2       | none   |        | acc ↑  | 0.6872 | ± 0.0301 |
| - other           | 2       | none   |        | acc ↑  | 0.6769 | ± 0.0322 |
| - social sciences | 2       | none   |        | acc ↑  | 0.7389 | ± 0.0314 |
| - stem            | 2       | none   |        | acc ↑  | 0.5614 | ± 0.0277 |
Model: noneUsername/Homunculus-W8A8 · 12.5B params · Safetensors · tensor types BF16, I8