noneUsername/cogito-v1-preview-qwen-32B-awq

vllm (pretrained=/root/autodl-tmp/cogito-v1-preview-qwen-32B,add_bos_token=true,max_model_len=3096,dtype=bfloat16,tensor_parallel_size=4), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.900	±	0.0190
		strict-match	5	exact_match	↑	0.948	±	0.0141

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.894	±	0.0138
		strict-match	5	exact_match	↑	0.930	±	0.0114

Groups	Version	Filter	Metric		Value		Stderr
mmlu	2	none	acc	↑	0.8947	±	0.0175
- humanities	2	none	acc	↑	0.9231	±	0.0308
- other	2	none	acc	↑	0.8769	±	0.0407
- social sciences	2	none	acc	↑	0.9167	±	0.0354
- stem	2	none	acc	↑	0.8737	±	0.0324
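
The ± column in the gsm8k rows above can be sanity-checked by hand. The sketch below assumes each exact_match score is 0 or 1 and that the reported Stderr is the sample standard deviation (n-1 denominator) divided by sqrt(n); `binary_stderr` is a hypothetical helper, not part of the harness. Under that assumption the first gsm8k table matches n = 250 (the stated limit), while the second table's smaller Stderr values would be consistent with roughly n = 500, which is an inference and is not stated in the log.

```python
import math


def binary_stderr(p: float, n: int) -> float:
    """Standard error of the mean of n 0/1 scores with mean p.

    Assumes the sample variance uses an (n - 1) denominator.
    """
    return math.sqrt(p * (1.0 - p) / (n - 1))


# First gsm8k table: values taken from the log, n = 250 from "limit: 250.0".
print(f"{binary_stderr(0.900, 250):.4f}")  # flexible-extract
print(f"{binary_stderr(0.948, 250):.4f}")  # strict-match

# Second gsm8k table: n = 500 is an assumption, not stated in the log.
print(f"{binary_stderr(0.894, 500):.4f}")  # flexible-extract
print(f"{binary_stderr(0.930, 500):.4f}")  # strict-match
```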

vllm (pretrained=/root/autodl-tmp/cogito-v1-preview-qwen-32B-awq,add_bos_token=true,max_model_len=3096,dtype=bfloat16,tensor_parallel_size=4), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.924	±	0.0168
		strict-match	5	exact_match	↑	0.936	±	0.0155

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.920	±	0.0121
		strict-match	5	exact_match	↑	0.934	±	0.0111

Groups	Version	Filter	Metric		Value		Stderr
mmlu	2	none	acc	↑	0.8982	±	0.0170
- humanities	2	none	acc	↑	0.8769	±	0.0377
- other	2	none	acc	↑	0.8769	±	0.0407
- social sciences	2	none	acc	↑	0.9500	±	0.0289
- stem	2	none	acc	↑	0.8947	±	0.0288
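
To judge whether the AWQ scores differ meaningfully from the bfloat16 scores, one rough option is to divide each difference by the combined standard error. This is only a sketch: it treats the two runs as independent samples, which is an assumption (both runs score the same questions, so a paired comparison would be tighter), and `z_score` is a hypothetical helper. The numbers are copied from the first gsm8k table and the mmlu group row of each run.

```python
import math


def z_score(base: float, base_se: float, quant: float, quant_se: float) -> float:
    """Difference (quant - base) in units of the combined standard error.

    Assumes the two estimates are independent, which is a simplification here.
    """
    return (quant - base) / math.sqrt(base_se ** 2 + quant_se ** 2)


# (metric, base value, base stderr, AWQ value, AWQ stderr), all from the tables above.
rows = [
    ("gsm8k flexible-extract", 0.900, 0.0190, 0.924, 0.0168),
    ("gsm8k strict-match", 0.948, 0.0141, 0.936, 0.0155),
    ("mmlu acc", 0.8947, 0.0175, 0.8982, 0.0170),
]

for name, base, base_se, quant, quant_se in rows:
    z = z_score(base, base_se, quant, quant_se)
    print(f"{name}: diff={quant - base:+.4f}, z={z:+.2f}")
```

With these inputs every |z| comes out below 2, so under the stated assumptions none of the three differences stands out from sampling noise at this sample size.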
