vllm (pretrained=/root/autodl-tmp/Forgotten-Transgression-24B-v4.1,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|--------|
| gsm8k | 3 | flexible-extract | 5 | exact_match ↑ | 0.924 | ± 0.0168 |
|       |   | strict-match     | 5 | exact_match ↑ | 0.920 | ± 0.0172 |
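
For reference, the header above is the configuration string printed by lm-evaluation-harness. Below is a minimal sketch of reproducing the same GSM8K run through its Python API, assuming the local checkpoint path from the header exists and the `lm_eval` and `vllm` packages are installed; everything beyond the settings shown in the header is illustrative.

```python
# Sketch of the GSM8K run above via the lm-evaluation-harness Python API.
# The checkpoint path comes from the header; adjust it for your setup.
import lm_eval

results = lm_eval.simple_evaluate(
    model="vllm",
    model_args=(
        "pretrained=/root/autodl-tmp/Forgotten-Transgression-24B-v4.1,"
        "add_bos_token=true,max_model_len=3096,dtype=bfloat16"
    ),
    tasks=["gsm8k"],
    num_fewshot=5,
    limit=250,           # first 250 examples, matching the run above
    batch_size="auto",
)
print(results["results"]["gsm8k"])
```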

vllm (pretrained=/root/autodl-tmp/Forgotten-Transgression-24B-v4.1,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|--------|
| gsm8k | 3 | flexible-extract | 5 | exact_match ↑ | 0.922 | ± 0.0120 |
|       |   | strict-match     | 5 | exact_match ↑ | 0.914 | ± 0.0126 |

vllm (pretrained=/root/autodl-tmp/Forgotten-Transgression-24B-v4.1,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|--------|--------:|--------|-------:|--------|------:|--------|
| mmlu              | 2 | none |  | acc ↑ | 0.8012 | ± 0.0129 |
| - humanities      | 2 | none |  | acc ↑ | 0.8205 | ± 0.0253 |
| - other           | 2 | none |  | acc ↑ | 0.8103 | ± 0.0269 |
| - social sciences | 2 | none |  | acc ↑ | 0.8667 | ± 0.0246 |
| - stem            | 2 | none |  | acc ↑ | 0.7404 | ± 0.0248 |
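
The MMLU runs differ from the GSM8K ones only in task, example limit, few-shot setting, and batch size; an equivalent sketch under the same assumptions:

```python
# Same harness call, switched to the MMLU group as configured in the header:
# 15 examples per subtask, task-default few-shot, batch size 1.
import lm_eval

results = lm_eval.simple_evaluate(
    model="vllm",
    model_args=(
        "pretrained=/root/autodl-tmp/Forgotten-Transgression-24B-v4.1,"
        "add_bos_token=true,max_model_len=3096,dtype=bfloat16"
    ),
    tasks=["mmlu"],
    limit=15,        # 15 examples per MMLU subtask
    batch_size=1,
)
print(results["results"]["mmlu"])
```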

vllm (pretrained=/root/autodl-tmp/85-512,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|--------|
| gsm8k | 3 | flexible-extract | 5 | exact_match ↑ | 0.908 | ± 0.0183 |
|       |   | strict-match     | 5 | exact_match ↑ | 0.904 | ± 0.0187 |

vllm (pretrained=/root/autodl-tmp/85-512,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|--------|
| gsm8k | 3 | flexible-extract | 5 | exact_match ↑ | 0.902 | ± 0.0133 |
|       |   | strict-match     | 5 | exact_match ↑ | 0.894 | ± 0.0138 |

vllm (pretrained=/root/autodl-tmp/85-512,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|--------|--------:|--------|-------:|--------|------:|--------|
| mmlu              | 2 | none |  | acc ↑ | 0.8000 | ± 0.0130 |
| - humanities      | 2 | none |  | acc ↑ | 0.8205 | ± 0.0255 |
| - other           | 2 | none |  | acc ↑ | 0.8051 | ± 0.0276 |
| - social sciences | 2 | none |  | acc ↑ | 0.8667 | ± 0.0246 |
| - stem            | 2 | none |  | acc ↑ | 0.7404 | ± 0.0247 |

vllm (pretrained=/root/autodl-tmp/857-512,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|--------|
| gsm8k | 3 | flexible-extract | 5 | exact_match ↑ | 0.904 | ± 0.0187 |
|       |   | strict-match     | 5 | exact_match ↑ | 0.904 | ± 0.0187 |

vllm (pretrained=/root/autodl-tmp/857-512,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|--------|
| gsm8k | 3 | flexible-extract | 5 | exact_match ↑ | 0.896 | ± 0.0137 |
|       |   | strict-match     | 5 | exact_match ↑ | 0.892 | ± 0.0139 |

vllm (pretrained=/root/autodl-tmp/857-512,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|--------|--------:|--------|-------:|--------|------:|--------|
| mmlu              | 2 | none |  | acc ↑ | 0.7942 | ± 0.0130 |
| - humanities      | 2 | none |  | acc ↑ | 0.8308 | ± 0.0254 |
| - other           | 2 | none |  | acc ↑ | 0.7897 | ± 0.0278 |
| - social sciences | 2 | none |  | acc ↑ | 0.8667 | ± 0.0245 |
| - stem            | 2 | none |  | acc ↑ | 0.7263 | ± 0.0249 |

vllm (pretrained=/root/autodl-tmp/86-512,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|--------|
| gsm8k | 3 | flexible-extract | 5 | exact_match ↑ | 0.916 | ± 0.0176 |
|       |   | strict-match     | 5 | exact_match ↑ | 0.912 | ± 0.0180 |

vllm (pretrained=/root/autodl-tmp/86-512,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|--------|
| gsm8k | 3 | flexible-extract | 5 | exact_match ↑ | 0.914 | ± 0.0126 |
|       |   | strict-match     | 5 | exact_match ↑ | 0.902 | ± 0.0133 |

vllm (pretrained=/root/autodl-tmp/86-512,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|--------|--------:|--------|-------:|--------|------:|--------|
| mmlu              | 2 | none |  | acc ↑ | 0.8012 | ± 0.0129 |
| - humanities      | 2 | none |  | acc ↑ | 0.8462 | ± 0.0244 |
| - other           | 2 | none |  | acc ↑ | 0.8051 | ± 0.0269 |
| - social sciences | 2 | none |  | acc ↑ | 0.8667 | ± 0.0249 |
| - stem            | 2 | none |  | acc ↑ | 0.7263 | ± 0.0253 |

vllm (pretrained=/root/autodl-tmp/863-512,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|--------|
| gsm8k | 3 | flexible-extract | 5 | exact_match ↑ | 0.908 | ± 0.0183 |
|       |   | strict-match     | 5 | exact_match ↑ | 0.896 | ± 0.0193 |

vllm (pretrained=/root/autodl-tmp/863-512,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|--------|
| gsm8k | 3 | flexible-extract | 5 | exact_match ↑ | 0.906 | ± 0.0131 |
|       |   | strict-match     | 5 | exact_match ↑ | 0.892 | ± 0.0139 |

vllm (pretrained=/root/autodl-tmp/863-512,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|--------|--------:|--------|-------:|--------|------:|--------|
| mmlu              | 2 | none |  | acc ↑ | 0.7988 | ± 0.0130 |
| - humanities      | 2 | none |  | acc ↑ | 0.8308 | ± 0.0259 |
| - other           | 2 | none |  | acc ↑ | 0.8205 | ± 0.0261 |
| - social sciences | 2 | none |  | acc ↑ | 0.8611 | ± 0.0244 |
| - stem            | 2 | none |  | acc ↑ | 0.7228 | ± 0.0253 |
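
For quick local testing of the W8A8 quant itself, a minimal vLLM loading sketch; the repo id is this model's, while the prompt and sampling settings are placeholders:

```python
# Minimal sketch: load the W8A8 checkpoint with vLLM and generate once.
# Prompt and sampling parameters are illustrative, not recommendations.
from vllm import LLM, SamplingParams

llm = LLM(
    model="noneUsername/Forgotten-Transgression-24B-v4.1-W8A8",
    max_model_len=4096,
    dtype="bfloat16",
)
sampling = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Write a two-sentence summary of the GSM8K benchmark."], sampling)
print(outputs[0].outputs[0].text)
```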