vllm (pretrained=/root/autodl-tmp/DeepCoder-14B-Preview,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

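| Tasks | Version | Filter           | n-shot | Metric      | Value | Stderr   |
|-------|--------:|------------------|-------:|-------------|------:|---------:|
| gsm8k |       3 | flexible-extract |      5 | exact_match | 0.732 | ± 0.0281 |
|       |         | strict-match     |      5 | exact_match | 0.856 | ± 0.0222 |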
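
The Stderr values reported for GSM8K appear consistent with the standard error of a mean of 0/1 scores, computed with the sample (n - 1) variance and n equal to the `limit`. The snippet below is a minimal sketch under that assumption; `binomial_stderr` is a hypothetical helper written for illustration, not a function from the evaluation harness.

```python
import math


def binomial_stderr(acc: float, n: int) -> float:
    """Standard error of the mean of n 0/1 scores with mean `acc`,
    using the sample (n - 1) variance. Assumed formula, see lead-in."""
    return math.sqrt(acc * (1.0 - acc) / (n - 1))


# GSM8K, limit=250, DeepCoder-14B-Preview (values taken from the table above)
print(round(binomial_stderr(0.732, 250), 4))  # flexible-extract
print(round(binomial_stderr(0.856, 250), 4))  # strict-match
```

Under this assumption the results round to the reported 0.0281 and 0.0222, which suggests the error bars shrink roughly as 1/sqrt(limit) between the 250- and 500-sample runs.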

vllm (pretrained=/root/autodl-tmp/DeepCoder-14B-Preview,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

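| Tasks | Version | Filter           | n-shot | Metric      | Value | Stderr   |
|-------|--------:|------------------|-------:|-------------|------:|---------:|
| gsm8k |       3 | flexible-extract |      5 | exact_match | 0.766 | ± 0.0190 |
|       |         | strict-match     |      5 | exact_match | 0.856 | ± 0.0157 |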

vllm (pretrained=/root/autodl-tmp/DeepCoder-14B-Preview,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

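| Groups            | Version | Filter | n-shot | Metric | Value  | Stderr   |
|-------------------|--------:|--------|-------:|--------|-------:|---------:|
| mmlu              |       2 | none   |        | acc    | 0.7345 | ± 0.0139 |
| - humanities      |       2 | none   |        | acc    | 0.7333 | ± 0.0283 |
| - other           |       2 | none   |        | acc    | 0.7385 | ± 0.0295 |
| - social sciences |       2 | none   |        | acc    | 0.8000 | ± 0.0285 |
| - stem            |       2 | none   |        | acc    | 0.6912 | ± 0.0254 |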

vllm (pretrained=/root/autodl-tmp/80-256,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

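| Tasks | Version | Filter           | n-shot | Metric      | Value | Stderr   |
|-------|--------:|------------------|-------:|-------------|------:|---------:|
| gsm8k |       3 | flexible-extract |      5 | exact_match | 0.768 | ± 0.0268 |
|       |         | strict-match     |      5 | exact_match | 0.868 | ± 0.0215 |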

vllm (pretrained=/root/autodl-tmp/80-256,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

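| Tasks | Version | Filter           | n-shot | Metric      | Value | Stderr   |
|-------|--------:|------------------|-------:|-------------|------:|---------:|
| gsm8k |       3 | flexible-extract |      5 | exact_match | 0.764 | ± 0.0190 |
|       |         | strict-match     |      5 | exact_match | 0.884 | ± 0.0143 |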

vllm (pretrained=/root/autodl-tmp/80-256,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1

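| Groups            | Version | Filter | n-shot | Metric | Value  | Stderr   |
|-------------------|--------:|--------|-------:|--------|-------:|---------:|
| mmlu              |       2 | none   |        | acc    | 0.7345 | ± 0.0139 |
| - humanities      |       2 | none   |        | acc    | 0.7179 | ± 0.0287 |
| - other           |       2 | none   |        | acc    | 0.7538 | ± 0.0287 |
| - social sciences |       2 | none   |        | acc    | 0.8167 | ± 0.0275 |
| - stem            |       2 | none   |        | acc    | 0.6807 | ± 0.0257 |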
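
To judge whether the gap between the two checkpoints is larger than the reported noise, one rough option is to divide the score difference by the combined standard error. The sketch below assumes the two runs can be treated as independent samples; since both runs use the same benchmark items, a paired comparison on per-item results would be more appropriate, so treat this only as a coarse sanity check. `z_score` is a hypothetical helper name.

```python
import math


def z_score(value_a: float, stderr_a: float,
            value_b: float, stderr_b: float) -> float:
    """Difference between two scores in units of their combined standard
    error, assuming the two estimates are independent."""
    return (value_b - value_a) / math.sqrt(stderr_a ** 2 + stderr_b ** 2)


# GSM8K strict-match, limit=500: DeepCoder-14B-Preview vs. the 80-256 checkpoint
z = z_score(0.856, 0.0157, 0.884, 0.0143)
print(f"z = {z:.2f}")
```

Under the independence assumption, an absolute z below about 2 is usually read as a difference that the sample size cannot distinguish from noise.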