vllm (pretrained=/root/autodl-tmp/Homunculus,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true,gpu_memory_utilization=0.5), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.796 | ± | 0.0255 |
| | | strict-match | 5 | exact_match | ↑ | 0.796 | ± | 0.0255 |

vllm (pretrained=/root/autodl-tmp/Homunculus,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true,gpu_memory_utilization=0.5), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.796 | ± | 0.0180 |
| | | strict-match | 5 | exact_match | ↑ | 0.792 | ± | 0.0182 |
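
The Stderr column in the GSM8K runs above appears consistent with the standard error of a mean over 0/1 scores, computed with the sample (n - 1) variance. The following is a minimal sketch under that assumption; `sample_stderr` is a hypothetical helper written for illustration, not a function from the evaluation harness, and it takes `n` to be the `limit` value of each run.

```python
import math


def sample_stderr(accuracy: float, n: int) -> float:
    """Standard error of the mean of n binary (0/1) scores.

    Assumes the sample variance uses the n - 1 denominator, which for
    0/1 scores reduces to p * (1 - p) / (n - 1) after dividing by n.
    """
    return math.sqrt(accuracy * (1.0 - accuracy) / (n - 1))


# Values taken from the GSM8K tables above (limit = 250 and limit = 500).
for acc, n in [(0.796, 250), (0.796, 500), (0.792, 500)]:
    print(f"acc={acc:.3f} n={n} stderr={sample_stderr(acc, n):.4f}")
```

Under this assumption, doubling the sample count from 250 to 500 shrinks the standard error by roughly a factor of the square root of 2, which matches the drop from 0.0255 to 0.0180 in the tables.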

vllm (pretrained=/root/autodl-tmp/Homunculus,add_bos_token=true,max_model_len=3048,dtype=bfloat16,trust_remote_code=true,gpu_memory_utilization=0.5), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: auto

| Groups | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| mmlu | 2 | none | | acc | ↑ | 0.6480 | ± | 0.0153 |
| - humanities | 2 | none | | acc | ↑ | 0.6769 | ± | 0.0306 |
| - other | 2 | none | | acc | ↑ | 0.6718 | ± | 0.0330 |
| - social sciences | 2 | none | | acc | ↑ | 0.7444 | ± | 0.0315 |
| - stem | 2 | none | | acc | ↑ | 0.5509 | ± | 0.0275 |

vllm (pretrained=/root/autodl-tmp/Homunculus-90-128-4096-9.9999,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true,gpu_memory_utilization=0.4), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.796 | ± | 0.0255 |
| | | strict-match | 5 | exact_match | ↑ | 0.796 | ± | 0.0255 |

vllm (pretrained=/root/autodl-tmp/Homunculus-90-128-4096-9.9999,add_bos_token=true,max_model_len=3048,dtype=bfloat16,trust_remote_code=true,gpu_memory_utilization=0.4), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: auto

| Groups | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| mmlu | 2 | none | | acc | ↑ | 0.6538 | ± | 0.0152 |
| - humanities | 2 | none | | acc | ↑ | 0.6872 | ± | 0.0301 |
| - other | 2 | none | | acc | ↑ | 0.6769 | ± | 0.0322 |
| - social sciences | 2 | none | | acc | ↑ | 0.7389 | ± | 0.0314 |
| - stem | 2 | none | | acc | ↑ | 0.5614 | ± | 0.0277 |
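
To judge whether the two checkpoints differ on MMLU, the gap between their overall accuracies can be compared against the reported standard errors. The sketch below is a rough check only: `diff_z` is a hypothetical helper, and it assumes the two runs can be treated as independent samples, which is a simplification because both runs score the same questions.

```python
import math


def diff_z(value_a: float, stderr_a: float, value_b: float, stderr_b: float) -> float:
    """z-score of (value_b - value_a), treating the two estimates as independent."""
    combined_stderr = math.sqrt(stderr_a**2 + stderr_b**2)
    return (value_b - value_a) / combined_stderr


# Overall MMLU accuracy and Stderr from the two MMLU tables above.
z = diff_z(0.6480, 0.0153, 0.6538, 0.0152)
print(f"z = {z:.2f}")
```

A magnitude well below 2 suggests that, at this sample size (limit: 15 per subtask), the MMLU difference between the two checkpoints is within the noise of the measurement.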