Pinkstack committed · verified
Commit 6903392 · Parent(s): 6760346

Adding Evaluation Results


This is an automated PR created with [this space](https://huggingface.co/spaces/T145/open-llm-leaderboard-results-to-modelcard)!

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

Please report any issues here: https://huggingface.co/spaces/T145/open-llm-leaderboard-results-to-modelcard/discussions
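Once merged, the `model-index` block added below is machine-readable metadata, not just display text. A minimal sketch of reading it back with `huggingface_hub` (the repo id is taken from this PR; treating `card.data.eval_results` as populated from the model-index front matter is an assumption about the library's model-card API):

```python
# Minimal sketch: read the evaluation results this PR adds to the card.
# Assumes `pip install huggingface_hub` and that the PR has been merged;
# eval_results is parsed from the model-index YAML front matter (assumption).
from huggingface_hub import ModelCard

card = ModelCard.load("Pinkstack/Superthoughts-lite-1.8B-experimental-o1")
for result in card.data.eval_results or []:
    print(f"{result.dataset_name}: {result.metric_value} ({result.metric_type})")
```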

Files changed (1)
1. README.md (+114 -1)
README.md CHANGED

```diff
@@ -15,6 +15,105 @@ license: apache-2.0
 language:
 - en
 pipeline_tag: text-generation
+model-index:
+- name: Superthoughts-lite-1.8B-experimental-o1
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: wis-k/instruction-following-eval
+      split: train
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 3.75
+      name: averaged accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Pinkstack%2FSuperthoughts-lite-1.8B-experimental-o1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: SaylorTwift/bbh
+      split: test
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 9.13
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Pinkstack%2FSuperthoughts-lite-1.8B-experimental-o1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: lighteval/MATH-Hard
+      split: test
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 1.06
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Pinkstack%2FSuperthoughts-lite-1.8B-experimental-o1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      split: train
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 3.36
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Pinkstack%2FSuperthoughts-lite-1.8B-experimental-o1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 1.76
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Pinkstack%2FSuperthoughts-lite-1.8B-experimental-o1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 9.45
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Pinkstack%2FSuperthoughts-lite-1.8B-experimental-o1
+      name: Open LLM Leaderboard
 ---
 ![superthoughtslight.png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/2LuPB_ZPCGni3-PyCkL0-.png)
 # Information
@@ -51,4 +150,18 @@ Generated inside the android application, Pocketpal via GGUF Q8, using the model
 - **License:** apache-2.0
 - **Finetuned from model :** HuggingFaceTB/SmolLM2-1.7B-Instruct
 
-This smollm2 model was trained with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
+This smollm2 model was trained with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/Pinkstack__Superthoughts-lite-1.8B-experimental-o1-details)!
+Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=Pinkstack%2FSuperthoughts-lite-1.8B-experimental-o1&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!
+
+| Metric             |Value (%)|
+|--------------------|--------:|
+| **Average**        |     4.75|
+| IFEval (0-Shot)    |     3.75|
+| BBH (3-Shot)       |     9.13|
+| MATH Lvl 5 (4-Shot)|     1.06|
+| GPQA (0-shot)      |     3.36|
+| MuSR (0-shot)      |     1.76|
+| MMLU-PRO (5-shot)  |     9.45|
+
```
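For reference, the **Average** row is the plain arithmetic mean of the six benchmark scores; a quick sketch checking the 4.75 reported above:

```python
# Sanity check: the leaderboard "Average" is the mean of the six scores above.
scores = {
    "IFEval (0-Shot)": 3.75,
    "BBH (3-Shot)": 9.13,
    "MATH Lvl 5 (4-Shot)": 1.06,
    "GPQA (0-shot)": 3.36,
    "MuSR (0-shot)": 1.76,
    "MMLU-PRO (5-shot)": 9.45,
}
print(round(sum(scores.values()) / len(scores), 2))  # 4.75, matching the table
```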