Dhia-GB committed (verified)
Commit 82c2a73 · Parent: 2b13d94

Update README.md

Files changed (1):
  1. README.md +22 -5

README.md CHANGED
@@ -8,19 +8,21 @@ tags:
 - falcon3
 ---
 
-# Falcon3-7B-Base
+# Falcon3-3B-Base
 
 **Falcon3** family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B.
 
-This repository contains the **Falcon3-3B-Base**. It achieves strong results on reasoning, language understanding, instruction following, code and mathematics tasks.
-Falcon3-3B-Base supports 4 languages (english, french, spanish, portuguese) and a context length up to 8K.
-Falcon3-3B-Base pruned (depth + width) from Falcon3-7B-Base, was effeciently trained on only 100 GT using a knowledge distillation objective.
+This repository contains the **Falcon3-3B-Base**. It achieves strong results on reasoning, language understanding, instruction following, code and mathematics tasks.<br>
+`Falcon3-3B-Base` supports 4 languages (English, French, Spanish, Portuguese) and a context length of up to 8K.<br>
+`Falcon3-3B-Base` was pruned from `Falcon3-7B-Base`, then trained on only **100 GT** using a knowledge distillation objective.<br>
+At release, this base version ranks in the world's top 2 among pretrained LLMs under 5B, which makes it an excellent choice for finetuning and deployment on edge devices.<br>
+`Falcon3-3B-Base` comes with a full set of quantized versions for further efficiency and a SOTA [instruct version](https://huggingface.co/tiiuae/Falcon3-3B-Instruct) for direct use.
 
 ⚠️ **This is a raw, pretrained model, which should be further finetuned for most usecases.**
 
 ## Model Details
 - Architecture
-  - Transformer based causal decoder only architecture
+  - Transformer-based causal decoder-only architecture
   - 22 decoder blocks
   - Grouped query attention (GQA) for faster inference: 12 query heads and 4 KV heads
   - Wider head dimension: 256
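
The GQA numbers above have a concrete memory implication: with only 4 KV heads instead of one KV head per query head, the KV cache at the full 8K context is three times smaller. A rough back-of-the-envelope sketch of that arithmetic, assuming an fp16 cache (the cache dtype is an assumption, not stated in the card):

```python
# Rough KV-cache estimate from the architecture numbers above:
# 22 decoder blocks, 4 KV heads, head dimension 256, 8K context.
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int, seq_len: int, bytes_per_elem: int = 2) -> int:
    # Each layer caches one K and one V tensor of shape [kv_heads, seq_len, head_dim].
    return 2 * layers * kv_heads * seq_len * head_dim * bytes_per_elem

gqa = kv_cache_bytes(layers=22, kv_heads=4, head_dim=256, seq_len=8192)
mha = kv_cache_bytes(layers=22, kv_heads=12, head_dim=256, seq_len=8192)  # hypothetical: one KV head per query head
print(f"GQA: {gqa / 2**30:.2f} GiB vs MHA: {mha / 2**30:.2f} GiB ({mha // gqa}x)")
```

At 8K tokens this works out to roughly 0.7 GiB of cache rather than about 2.1 GiB, which is part of what makes the model practical on edge devices.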
@@ -69,6 +71,7 @@ We report in the following table our internal pipeline benchmarks:
 <col style="width: 7%;">
 <col style="width: 7%;">
 <col style="width: 7%;">
+ <col style="width: 7%;">
 <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
 </colgroup>
 <thead>
@@ -77,6 +80,7 @@ We report in the following table our internal pipeline benchmarks:
 <th>Benchmark</th>
 <th>Llama3.2-3B</th>
 <th>Qwen2.5-3B</th>
+ <th>Phi2-2.5B</th>
 <th>Minitron-4B</th>
 <th>Falcon3-3B-Base</th>
 </tr>
@@ -87,6 +91,7 @@ We report in the following table our internal pipeline benchmarks:
 <td>MMLU (5-shot)</td>
 <td>56.1</td>
 <td>65.6</td>
+ <td> - </td>
 <td>58.6</td>
 <td>55.5</td>
 </tr>
@@ -94,6 +99,7 @@ We report in the following table our internal pipeline benchmarks:
 <td>MMLU-PRO (5-shot)</td>
 <td>24.9</td>
 <td>31.99</td>
+ <td> - </td>
 <td>26.21</td>
 <td>28.77</td>
 </tr>
@@ -101,6 +107,7 @@ We report in the following table our internal pipeline benchmarks:
 <td>IFEval</td>
 <td>12.83</td>
 <td>27</td>
+ <td> - </td>
 <td>22.81</td>
 <td>27.67</td>
 </tr>
@@ -109,6 +116,7 @@ We report in the following table our internal pipeline benchmarks:
 <td>GSM8K (5-shot)</td>
 <td>26.68</td>
 <td>68.99</td>
+ <td> - </td>
 <td>25.7</td>
 <td>63.91</td>
 </tr>
@@ -116,6 +124,7 @@ We report in the following table our internal pipeline benchmarks:
 <td>MATH(4-shot)</td>
 <td>1.39</td>
 <td>8.43</td>
+ <td> - </td>
 <td>1.73</td>
 <td>9.38</td>
 </tr>
@@ -124,6 +133,7 @@ We report in the following table our internal pipeline benchmarks:
 <td>Arc Challenge (25-shot)</td>
 <td>50.76</td>
 <td>55.54</td>
+ <td> - </td>
 <td>50.34</td>
 <td>54.86</td>
 </tr>
@@ -131,6 +141,7 @@ We report in the following table our internal pipeline benchmarks:
 <td>GPQA (0-shot)</td>
 <td>27.49</td>
 <td>27.53</td>
+ <td> - </td>
 <td>38.6</td>
 <td>31.15</td>
 </tr>
@@ -138,6 +149,7 @@ We report in the following table our internal pipeline benchmarks:
 <td>MUSR (0-shot)</td>
 <td>35.24</td>
 <td>43.03</td>
+ <td> - </td>
 <td>42.13</td>
 <td>37.5</td>
 </tr>
@@ -145,6 +157,7 @@ We report in the following table our internal pipeline benchmarks:
 <td>BBH (3-shot)</td>
 <td>38.59</td>
 <td>46.12</td>
+ <td> - </td>
 <td>40.85</td>
 <td>44.23</td>
 </tr>
@@ -153,6 +166,7 @@ We report in the following table our internal pipeline benchmarks:
 <td>PIQA (0-shot)</td>
 <td>77.42</td>
 <td>78.89</td>
+ <td> - </td>
 <td>78.29</td>
 <td>75.62</td>
 </tr>
@@ -160,6 +174,7 @@ We report in the following table our internal pipeline benchmarks:
 <td>SciQ (0-shot)</td>
 <td>92.7</td>
 <td>95.6</td>
+ <td> - </td>
 <td>96.1</td>
 <td>93.1</td>
 </tr>
@@ -167,6 +182,7 @@ We report in the following table our internal pipeline benchmarks:
 <td>Winogrande (0-shot)</td>
 <td>69.69</td>
 <td>68.82</td>
+ <td> - </td>
 <td>68.35</td>
 <td>64.64</td>
 </tr>
@@ -174,6 +190,7 @@ We report in the following table our internal pipeline benchmarks:
 <td>OpenbookQA (0-shot)</td>
 <td>43.2</td>
 <td>42.2</td>
+ <td> - </td>
 <td>43</td>
 <td>39.4</td>
 </tr>
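
The updated description states that `Falcon3-3B-Base` was pruned from `Falcon3-7B-Base` and then trained on 100 GT with a knowledge distillation objective. For readers unfamiliar with the term, the sketch below shows what a generic distillation loss looks like: a KL term against the teacher's temperature-softened next-token distribution, blended with the ordinary cross-entropy on the data. The temperature and weighting are illustrative assumptions; this is not TII's actual training code.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,          # assumed hyperparameter
                      alpha: float = 0.5) -> torch.Tensor:  # assumed soft/hard mix
    # Soft targets: KL divergence between temperature-softened teacher and student distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Hard targets: the usual next-token cross-entropy on the training tokens.
    hard = F.cross_entropy(student_logits.view(-1, student_logits.size(-1)), labels.view(-1))
    return alpha * soft + (1 - alpha) * hard
```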
 
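
Since the card stresses that this is a raw, pretrained checkpoint, a minimal text-completion example may help readers try it before finetuning. The repository id, dtype, and generation settings below are assumptions rather than details taken from this commit:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon3-3B-Base"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 weights on a single GPU
    device_map="auto",           # requires `accelerate`; omit to load on CPU
)

# Base (non-instruct) model: give it text to complete rather than a chat turn.
prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

For chat-style use, the linked [Falcon3-3B-Instruct](https://huggingface.co/tiiuae/Falcon3-3B-Instruct) checkpoint is the intended starting point.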