Dhia-GB committed (verified)
Commit 82c2a73 · Parent: 2b13d94

Update README.md

Files changed (1):
  1. README.md +22 -5

README.md CHANGED
@@ -8,19 +8,21 @@ tags:
 - falcon3
 ---
 
-# Falcon3-7B-Base
+# Falcon3-3B-Base
 
 **Falcon3** family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B.
 
-This repository contains the **Falcon3-3B-Base**. It achieves strong results on reasoning, language understanding, instruction following, code and mathematics tasks.
-Falcon3-3B-Base supports 4 languages (english, french, spanish, portuguese) and a context length up to 8K.
-Falcon3-3B-Base pruned (depth + width) from Falcon3-7B-Base, was effeciently trained on only 100 GT using a knowledge distillation objective.
+This repository contains the **Falcon3-3B-Base**. It achieves strong results on reasoning, language understanding, instruction following, code and mathematics tasks.<br>
+`Falcon3-3B-Base` supports 4 languages (English, French, Spanish, Portuguese) and a context length of up to 8K.<br>
+`Falcon3-3B-Base` was pruned from `Falcon3-7B-Base`, then trained on only **100 GT** using a knowledge distillation objective.<br>
+At release, this base version ranks in the world's top 2 among pretrained LLMs under 5B, which makes it an excellent choice for finetuning and deployment on edge devices.<br>
+`Falcon3-3B-Base` comes with a full set of quantized versions for further efficiency and a SOTA [instruct version](https://huggingface.co/tiiuae/Falcon3-3B-Instruct) for direct use.
 
 ⚠️ **This is a raw, pretrained model, which should be further finetuned for most usecases.**
 
 ## Model Details
 - Architecture
-  - Transformer based causal decoder only architecture
+  - Transformer-based causal decoder-only architecture
   - 22 decoder blocks
   - Grouped query attention (GQA) for faster inference: 12 query heads and 4 KV heads
   - Wider head dimension: 256
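
The GQA numbers above have a concrete memory implication: with only 4 KV heads instead of one KV head per query head, the KV cache at the full 8K context is three times smaller. A rough back-of-the-envelope sketch of that arithmetic, assuming an fp16 cache (the cache dtype is an assumption, not stated in the card):

```python
# Rough KV-cache estimate from the architecture numbers above:
# 22 decoder blocks, 4 KV heads, head dimension 256, 8K context.
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int, seq_len: int, bytes_per_elem: int = 2) -> int:
    # Each layer caches one K and one V tensor of shape [kv_heads, seq_len, head_dim].
    return 2 * layers * kv_heads * seq_len * head_dim * bytes_per_elem

gqa = kv_cache_bytes(layers=22, kv_heads=4, head_dim=256, seq_len=8192)
mha = kv_cache_bytes(layers=22, kv_heads=12, head_dim=256, seq_len=8192)  # hypothetical: one KV head per query head
print(f"GQA: {gqa / 2**30:.2f} GiB vs MHA: {mha / 2**30:.2f} GiB ({mha // gqa}x)")
```

At 8K tokens this works out to roughly 0.7 GiB of cache rather than about 2.1 GiB, which is part of what makes the model practical on edge devices.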
@@ -69,6 +71,7 @@ We report in the following table our internal pipeline benchmarks:
 <col style="width: 7%;">
 <col style="width: 7%;">
 <col style="width: 7%;">
+ <col style="width: 7%;">
 <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
 </colgroup>
 <thead>
@@ -77,6 +80,7 @@ We report in the following table our internal pipeline benchmarks:
 <th>Benchmark</th>
 <th>Llama3.2-3B</th>
 <th>Qwen2.5-3B</th>
+ <th>Phi2-2.5B</th>
 <th>Minitron-4B</th>
 <th>Falcon3-3B-Base</th>
 </tr>
@@ -87,6 +91,7 @@ We report in the following table our internal pipeline benchmarks:
 <td>MMLU (5-shot)</td>
 <td>56.1</td>
 <td>65.6</td>
+ <td> - </td>
 <td>58.6</td>
 <td>55.5</td>
 </tr>
@@ -94,6 +99,7 @@ We report in the following table our internal pipeline benchmarks:
 <td>MMLU-PRO (5-shot)</td>
 <td>24.9</td>
 <td>31.99</td>
+ <td> - </td>
 <td>26.21</td>
 <td>28.77</td>
 </tr>
@@ -101,6 +107,7 @@ We report in the following table our internal pipeline benchmarks:
 <td>IFEval</td>
 <td>12.83</td>
 <td>27</td>
+ <td> - </td>
 <td>22.81</td>
 <td>27.67</td>
 </tr>
@@ -109,6 +116,7 @@ We report in the following table our internal pipeline benchmarks:
 <td>GSM8K (5-shot)</td>
 <td>26.68</td>
 <td>68.99</td>
+ <td> - </td>
 <td>25.7</td>
 <td>63.91</td>
 </tr>
@@ -116,6 +124,7 @@ We report in the following table our internal pipeline benchmarks:
 <td>MATH(4-shot)</td>
 <td>1.39</td>
 <td>8.43</td>
+ <td> - </td>
 <td>1.73</td>
 <td>9.38</td>
 </tr>
@@ -124,6 +133,7 @@ We report in the following table our internal pipeline benchmarks:
 <td>Arc Challenge (25-shot)</td>
 <td>50.76</td>
 <td>55.54</td>
+ <td> - </td>
 <td>50.34</td>
 <td>54.86</td>
 </tr>
@@ -131,6 +141,7 @@ We report in the following table our internal pipeline benchmarks:
 <td>GPQA (0-shot)</td>
 <td>27.49</td>
 <td>27.53</td>
+ <td> - </td>
 <td>38.6</td>
 <td>31.15</td>
 </tr>
@@ -138,6 +149,7 @@ We report in the following table our internal pipeline benchmarks:
 <td>MUSR (0-shot)</td>
 <td>35.24</td>
 <td>43.03</td>
+ <td> - </td>
 <td>42.13</td>
 <td>37.5</td>
 </tr>
@@ -145,6 +157,7 @@ We report in the following table our internal pipeline benchmarks:
 <td>BBH (3-shot)</td>
 <td>38.59</td>
 <td>46.12</td>
+ <td> - </td>
 <td>40.85</td>
 <td>44.23</td>
 </tr>
@@ -153,6 +166,7 @@ We report in the following table our internal pipeline benchmarks:
 <td>PIQA (0-shot)</td>
 <td>77.42</td>
 <td>78.89</td>
+ <td> - </td>
 <td>78.29</td>
 <td>75.62</td>
 </tr>
@@ -160,6 +174,7 @@ We report in the following table our internal pipeline benchmarks:
 <td>SciQ (0-shot)</td>
 <td>92.7</td>
 <td>95.6</td>
+ <td> - </td>
 <td>96.1</td>
 <td>93.1</td>
 </tr>
@@ -167,6 +182,7 @@ We report in the following table our internal pipeline benchmarks:
 <td>Winogrande (0-shot)</td>
 <td>69.69</td>
 <td>68.82</td>
+ <td> - </td>
 <td>68.35</td>
 <td>64.64</td>
 </tr>
@@ -174,6 +190,7 @@ We report in the following table our internal pipeline benchmarks:
 <td>OpenbookQA (0-shot)</td>
 <td>43.2</td>
 <td>42.2</td>
+ <td> - </td>
 <td>43</td>
 <td>39.4</td>
 </tr>
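
The updated description states that `Falcon3-3B-Base` was pruned from `Falcon3-7B-Base` and then trained on 100 GT with a knowledge distillation objective. For readers unfamiliar with the term, the sketch below shows what a generic distillation loss looks like: a KL term against the teacher's temperature-softened next-token distribution, blended with the ordinary cross-entropy on the data. The temperature and weighting are illustrative assumptions; this is not TII's actual training code.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,          # assumed hyperparameter
                      alpha: float = 0.5) -> torch.Tensor:  # assumed soft/hard mix
    # Soft targets: KL divergence between temperature-softened teacher and student distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Hard targets: the usual next-token cross-entropy on the training tokens.
    hard = F.cross_entropy(student_logits.view(-1, student_logits.size(-1)), labels.view(-1))
    return alpha * soft + (1 - alpha) * hard
```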
 
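
Since the card stresses that this is a raw, pretrained checkpoint, a minimal text-completion example may help readers try it before finetuning. The repository id, dtype, and generation settings below are assumptions rather than details taken from this commit:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon3-3B-Base"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 weights on a single GPU
    device_map="auto",           # requires `accelerate`; omit to load on CPU
)

# Base (non-instruct) model: give it text to complete rather than a chat turn.
prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

For chat-style use, the linked [Falcon3-3B-Instruct](https://huggingface.co/tiiuae/Falcon3-3B-Instruct) checkpoint is the intended starting point.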