Update README.md
README.md
CHANGED
@@ -24,7 +24,7 @@ Built with an iterative post-training recipe: bilingual DPO (FR+EN) + model merg
 Runs natively as BitNet 1.58-bit (ternary) and is available in GGUF 1.58-bit, lossless with respect to the BF16 checkpoints.
 
 **Why BitNet (and why this model)**
-- BitNet b1.58 uses ternary weights (−1, 0, +1) with abs-mean scaling: very low memory & energy use and strong CPU/edge throughput, unlike classic FP/INT SLMs.
+- BitNet b1.58 uses ternary weights (−1, 0, +1) with abs-mean scaling: very low memory & energy use and strong CPU/edge throughput, unlike classic FP/INT SLMs. For more details on the underlying architecture and efficiency of BitNet, see the Microsoft Research publication: [BitNet b1.58 2B4T Technical Report](https://arxiv.org/abs/2504.12285)
 - ModelStock7 demonstrates that a 2B BitNet can deliver SOTA language understanding in its class without sacrificing efficiency.
 
 **Model Variants**
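The abs-mean ternary rule referenced in the bullet above can be illustrated with a short sketch. This is not code from this repository; it is a minimal illustration of the quantization described in the BitNet b1.58 report (scale by the mean absolute weight, round, clip to {−1, 0, +1}), with all names chosen here for the example:

```python
# Minimal sketch of BitNet b1.58-style abs-mean ternary quantization.
# Illustrative only: per-tensor scale = mean(|W|); weights rounded to {-1, 0, +1}.
import torch

def absmean_ternary_quantize(w: torch.Tensor, eps: float = 1e-5):
    gamma = w.abs().mean().clamp(min=eps)     # abs-mean scale
    w_q = (w / gamma).round().clamp_(-1, 1)   # ternary weights in {-1, 0, +1}
    return w_q, gamma                         # dequantize as w_q * gamma

w = torch.randn(4, 8)
w_q, gamma = absmean_ternary_quantize(w)
print(w_q.unique())                    # tensor([-1., 0., 1.]) (typically)
print((w - w_q * gamma).abs().mean())  # mean reconstruction error
```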
@@ -33,6 +33,7 @@ Runs natively as BitNet 1.58-bit (ternary) and is available in GGUF 1.58-bit, lo
 - [jpacifico/bitnet-dpo-fr-i2s-2](https://huggingface.co/jpacifico/bitnet-dpo-fr-i2s-2): quantized 1.58-bit GGUF version, usable with [bitnet.cpp](https://github.com/microsoft/BitNet)
 
 
+
 # Training Recipe
 
 Base model: [microsoft/bitnet-b1.58-2B-4T-bf16](https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-bf16)
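The recipe combines iterative bilingual DPO over this base model with model merging (detailed in the next hunk). As a rough, hedged sketch of what a single DPO pass could look like with trl — the dataset file and hyperparameters below are placeholders, not the values used for this model, and trl argument names vary between versions:

```python
# Hedged sketch of one bilingual DPO pass on the BF16 base checkpoint using trl.
# Dataset path and hyperparameters are placeholders; loading the BitNet
# checkpoint may additionally require a recent transformers release.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "microsoft/bitnet-b1.58-2B-4T-bf16"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Preference pairs with "prompt" / "chosen" / "rejected" columns (FR + EN).
dataset = load_dataset("json", data_files="dpo_pairs_fr_en.jsonl", split="train")

args = DPOConfig(
    output_dir="bitnet-dpo-fr-en",
    beta=0.1,                      # placeholder DPO temperature
    per_device_train_batch_size=1,
    num_train_epochs=1,
)
trainer = DPOTrainer(model=model, args=args,
                     train_dataset=dataset, processing_class=tokenizer)
trainer.train()
```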
@@ -45,11 +46,11 @@ Iterative DPO + Model merging :
 - Model merging (ModelStock and TIES methods, via [Mergekit](https://github.com/cg123/mergekit)) to combine the complementary strengths of bilingual models (FR-centric + EN-centric), improving robustness across reasoning and comprehension tasks while maintaining stability.
 
 
+
 # First benchmarks
 
 **Interpretation:** Significant gains on language understanding & pragmatic reasoning (ARC-C/E, Wino, BoolQ, HellaSwag, TriviaQA), with stability on other skills. Math/code are not the optimization target; GSM8K stays essentially stable relative to the BitNet 1.58-bit quantized baseline (58.38).
-All scores are reported in comparison with the original [microsoft/bitnet-b1.58-2B-4T-bf16](https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-bf16) model.
-Evaluations were performed using [LM Eval Harness](https://github.com/EleutherAI/lm-evaluation-harness); all results are fully reproducible.
+All scores are reported in comparison with the original [microsoft/bitnet-b1.58-2B-4T-bf16](https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-bf16) model.
 
 | Benchmark (metric) | microsoft/bitnet-b1.58-2B-4T-bf16 | bitnet-dpo-merged-modelstock7 |
 |------------------------------------|-----------------------------------|--------------------------------|
@@ -84,7 +85,7 @@ Evaluations were performed using [LM Eval Harness](https://github.com/EleutherAI
 ### Reproducibility
 
 All benchmark results reported here were obtained using [LM Eval Harness](https://github.com/EleutherAI/lm-evaluation-harness).
-The following example reproduces the **ARC-Challenge (0-shot)** evaluation for this model:
+The following example reproduces the **ARC-Challenge (0-shot)** evaluation for this model:
 
 ```bash
 HF_ALLOW_CODE_EVAL=1 lm-eval --model hf \
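The bash command above continues beyond the lines shown in this hunk. For convenience, the same ARC-Challenge (0-shot) run can also be driven from Python through the harness API; the repository id below is assumed for illustration, not confirmed by the diff:

```python
# Python counterpart of the lm-eval CLI reproduction command.
# The model repo id is assumed here; substitute the actual Hub id of this model.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=jpacifico/bitnet-dpo-merged-modelstock7",
    tasks=["arc_challenge"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"]["arc_challenge"])
```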
@@ -144,6 +145,7 @@ tokenizer_source: jpacifico/bitnet-dpo-merged-modelstock-retrain
 ```
 
 
+
 # Limitations
 
 Not tuned for coding or formal math; prefer specialized variants if those are critical.