jpacifico committed on
Commit 2fbeb74 · verified · 1 Parent(s): 5fa3cea

Update README.md

Files changed (1):
  1. README.md +6 -4
README.md CHANGED
@@ -24,7 +24,7 @@ Built with an iterative post-training recipe: bilingual DPO (FR+EN) + model merg
 Runs natively as BitNet 1.58-bit (ternary) and is available in GGUF 1.58-bit, lossless to the BF16 checkpoints.
 
 **Why BitNet (and why this model)**
- - BitNet b1.58 uses ternary weights (−1, 0, +1) with abs-mean scaling: very low memory & energy use and strong CPU/edge throughput, unlike classic FP/INT SLMs.
+ - BitNet b1.58 uses ternary weights (−1, 0, +1) with abs-mean scaling: very low memory & energy use and strong CPU/edge throughput, unlike classic FP/INT SLMs. For more details on the underlying architecture and efficiency of BitNet, please refer to the official Microsoft Research publication: [BitNet b1.58 2B4T Technical Report](https://arxiv.org/abs/2504.12285)
 - ModelStock7 demonstrates that a 2B BitNet can deliver SOTA language understanding in its class without sacrificing efficiency.
 
 **Model Variants**
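
For context on the quantization scheme referenced in the hunk above: BitNet b1.58 stores each weight matrix as ternary values scaled by a single per-tensor factor equal to the mean absolute weight. Below is a minimal PyTorch sketch of that abs-mean ternary quantization; it is purely illustrative (the function name is ours, and this is not the actual kernel used by BitNet or bitnet.cpp).

```python
import torch

def absmean_ternary_quantize(w: torch.Tensor, eps: float = 1e-5):
    """Quantize a weight tensor to {-1, 0, +1} with a single abs-mean scale,
    in the spirit of BitNet b1.58. Returns (ternary, scale) such that
    scale * ternary approximates the original tensor."""
    scale = w.abs().mean().clamp(min=eps)        # per-tensor abs-mean scale
    ternary = (w / scale).round().clamp_(-1, 1)  # round, then clip to {-1, 0, +1}
    return ternary, scale

# Toy usage: quantize a random matrix, check the value set and the error.
w = torch.randn(256, 256)
w_q, s = absmean_ternary_quantize(w)
print(sorted(w_q.unique().tolist()))                   # [-1.0, 0.0, 1.0]
print((s * w_q - w).norm().item() / w.norm().item())   # relative reconstruction error
```

Because the quantized matrix only contains −1, 0 and +1, matrix multiplication reduces to additions and subtractions plus one rescale, which is where the memory and CPU/edge-throughput advantages cited above come from.
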
@@ -33,6 +33,7 @@ Runs natively as BitNet 1.58-bit (ternary) and is available in GGUF 1.58-bit, lo
 - [jpacifico/bitnet-dpo-fr-i2s-2](https://huggingface.co/jpacifico/bitnet-dpo-fr-i2s-2): Quantized 1.58-bit GGUF version, which you can use with [bitnet.cpp](https://github.com/microsoft/BitNet)
 
 
+
 # Training Recipe
 
 Base model: [microsoft/bitnet-b1.58-2B-4T-bf16](https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-bf16)
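
The post-training recipe starts from this base model and applies bilingual (FR+EN) DPO rounds, as referenced in the hunk headers above and below. The snippet below is a hedged sketch of what one such round could look like with Hugging Face TRL; the dataset name, hyperparameters and `trust_remote_code` usage are placeholders and assumptions, not the author's actual training setup, and exact `DPOTrainer` argument names vary across `trl` versions.

```python
# Hedged sketch of one bilingual DPO round with Hugging Face TRL.
# Dataset name, hyperparameters and paths are placeholders, not the
# settings actually used to train this model.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "microsoft/bitnet-b1.58-2B-4T-bf16"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, trust_remote_code=True)

# DPO expects preference pairs with "prompt", "chosen" and "rejected" columns;
# a hypothetical mixed FR+EN preference dataset stands in for the real one here.
prefs = load_dataset("my-org/fr-en-preference-pairs", split="train")

args = DPOConfig(
    output_dir="bitnet-dpo-round1",
    beta=0.1,                        # preference-vs-reference trade-off
    learning_rate=5e-7,
    per_device_train_batch_size=2,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=prefs,
    processing_class=tokenizer,      # older trl releases take tokenizer= instead
)
trainer.train()
```

Iterating the recipe then means alternating rounds like this with model merging and continuing from the merged checkpoint.
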
@@ -45,11 +46,11 @@ Iterative DPO + Model merging :
 - Model merging (ModelStock and TIES methods, via [Mergekit](https://github.com/cg123/mergekit)) to combine the complementary strengths of bilingual models (FR-centric + EN-centric), improving robustness across reasoning and comprehension tasks while maintaining stability.
 
 
+
 # First benchmarks
 
 **Interpretation:** Significant gains on language understanding & pragmatic reasoning (ARC-C/E, Wino, BoolQ, HellaSwag, TriviaQA) with stability on other skills. Math/code are not the optimization target; GSM8K stays essentially stable relative to the BitNet 1.58-bit quantized baseline (58.38).
- All scores are reported in comparison with the original [microsoft/bitnet-b1.58-2B-4T-bf16](https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-bf16) model.
- Evaluations were performed using [LM Eval Harness](https://github.com/EleutherAI/lm-evaluation-harness); all results are fully reproducible.
+ All scores are reported in comparison with the original [microsoft/bitnet-b1.58-2B-4T-bf16](https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-bf16) model.
 
 | Benchmark (metric) | microsoft/bitnet-b1.58-2B-4T-bf16 | bitnet-dpo-merged-modelstock7 |
 |------------------------------------|-----------------------------------|--------------------------------|
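
On the merging step mentioned at the top of the hunk above: TIES merging trims each fine-tune's task vector to its largest-magnitude entries, elects a per-parameter sign by magnitude-weighted majority, and averages only the deltas that agree with that sign. The sketch below is a conceptual illustration of that idea, not the Mergekit implementation or the exact configuration used for this model (ModelStock itself uses a different, geometry-based interpolation).

```python
import torch

def ties_merge(base: dict, finetuned: list[dict], density: float = 0.2, lam: float = 1.0) -> dict:
    """Conceptual TIES-style merge of several fine-tuned state_dicts into a base
    model: trim each task vector, elect a sign per parameter, average agreeing deltas."""
    merged = {}
    for name, w0 in base.items():
        # Task vectors: parameter deltas of each fine-tune relative to the base.
        deltas = [ft[name].float() - w0.float() for ft in finetuned]

        # 1) Trim: keep only the top-`density` fraction of entries by magnitude.
        trimmed = []
        for d in deltas:
            k = max(1, int(density * d.numel()))
            thresh = d.abs().flatten().kthvalue(d.numel() - k + 1).values
            trimmed.append(torch.where(d.abs() >= thresh, d, torch.zeros_like(d)))
        stacked = torch.stack(trimmed)

        # 2) Elect sign: majority sign per parameter, weighted by magnitude.
        sign = torch.sign(stacked.sum(dim=0))

        # 3) Disjoint merge: average only the deltas that agree with the elected sign.
        agree = (torch.sign(stacked) == sign) & (stacked != 0)
        summed = torch.where(agree, stacked, torch.zeros_like(stacked)).sum(dim=0)
        count = agree.sum(dim=0).clamp(min=1)
        merged[name] = w0 + lam * summed / count
    return merged
```

In practice both methods are driven through Mergekit's declarative YAML configs; the tail of such a config (its `tokenizer_source:` line) appears as context in the last hunk of this diff.
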
@@ -84,7 +85,7 @@ Evaluations were performed using [LM Eval Harness](https://github.com/EleutherAI
 ### Reproducibility
 
 All benchmark results reported here were obtained using [LM Eval Harness](https://github.com/EleutherAI/lm-evaluation-harness).
- The following example reproduces the **ARC-Challenge (0-shot)** evaluation for this model:
+ The following example reproduces the **ARC-Challenge (0-shot)** evaluation for this model:
 
 ```bash
 HF_ALLOW_CODE_EVAL=1 lm-eval --model hf \
@@ -144,6 +145,7 @@ tokenizer_source: jpacifico/bitnet-dpo-merged-modelstock-retrain
 ```
 
 
+
 # Limitations
 
 Not tuned for coding or formal math; prefer specialized variants if those are critical.
 