Update README.md
README.md
CHANGED
@@ -24,7 +24,7 @@ Built with an iterative post-training recipe: bilingual DPO (FR+EN) + model merg
 Runs natively as BitNet 1.58-bit (ternary) and is available in GGUF 1.58-bit, lossless with respect to the BF16 checkpoints.
 
 **Why BitNet (and why this model)**
-- BitNet b1.58 uses ternary weights (−1, 0, +1) with abs-mean scaling: very low memory & energy use and strong CPU/edge throughput, unlike classic FP/INT SLMs.
+- BitNet b1.58 uses ternary weights (−1, 0, +1) with abs-mean scaling: very low memory & energy use and strong CPU/edge throughput, unlike classic FP/INT SLMs. For more details on the underlying architecture and efficiency of BitNet, see the Microsoft Research publication: [BitNet b1.58 2B4T Technical Report](https://arxiv.org/abs/2504.12285)
 - ModelStock7 demonstrates that a 2B BitNet can deliver SOTA language understanding in its class without sacrificing efficiency.
 
 **Model Variants**
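The abs-mean ternary rule referenced in the bullet above can be illustrated with a short sketch. This is not code from this repository; it is a minimal illustration of the quantization described in the BitNet b1.58 report (scale by the mean absolute weight, round, clip to {−1, 0, +1}), with all names chosen here for the example:

```python
# Minimal sketch of BitNet b1.58-style abs-mean ternary quantization.
# Illustrative only: per-tensor scale = mean(|W|); weights rounded to {-1, 0, +1}.
import torch

def absmean_ternary_quantize(w: torch.Tensor, eps: float = 1e-5):
    gamma = w.abs().mean().clamp(min=eps)     # abs-mean scale
    w_q = (w / gamma).round().clamp_(-1, 1)   # ternary weights in {-1, 0, +1}
    return w_q, gamma                         # dequantize as w_q * gamma

w = torch.randn(4, 8)
w_q, gamma = absmean_ternary_quantize(w)
print(w_q.unique())                    # tensor([-1., 0., 1.]) (typically)
print((w - w_q * gamma).abs().mean())  # mean reconstruction error
```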
@@ -33,6 +33,7 @@ Runs natively as BitNet 1.58-bit (ternary) and is available in GGUF 1.58-bit, lo
 - [jpacifico/bitnet-dpo-fr-i2s-2](https://huggingface.co/jpacifico/bitnet-dpo-fr-i2s-2): quantized 1.58-bit GGUF version, usable with [bitnet.cpp](https://github.com/microsoft/BitNet)
 
 
+
 # Training Recipe
 
 Base model: [microsoft/bitnet-b1.58-2B-4T-bf16](https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-bf16)
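The recipe combines iterative bilingual DPO over this base model with model merging (detailed in the next hunk). As a rough, hedged sketch of what a single DPO pass could look like with trl — the dataset file and hyperparameters below are placeholders, not the values used for this model, and trl argument names vary between versions:

```python
# Hedged sketch of one bilingual DPO pass on the BF16 base checkpoint using trl.
# Dataset path and hyperparameters are placeholders; loading the BitNet
# checkpoint may additionally require a recent transformers release.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "microsoft/bitnet-b1.58-2B-4T-bf16"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Preference pairs with "prompt" / "chosen" / "rejected" columns (FR + EN).
dataset = load_dataset("json", data_files="dpo_pairs_fr_en.jsonl", split="train")

args = DPOConfig(
    output_dir="bitnet-dpo-fr-en",
    beta=0.1,                      # placeholder DPO temperature
    per_device_train_batch_size=1,
    num_train_epochs=1,
)
trainer = DPOTrainer(model=model, args=args,
                     train_dataset=dataset, processing_class=tokenizer)
trainer.train()
```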
@@ -45,11 +46,11 @@ Iterative DPO + Model merging :
 - Model merging (ModelStock and TIES methods, via [Mergekit](https://github.com/cg123/mergekit)) to combine the complementary strengths of bilingual models (FR-centric + EN-centric), improving robustness across reasoning and comprehension tasks while maintaining stability.
 
 
+
 # First benchmarks
 
 **Interpretation:** Significant gains on language understanding & pragmatic reasoning (ARC-C/E, Wino, BoolQ, HellaSwag, TriviaQA), with stability on other skills. Math/code are not the optimization target; GSM8K stays essentially stable relative to the BitNet 1.58-bit quantized baseline (58.38).
-All scores are reported in comparison with the original [microsoft/bitnet-b1.58-2B-4T-bf16](https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-bf16) model.
-Evaluations were performed using [LM Eval Harness](https://github.com/EleutherAI/lm-evaluation-harness); all results are fully reproducible.
+All scores are reported in comparison with the original [microsoft/bitnet-b1.58-2B-4T-bf16](https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-bf16) model.
 
 | Benchmark (metric) | microsoft/bitnet-b1.58-2B-4T-bf16 | bitnet-dpo-merged-modelstock7 |
 |------------------------------------|-----------------------------------|--------------------------------|
@@ -84,7 +85,7 @@ Evaluations were performed using [LM Eval Harness](https://github.com/EleutherAI
 ### Reproducibility
 
 All benchmark results reported here were obtained using [LM Eval Harness](https://github.com/EleutherAI/lm-evaluation-harness).
-The following example reproduces the **ARC-Challenge (0-shot)** evaluation for this model:
+The following example reproduces the **ARC-Challenge (0-shot)** evaluation for this model:
 
 ```bash
 HF_ALLOW_CODE_EVAL=1 lm-eval --model hf \
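The bash command above continues beyond the lines shown in this hunk. For convenience, the same ARC-Challenge (0-shot) run can also be driven from Python through the harness API; the repository id below is assumed for illustration, not confirmed by the diff:

```python
# Python counterpart of the lm-eval CLI reproduction command.
# The model repo id is assumed here; substitute the actual Hub id of this model.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=jpacifico/bitnet-dpo-merged-modelstock7",
    tasks=["arc_challenge"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"]["arc_challenge"])
```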
@@ -144,6 +145,7 @@ tokenizer_source: jpacifico/bitnet-dpo-merged-modelstock-retrain
 ```
 
 
+
 # Limitations
 
 Not tuned for coding or formal math; prefer specialized variants if those are critical.