Update README.md
--- a/README.md
+++ b/README.md
@@ -32,7 +32,7 @@ Runs natively as BitNet 1.58-bit (ternary) and is available in GGUF 1.58-bit, lo
 - jpacifico/Aramis-2B-BitNet-bf16 (this repo): Contains the retrainable weights in BF16 format
 - [jpacifico/Aramis-2B-BitNet-b1.58-i2s-GGUF](https://huggingface.co/jpacifico/Aramis-2B-BitNet-b1.58-i2s-GGUF): Quantized 1.58-bit GGUF version, which you can use with [bitnet.cpp](https://github.com/microsoft/BitNet)

-
+---

 # Training Recipe

@@ -45,7 +45,7 @@ Iterative DPO + Model merging :
 [Intel/orca_dpo_pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs)
 - Model merging (ModelStock and TIES methods, via [Mergekit](https://github.com/cg123/mergekit)) to combine the complementary strengths of bilingual models (FR-centric + EN-centric), improving robustness across reasoning and comprehension tasks while maintaining stability.

-
+---

 # First benchmarks

@@ -102,15 +102,15 @@ HF_ALLOW_CODE_EVAL=1 lm-eval --model hf \
 - Randomness (e.g. seeds, batch sizes) may cause slight variations in results
 - The same procedure was used to evaluate all tasks presented in the benchmark tables

-
+---

 # Usage with `bitnet.cpp`

-You can run this model using my demo [Colab notebook](https://github.com/jpacifico/)
+You can run this model using my demo [Colab notebook](https://github.com/jpacifico/Aramis-BitNet/blob/main/Aramis_BitNet_inference_test.ipynb)

 Please refer to the [bitnet.cpp](https://github.com/microsoft/BitNet) GitHub repository for detailed compilation steps, usage examples, and command-line options.

-
+---

 # Last checkpoint
 ### Merge Method
@@ -144,7 +144,7 @@ tokenizer_source: jpacifico/bitnet-dpo-merged-modelstock-retrain

 ```

-
+---

 # Limitations

@@ -154,7 +154,7 @@ No explicit chain-of-thought training; improvements come from bilingual DPO + me
 **Disclaimer**
 This model is intended for research and development purposes only and should not be used in commercial or real-world applications without further testing. While the Microsoft Research team has applied SFT and DPO to align the BitNet base model, it may still produce unexpected, biased, or inaccurate outputs. Please use responsibly.

-
+---

 - **Developed by:** Jonathan Pacifico, 2025
 - **Model type:** LLM
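The "Last checkpoint / Merge Method" section ends with a Mergekit configuration (its closing fence is visible in the hunk context above). For readers unfamiliar with Mergekit, a TIES merge config generally has the shape below; the model names and parameter values here are placeholders, not the actual configuration used for Aramis:

```yaml
# Hypothetical Mergekit TIES config -- model names and values are placeholders.
models:
  - model: some-org/fr-centric-bitnet-dpo   # placeholder FR-centric model
    parameters:
      density: 0.5    # fraction of each task vector kept after trimming
      weight: 0.5
  - model: some-org/en-centric-bitnet-dpo   # placeholder EN-centric model
    parameters:
      density: 0.5
      weight: 0.5
merge_method: ties
base_model: some-org/bitnet-base            # placeholder base model
parameters:
  normalize: true
dtype: bfloat16
```

`density` controls the TIES trimming step, and `weight` scales each model's contribution before the sign election.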
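The README above describes the model as running natively in BitNet 1.58-bit (ternary) form. To make that concrete, here is a minimal sketch of the absmean ternary quantization scheme described for BitNet b1.58; this is my own toy illustration, not code from this repository:

```python
import numpy as np

def quantize_ternary(w: np.ndarray):
    """Absmean ternary quantization (BitNet b1.58 style):
    scale by the mean absolute weight, then round each entry
    to the nearest value in {-1, 0, +1}."""
    gamma = float(np.mean(np.abs(w))) + 1e-8        # per-tensor scale
    w_q = np.clip(np.round(w / gamma), -1, 1).astype(np.int8)
    return w_q, gamma                               # dequantize as gamma * w_q

w = np.array([0.9, -0.04, 0.5, -1.2])
w_q, gamma = quantize_ternary(w)
print(w_q)   # [ 1  0  1 -1]
```

Each weight thus costs log2(3) ≈ 1.58 bits of information, which is where the "1.58-bit" name comes from.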
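The training recipe merges bilingual models with the ModelStock and TIES methods via Mergekit. To illustrate the TIES idea (trim small updates, elect a per-parameter sign, then average only the updates that agree), here is a toy numpy sketch; it is illustrative only and not how Mergekit actually implements the method:

```python
import numpy as np

def ties_merge(base, finetuned, density=0.5):
    """Toy TIES merge: trim each task vector to its largest-magnitude
    `density` fraction, elect a sign per parameter, and average only
    the trimmed values that agree with the elected sign."""
    deltas = [ft - base for ft in finetuned]        # task vectors
    trimmed = []
    for d in deltas:
        k = max(1, int(density * d.size))
        thresh = np.sort(np.abs(d).ravel())[-k]     # keep the top-k magnitudes
        trimmed.append(np.where(np.abs(d) >= thresh, d, 0.0))
    stacked = np.stack(trimmed)
    elected = np.sign(stacked.sum(axis=0))          # per-parameter sign election
    agree = (np.sign(stacked) == elected) & (stacked != 0)
    merged = (stacked * agree).sum(axis=0) / np.maximum(agree.sum(axis=0), 1)
    return base + merged

base = np.zeros(4)
fr = np.array([1.0, -1.0, 0.1, 0.0])   # toy "FR-centric" task vector
en = np.array([1.0,  1.0, 0.2, 0.0])   # toy "EN-centric" task vector
print(ties_merge(base, [fr, en]))      # [1. 0. 0. 0.]
```

Note how the second parameter, where the two models disagree in sign, is zeroed out rather than averaged to a muddled value; that is the interference resolution TIES is designed for.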
|