Text Generation · Transformers · Safetensors · English · French · bitnet · mergekit · Merge · conversational · custom_code
Commit 5420176 by jpacifico · verified · 1 Parent(s): d16c030

Update README.md

Files changed (1)
  1. README.md +7 -7
README.md CHANGED
@@ -32,7 +32,7 @@ Runs natively as BitNet 1.58-bit (ternary) and is available in GGUF 1.58-bit, lo
32
  - jpacifico/Aramis-2B-BitNet-bf16 (this repo): Contains the retrainable weights in BF16 format
33
  - [jpacifico/Aramis-2B-BitNet-b1.58-i2s-GGUF](https://huggingface.co/jpacifico/Aramis-2B-BitNet-b1.58-i2s-GGUF) : Quantized 1.58-bit GGUF version, you can use with [bitnet.cpp](https://github.com/microsoft/BitNet)
34
 
35
-
36
 
37
  # Training Recipe
38
 
@@ -45,7 +45,7 @@ Iterative DPO + Model merging :
45
  [Intel/orca_dpo_pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs)
46
  - Model merging (ModelStock and TIES methods, via [Mergekit](https://github.com/cg123/mergekit) to combine complementary strengths of bilingual models (FR-centric + EN-centric), improving robustness across reasoning and comprehension tasks while maintaining stability.
47
 
48
-
49
 
50
  # First benchmarks
51
 
@@ -102,15 +102,15 @@ HF_ALLOW_CODE_EVAL=1 lm-eval --model hf \
102
  - Randomness (e.g. seeds, batch sizes) may cause slight variations in results
103
  - The same procedure was used to evaluate all tasks presented in the benchmark tables
104
 
105
-
106
 
107
  # Usage with `bitnet.cpp`
108
 
109
- You can run this model using my demo [Colab notebook](https://github.com/jpacifico/) TBD
110
 
111
  Please refer to the [bitnet.cpp](https://github.com/microsoft/BitNet) GitHub repository for detailed compilation steps, usage examples, and command-line options.
112
 
113
-
114
 
115
  # Last checkpoint
116
  ### Merge Method
@@ -144,7 +144,7 @@ tokenizer_source: jpacifico/bitnet-dpo-merged-modelstock-retrain
144
 
145
  ```
146
 
147
-
148
 
149
  # Limitations
150
 
@@ -154,7 +154,7 @@ No explicit chain-of-thought training; improvements come from bilingual DPO + me
154
  **Disclamer**
155
  This model is intended for research and development purposes only and should not be used in commercial or real-world applications without further testing. While the Microsoft Research team has applied SFT and DPO to align the BitNet base model, it may still produce unexpected, biased, or inaccurate outputs. Please use responsibly.
156
 
157
-
158
 
159
  - **Developed by:** Jonathan Pacifico, 2025
160
  - **Model type:** LLM
 
32
  - jpacifico/Aramis-2B-BitNet-bf16 (this repo): Contains the retrainable weights in BF16 format
33
  - [jpacifico/Aramis-2B-BitNet-b1.58-i2s-GGUF](https://huggingface.co/jpacifico/Aramis-2B-BitNet-b1.58-i2s-GGUF) : Quantized 1.58-bit GGUF version, you can use with [bitnet.cpp](https://github.com/microsoft/BitNet)
34
 
35
+ ---
36
 
37
  # Training Recipe
38
 
 
45
  [Intel/orca_dpo_pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs)
46
  - Model merging (ModelStock and TIES methods, via [Mergekit](https://github.com/cg123/mergekit) to combine complementary strengths of bilingual models (FR-centric + EN-centric), improving robustness across reasoning and comprehension tasks while maintaining stability.
47
 
48
+ ---
49
 
50
  # First benchmarks
51
 
 
102
  - Randomness (e.g. seeds, batch sizes) may cause slight variations in results
103
  - The same procedure was used to evaluate all tasks presented in the benchmark tables
104
 
105
+ ---
106
 
107
  # Usage with `bitnet.cpp`
108
 
109
+ You can run this model using my demo [Colab notebook](https://github.com/jpacifico/Aramis-BitNet/blob/main/Aramis_BitNet_inference_test.ipynb)
110
 
111
  Please refer to the [bitnet.cpp](https://github.com/microsoft/BitNet) GitHub repository for detailed compilation steps, usage examples, and command-line options.
112
 
113
+ ---
114
 
115
  # Last checkpoint
116
  ### Merge Method
 
144
 
145
  ```
146
 
147
+ ---
148
 
149
  # Limitations
150
 
 
154
  **Disclamer**
155
  This model is intended for research and development purposes only and should not be used in commercial or real-world applications without further testing. While the Microsoft Research team has applied SFT and DPO to align the BitNet base model, it may still produce unexpected, biased, or inaccurate outputs. Please use responsibly.
156
 
157
+ ---
158
 
159
  - **Developed by:** Jonathan Pacifico, 2025
160
  - **Model type:** LLM