Add paper abstract to model card

This PR adds the paper abstract to the model card for better documentation.

README.md (changed):
````diff
@@ -1,15 +1,15 @@
 ---
-license: mit
-license_link: https://huggingface.co/microsoft/bitnet-b1.58-2B-4T/blob/main/LICENSE
 language:
 - en
+library_name: transformers
+license: mit
+license_link: https://huggingface.co/microsoft/bitnet-b1.58-2B-4T/blob/main/LICENSE
 pipeline_tag: text-generation
 tags:
 - chat
 - bitnet
 - text-generation
 - large-language-model
-library_name: transformers
 ---
 
 # BitNet b1.58 2B4T - Scaling Native 1-bit LLM
@@ -22,6 +22,10 @@ Trained on a corpus of 4 trillion tokens, this model demonstrates that native 1-
 
 ➡️ **Official Inference Code:** [microsoft/BitNet (bitnet.cpp)](https://github.com/microsoft/BitNet)
 
+# Paper abstract
+
+The abstract of the paper is the following:
+
 ## Model Variants
 
 Several versions of the model weights are available on Hugging Face:
@@ -98,7 +102,8 @@ chat_input = tokenizer(prompt, return_tensors="pt").to(model.device)
 # Generate response
 chat_outputs = model.generate(**chat_input, max_new_tokens=50)
 response = tokenizer.decode(chat_outputs[0][chat_input['input_ids'].shape[-1]:], skip_special_tokens=True) # Decode only the response part
-print("\nAssistant Response:", response)
+print("
+Assistant Response:", response)
 ```
 
 ## How to Use (with `bitnet.cpp`)
@@ -141,4 +146,4 @@ BitNet b1.58 2B4T was evaluated against leading open-weight full-precision LLMs
 The model weights and code are released under the [MIT License](https://huggingface.co/microsoft/bitnet-b1.58-2B-4T/blob/main/LICENSE).
 
 ## Disclaimer
-This model is intended for research and development purposes. While efforts have been made to align it using SFT and DPO, it may still produce outputs that are unexpected, biased, or inaccurate. Please use responsibly.
+This model is intended for research and development purposes. While efforts have been made to align it using SFT and DPO, it may still produce outputs that are unexpected, biased, or inaccurate. Please use responsibly.
````
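For context, the `tokenizer.decode(...)` line touched in the third hunk relies on `model.generate` returning the prompt tokens concatenated with the newly generated tokens, so slicing at the prompt length isolates the assistant's reply. A minimal sketch of that slicing, using hypothetical stand-in token IDs so no model download is needed:

```python
# Stand-in token IDs (hypothetical values, chosen only for illustration).
prompt_ids = [101, 7592, 2088]              # plays the role of chat_input['input_ids'][0]
generated = prompt_ids + [2023, 2003, 102]  # plays the role of chat_outputs[0]:
                                            # generate() echoes the prompt, then appends new tokens

# Equivalent of chat_outputs[0][chat_input['input_ids'].shape[-1]:]
# from the README snippet: drop the prompt prefix, keep only the reply.
response_ids = generated[len(prompt_ids):]

print("\nAssistant response token IDs:", response_ids)  # → [2023, 2003, 102]
```

With real tensors the slice syntax is identical; the only difference is that the prompt length comes from `chat_input['input_ids'].shape[-1]` instead of `len(prompt_ids)`, and the result is passed to `tokenizer.decode` with `skip_special_tokens=True`.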