shumingma posted an update Mar 20
The Era of 1-bit LLMs: Training Tips, Code and FAQ

https://github.com/microsoft/unilm/blob/master/bitnet/The-Era-of-1-bit-LLMs__Training_Tips_Code_FAQ.pdf

We present details and tips for training 1-bit LLMs. We also provide additional experiments and results that were not reported in the paper, along with responses to questions about "The Era of 1-bit LLMs" paper. Finally, we include the official PyTorch implementation of BitNet (b1.58 and b1) to support future research and development of 1-bit LLMs.
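For readers who want a feel for what BitNet b1.58 quantizes before opening the PDF, here is a minimal, framework-free sketch. It assumes the b1.58 formulation: weights are scaled by their mean absolute value and rounded to the ternary set {-1, 0, 1}, and activations are scaled per token by their maximum absolute value into the signed 8-bit range. The function names and the `EPS` clamp value are hypothetical, and this is not the official PyTorch code from the linked document.

```python
from typing import List

EPS = 1e-5  # assumed clamp value to avoid division by zero


def weight_quant(w: List[List[float]]) -> List[List[float]]:
    """Absmean ternary quantization (assumed b1.58 scheme).

    Each weight is divided by gamma = mean(|W|), rounded, clipped to
    {-1, 0, 1}, and multiplied back by gamma so the result stays on the
    original scale.
    """
    flat = [abs(v) for row in w for v in row]
    gamma = max(sum(flat) / len(flat), EPS)
    return [[max(-1, min(1, round(v / gamma))) * gamma for v in row] for row in w]


def activation_quant(x: List[float]) -> List[float]:
    """Per-token absmax quantization to the signed 8-bit range [-128, 127].

    The token vector is scaled so its largest magnitude maps to 127,
    rounded, clipped, and rescaled back.
    """
    scale = 127.0 / max(max(abs(v) for v in x), EPS)
    return [max(-128, min(127, round(v * scale))) / scale for v in x]


if __name__ == "__main__":
    print(weight_quant([[2.0, -0.5], [1.5, -4.0]]))
    print(activation_quant([4.0, -2.0, 1.0]))
```

With `w = [[2.0, -0.5], [1.5, -4.0]]`, gamma is 2.0, so every weight collapses to one of -2.0, 0.0, or 2.0. In a training setup these quantizers would typically be applied in the forward pass only, with a straight-through estimator for the backward pass; that detail is outside this sketch.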

Does the RMSNorm(x) that you use in BitLinear() have learnable parameters?
The original paper cited Lei Jimmy Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton, "Layer normalization," CoRR, 2016.
Which one should be used?
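To make the question concrete, the sketch below contrasts the two normalizations. It assumes the standard definitions: RMSNorm only divides by the root mean square of the input and may multiply by an optional learnable per-feature gain, while LayerNorm also subtracts the mean and may apply both a gain and a bias. Whether BitLinear's RMSNorm carries a learnable gain is exactly what the comment asks, so the sketch leaves `weight` optional rather than answering it. The function names and `eps` defaults are hypothetical.

```python
import math
from typing import List, Optional


def rms_norm(x: List[float], weight: Optional[List[float]] = None,
             eps: float = 1e-6) -> List[float]:
    """RMSNorm: x / sqrt(mean(x^2) + eps), optionally scaled by a learnable gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    y = [v / rms for v in x]
    if weight is not None:  # learnable per-feature gain (optional)
        y = [g * v for g, v in zip(weight, y)]
    return y


def layer_norm(x: List[float], weight: Optional[List[float]] = None,
               bias: Optional[List[float]] = None,
               eps: float = 1e-5) -> List[float]:
    """LayerNorm: subtract the mean, divide by the std, then optional gain and bias."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    y = [(v - mean) / math.sqrt(var + eps) for v in x]
    if weight is not None:  # learnable per-feature gain (optional)
        y = [g * v for g, v in zip(weight, y)]
    if bias is not None:  # learnable per-feature bias (optional)
        y = [v + b for v, b in zip(y, bias)]
    return y


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    print(rms_norm(x))
    print(layer_norm(x))
```

For a zero-mean input such as `[-1.0, 1.0]` with `eps=0.0`, both functions return the same vector; they differ only when the input has a nonzero mean, which LayerNorm removes and RMSNorm keeps. In PyTorch, `torch.nn.LayerNorm` exposes the gain and bias through its `elementwise_affine` argument.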
