README.md · tdh111/bitnet-b1.58-2B-4T-GGUF at main

metadata

license: mit
base_model:
  - microsoft/bitnet-b1.58-2B-4T
pipeline_tag: text-generation
base_model_relation: quantized
tags:
  - bitnet
  - ik_llama.cpp

The IQ2_BN and IQ2_BN_R4 versions of microsoft/bitnet-b1.58-2B-4T-gguf, for use with ik_llama.cpp.

I recommend the IQ2_BN_R4 version, but you can also use -rtr with the IQ2_BN version to convert it at runtime.
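The two options can be sketched as a small Python helper that builds the argument list for each case. This is only an illustration: the binary path `./llama-cli` and both model file names are assumptions (check your ik_llama.cpp build and the files in this repo), and the helper only assembles the command without running it.

```python
def build_llama_command(model_path, prompt, repack_at_runtime=False,
                        binary="./llama-cli"):
    """Build an argument list for an ik_llama.cpp CLI run.

    `binary` is an assumed path; adjust it to wherever your build puts it.
    """
    cmd = [binary, "-m", model_path, "-p", prompt]
    if repack_at_runtime:
        # -rtr is the flag mentioned above for converting IQ2_BN at runtime.
        cmd.append("-rtr")
    return cmd


# Hypothetical file names, used here only for illustration.
r4_cmd = build_llama_command("bitnet-IQ2_BN_R4.gguf", "Hello")
bn_cmd = build_llama_command("bitnet-IQ2_BN.gguf", "Hello",
                             repack_at_runtime=True)

print(" ".join(r4_cmd))
print(" ".join(bn_cmd))
```

The returned list can be passed to `subprocess.run` if you want to launch the process from Python.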

The chat template in the model looks incorrect (I did not change it; it comes from the original Microsoft GGUF).

An example of correct usage from their transformers PR:

<|begin_of_text|>User: Hey, are you conscious? Can you talk to me?<|eot_id|>Assistant:

I was able to follow the example above and it worked for multi-turn conversations.
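Since the built-in template is unreliable, one option is to format the prompt yourself. Below is a minimal Python sketch that reproduces the single-turn example above and extends it to several turns. The multi-turn layout (each earlier turn written as `Role: text<|eot_id|>`) is my assumption extrapolated from that one example, and `build_prompt` is a hypothetical helper, not part of any library.

```python
BOS = "<|begin_of_text|>"
EOT = "<|eot_id|>"


def build_prompt(turns, add_bos=True):
    """Format (role, text) pairs into a single prompt string.

    `turns` is a list like [("User", "..."), ("Assistant", "..."), ...].
    The layout of earlier turns is an assumption based on the
    single-turn example in this README.
    """
    parts = [BOS] if add_bos else []
    for role, text in turns:
        parts.append(f"{role}: {text}{EOT}")
    # Leave the final Assistant turn open so the model completes it.
    parts.append("Assistant:")
    return "".join(parts)


single = build_prompt([("User", "Hey, are you conscious? Can you talk to me?")])
multi = build_prompt([
    ("User", "Hi"),
    ("Assistant", "Hello!"),
    ("User", "What is BitNet?"),
])
print(single)
print(multi)
```

Set `add_bos=False` if your runtime already prepends `<|begin_of_text|>` on its own, otherwise the token ends up in the prompt twice.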