# mamba2-2.7b-hf

A corrected conversion of the https://huggingface.co/AntonV/mamba2-2.7b-hf model to the Hugging Face format. This fixes the error that occurs when saving the weights:

```
RuntimeError: The weights trying to be saved contained shared tensors [{'backbone.embeddings.weight', 'lm_head.weight'}] that are mismatching the transformers base configuration. Try saving using safe_serialization=False or remove this tensor sharing.
```
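For reference, below is a minimal, untested sketch of the workarounds suggested by the error message when working with the original AntonV/mamba2-2.7b-hf checkpoint (the output directory name is illustrative, and this is not necessarily how this repository's conversion was produced):

```python
# Sketch of the workarounds for the original checkpoint, where
# backbone.embeddings.weight and lm_head.weight share the same tensor.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("AntonV/mamba2-2.7b-hf")

# Option 1: fall back to torch (pickle) serialization, which allows shared tensors.
model.save_pretrained("mamba2-2.7b-local", safe_serialization=False)

# Option 2: remove the sharing by giving lm_head its own copy of the weights
# before saving as safetensors (the config may also need tie_word_embeddings=False
# so the weights are not re-tied on load).
model.lm_head.weight = torch.nn.Parameter(model.lm_head.weight.detach().clone())
model.config.tie_word_embeddings = False
model.save_pretrained("mamba2-2.7b-local")
```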

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("petkopetkov/mamba2-2.7b-hf")
model = AutoModelForCausalLM.from_pretrained("petkopetkov/mamba2-2.7b-hf")

input_ids = tokenizer("Hey how are you doing?", return_tensors="pt")["input_ids"]
out = model.generate(input_ids, max_new_tokens=10)
print(tokenizer.batch_decode(out))
```

This does not fix the Mamba2Cache error that occurs during evaluation with SFTTrainer:

```
TypeError: Unsupported types (<class 'transformers.models.mamba2.modeling_mamba2.Mamba2Cache'>) passed to _pad_across_processes. Only nested list/tuple/dicts of objects that are valid for is_torch_tensor should be passed.
```

A temporary fix is to disable the cache:

```python
model.config.use_cache = False
```

Another limitation is that the model does not seem to work with bf16 mixed-precision training, at least with SFTTrainer, so bf16 also has to be disabled:

```python
training_args = SFTConfig(
    bf16=False
)
```
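Putting the two workarounds together, here is a minimal, untested training sketch. The dataset, output directory, and any SFTConfig fields other than bf16 are illustrative assumptions, and the exact SFTTrainer arguments may differ across trl versions:

```python
# Minimal sketch combining both workarounds for fine-tuning with SFTTrainer.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

model = AutoModelForCausalLM.from_pretrained("petkopetkov/mamba2-2.7b-hf")
tokenizer = AutoTokenizer.from_pretrained("petkopetkov/mamba2-2.7b-hf")

# Workaround 1: avoid the Mamba2Cache error during evaluation.
model.config.use_cache = False

dataset = load_dataset("trl-lib/Capybara", split="train")  # illustrative dataset

training_args = SFTConfig(
    output_dir="mamba2-2.7b-sft",  # illustrative output directory
    bf16=False,  # Workaround 2: bf16 mixed precision fails with this model.
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```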