# mamba2-2.7b-hf
Correct conversion to the Hugging Face format of the https://huggingface.co/AntonV/mamba2-2.7b-hf model. This fixes the error that occurs when saving the weights:
```
RuntimeError: The weights trying to be saved contained shared tensors [{'backbone.embeddings.weight', 'lm_head.weight'}] that are mismatching the transformers base configuration. Try saving using safe_serialization=False or remove this tensor sharing.
```
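With this conversion, saving the model back to disk should work with the default safetensors serialization. A minimal sketch (the local output directory is just a placeholder):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("petkopetkov/mamba2-2.7b-hf")

# Saving no longer requires safe_serialization=False.
model.save_pretrained("./mamba2-2.7b-hf-local")
```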
## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("petkopetkov/mamba2-2.7b-hf")
model = AutoModelForCausalLM.from_pretrained("petkopetkov/mamba2-2.7b-hf")

input_ids = tokenizer("Hey how are you doing?", return_tensors="pt")["input_ids"]

out = model.generate(input_ids, max_new_tokens=10)
print(tokenizer.batch_decode(out))
```
This does not fix the Mamba2 cache error that occurs during evaluation with SFTTrainer:
```
TypeError: Unsupported types (<class 'transformers.models.mamba2.modeling_mamba2.Mamba2Cache'>) passed to _pad_across_processes. Only nested list/tuple/dicts of objects that are valid for is_torch_tensor should be passed.
```
A temporary workaround is to disable the cache:

```python
model.config.use_cache = False
```
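Equivalently, the cache can be disabled when loading the model, since config overrides passed to `from_pretrained` are applied to the model config. A minimal sketch:

```python
from transformers import AutoModelForCausalLM

# use_cache=False is written into model.config at load time.
model = AutoModelForCausalLM.from_pretrained(
    "petkopetkov/mamba2-2.7b-hf",
    use_cache=False,
)
```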
Another limitation is that the model does not seem to work with bf16 mixed-precision training, at least with SFTTrainer, so it also has to be disabled:

```python
training_args = SFTConfig(
    bf16=False,
)
```
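Putting both workarounds together, a minimal fine-tuning sketch (assuming a recent trl version; the dataset, output directory, and evaluation settings are placeholders, not part of this model card):

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

model = AutoModelForCausalLM.from_pretrained("petkopetkov/mamba2-2.7b-hf")
tokenizer = AutoTokenizer.from_pretrained("petkopetkov/mamba2-2.7b-hf")

# Workaround 1: avoid the Mamba2Cache error during evaluation.
model.config.use_cache = False

# Workaround 2: disable bf16 mixed-precision training.
training_args = SFTConfig(
    output_dir="mamba2-2.7b-sft",  # placeholder
    bf16=False,
    eval_strategy="steps",
    eval_steps=100,
)

# Placeholder dataset; substitute your own.
dataset = load_dataset("trl-lib/Capybara", split="train")
dataset = dataset.train_test_split(test_size=0.05)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    processing_class=tokenizer,
)
trainer.train()
```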