
Can we have some official training / fine-tuning recipes for this model?

#11
by StephennFernandes - opened

Hi, on the latest version of transformers I tried to fine-tune mmBERT on text classification tasks using the example script:
https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification

When I tried to use mmBERT as a drop-in replacement for the original uncased BERT, even after several epochs the accuracy stays stuck at 0.3 and the F1 score is always 0.

It seems like the mmBERT models are not directly compatible with all BERT fine-tuning setups at the moment.

I would really appreciate some training / fine-tuning guidelines and examples so we can use mmBERT in all the ways we used mBERT or BERT before.
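
For reference, here is roughly the setup I'm running, reduced to a minimal sketch (the checkpoint name, dataset, and hyperparameters here are my own choices, not an official recipe):

```python
import numpy as np
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_id = "jhu-clsp/mmBERT-base"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

# Small sentiment dataset just to illustrate; any text classification set works.
dataset = load_dataset("glue", "sst2")

def tokenize(batch):
    # Pad to a fixed length so the default collator can stack batches.
    return tokenizer(batch["sentence"], truncation=True, padding="max_length", max_length=256)

dataset = dataset.map(tokenize, batched=True)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": (preds == labels).mean()}

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="mmbert-sst2",
        learning_rate=2e-5,
        num_train_epochs=3,
        per_device_train_batch_size=32,
    ),
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    compute_metrics=compute_metrics,
)
trainer.train()
```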

orionweller (Center for Language and Speech Processing @ JHU org)

The evaluations were done with (a slightly older version of) this script, and others have already fine-tuned it with the example scripts, so it does work with the right environment. Perhaps it is an issue with the attention function, since I had flash attention installed? I know some ModernBERT models had issues with the fallback attention function (SDPA) in the past, though I thought that was resolved. Try something like `pip install "flash_attn==2.6.3" --no-build-isolation` and see if it changes anything.
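
To rule out the attention path directly, you can also pin the attention implementation when loading the model and compare backends. A minimal sketch (adjust the checkpoint name to whichever mmBERT variant you are using):

```python
import torch
from transformers import AutoModelForSequenceClassification

# Explicitly select the attention backend: "flash_attention_2" requires the
# flash-attn package; "sdpa" is the PyTorch fallback that had issues before.
model = AutoModelForSequenceClassification.from_pretrained(
    "jhu-clsp/mmBERT-base",  # assumed checkpoint name
    num_labels=2,
    attn_implementation="flash_attention_2",  # or "sdpa" / "eager" to compare
    torch_dtype=torch.bfloat16,  # flash attention needs fp16/bf16
)
```

If training behaves differently with `"sdpa"` or `"eager"` than with `"flash_attention_2"`, that narrows the problem down to the attention backend.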

@orionweller Thanks for responding, it really means a lot.

I also wanted to know how I could continually pretrain the mmBERT model on more custom pretraining data. Are there any resources for this? What do you recommend as the most stable and performant way to further pretrain mmBERT?
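
For context, here is roughly what I had in mind, modeled on the run_mlm example script. This is just a sketch under my own assumptions (checkpoint name, 15% masking, hyperparameters), not something from the mmBERT authors:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "jhu-clsp/mmBERT-base"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Custom pretraining corpus: one document per line in a plain text file.
dataset = load_dataset("text", data_files={"train": "my_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

dataset = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Dynamic masking as in BERT-style MLM; 0.15 is the classic BERT rate,
# the schedule used for mmBERT's own pretraining may differ.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="mmbert-continued",
        learning_rate=1e-4,
        num_train_epochs=1,
        per_device_train_batch_size=16,
    ),
    train_dataset=dataset["train"],
    data_collator=collator,
)
trainer.train()
```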
