Train Converted XLM-RoBERTa model without FlashAttention installed
Hi,
I successfully converted an XLM-RoBERTa model using the convert_roberta_weights_to_flash.py script. However, when I try to train it with the Hugging Face Trainer, I get the following error:
RuntimeError: FlashAttention is not installed. To proceed with training, please install FlashAttention. For inference, you have two options: either install FlashAttention or disable it by setting use_flash_attn=False when loading the model.
My GPU does not support FlashAttention, so I want to train the model without installing it. I have already tried setting use_flash_attn=False both in the config and when loading the model, but the error persists.
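For reference, this is roughly what I tried. The checkpoint path is just a placeholder for my converted model, and I am assuming the custom implementation picks up the use_flash_attn flag from the config:

```python
from transformers import AutoConfig, AutoModel

# Placeholder path to my converted XLM-RoBERTa checkpoint
model_path = "./xlmr-flash-converted"

# Attempt 1: disable the flag in the config before loading
config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
config.use_flash_attn = False
model = AutoModel.from_pretrained(model_path, config=config, trust_remote_code=True)

# Attempt 2: pass the flag directly at load time
model = AutoModel.from_pretrained(model_path, trust_remote_code=True, use_flash_attn=False)
```

With either variant, the RuntimeError above is still raised as soon as training starts.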
Interestingly, when I fine-tune the jinaai/jina-embeddings-v3 model (which, according to the documentation, uses this same converted implementation), it works perfectly and trains without FlashAttention installed.
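For comparison, this is roughly what I do for that model on the same machine (training setup omitted), and it never raises the FlashAttention error:

```python
from transformers import AutoModel

# Loads fine on this machine even though FlashAttention is not installed
model = AutoModel.from_pretrained("jinaai/jina-embeddings-v3", trust_remote_code=True)
model.train()  # fine-tuning with the Hugging Face Trainer also runs without the error
```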
Question:
How can I train a converted XLM-RoBERTa model without FlashAttention installed, the same way jinaai/jina-embeddings-v3 works? Is there a workaround or patch for this issue, or do I need to re-convert the model or modify something in the code to get pure PyTorch rotary embeddings for training?
Thank you for your help!