Error when passing attention_mask and using flash_attention_2
The model's forward method raises `RuntimeError: cu_seqlens_q must have shape (batch_size + 1)` when flash attention is used together with an attention mask.
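A minimal reproduction sketch, assuming the `answerdotai/ModernBERT-base` checkpoint (the exact checkpoint is not named in the report) and a CUDA device with `flash-attn` installed:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Illustrative checkpoint; any ModernBERT checkpoint should reproduce the issue.
model_id = "answerdotai/ModernBERT-base"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(
    model_id,
    attn_implementation="flash_attention_2",  # requires flash-attn and a CUDA GPU
    torch_dtype=torch.float16,
).to("cuda")

# A padded batch so that attention_mask contains zeros
inputs = tokenizer(
    ["short text", "a somewhat longer piece of text"],
    padding=True,
    return_tensors="pt",
).to("cuda")

# Raises: RuntimeError: cu_seqlens_q must have shape (batch_size + 1)
outputs = model(**inputs)
```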
The problem is in the forward method of the ModernBertModel class (lines 526 to 529), where it does:
```python
if attention_mask is not None:
    mask = self.mask_converter.to_4d(attention_mask, attention_mask.shape[1], inputs_embeds.dtype)
else:
    mask = None
```
This conversion should not be done when using flash attention, since transformers' `flash_attention_forward` function expects a 2D attention mask as input. Removing that part of the code makes it work.
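A hedged sketch of the kind of guard this suggests (not the actual patch; `config._attn_implementation` is the standard transformers attribute, while `mask_converter` and `inputs_embeds` come from the snippet above):

```python
# Only build the 4D mask for attention implementations that need it,
# and hand flash attention the original 2D mask untouched.
if attention_mask is not None and self.config._attn_implementation != "flash_attention_2":
    mask = self.mask_converter.to_4d(
        attention_mask, attention_mask.shape[1], inputs_embeds.dtype
    )
else:
    # flash_attention_2 expects the 2D (batch_size, seq_len) mask, or None
    mask = attention_mask
```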
Hey @giuseppe-trimigno, I just fixed the flash attention issue. Feel free to give it a try. You might need to clear your Hugging Face cache to redownload the model.
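An alternative to clearing the cache by hand is passing `force_download=True` to `from_pretrained` so the updated files are re-fetched from the Hub (a hedged sketch; the checkpoint name is illustrative):

```python
from transformers import AutoModel

# force_download=True re-fetches the model files from the Hub, bypassing the local cache.
model = AutoModel.from_pretrained(
    "answerdotai/ModernBERT-base",  # illustrative checkpoint, not named in the thread
    attn_implementation="flash_attention_2",
    force_download=True,
)
```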