Error when passing attention_mask and using flash_attention_2

#11
by giuseppe-trimigno - opened

The model's forward method raises `RuntimeError: cu_seqlens_q must have shape (batch_size + 1)` when flash attention is enabled and an attention mask is passed as input.
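For context, a minimal sketch of how the error can be triggered, assuming a CUDA GPU with flash-attn installed; the checkpoint name and dtype are illustrative, not from the original report:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Checkpoint name is an assumption for illustration.
model_id = "EuroBERT/EuroBERT-210m"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Padding a batch of unequal-length texts produces an attention_mask with zeros;
# passing it through the model raises:
# RuntimeError: cu_seqlens_q must have shape (batch_size + 1)
batch = tokenizer(
    ["short text", "a somewhat longer piece of text"],
    padding=True,
    return_tensors="pt",
).to("cuda")
outputs = model(**batch)
```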
The problem is in the forward method of the ModernBertModel class (lines 526 to 529), where it does:

```python
if attention_mask is not None:
    mask = self.mask_converter.to_4d(attention_mask, attention_mask.shape[1], inputs_embeds.dtype)
else:
    mask = None
```

This conversion should not happen when using flash attention, since transformers' flash_attention_forward function expects a 2d attention mask as input. Removing that part of the code makes it work.
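A minimal sketch of what a guarded version of that mask handling could look like, reusing the names from the snippet above and assuming the module can read the configured backend via `self.config._attn_implementation` (as in other transformers models); this is an illustration, not the fix that was actually committed:

```python
if attention_mask is None:
    mask = None
elif self.config._attn_implementation == "flash_attention_2":
    # Flash attention consumes the raw 2d padding mask (it uses it to unpad
    # the batch), so pass it through untouched.
    mask = attention_mask
else:
    # Eager/SDPA paths expect the 4d additive mask.
    mask = self.mask_converter.to_4d(
        attention_mask, attention_mask.shape[1], inputs_embeds.dtype
    )
```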

EuroBERT org

Hi @giuseppe-trimigno,
We're currently working on a fix for this issue, sorry about that.
Cheers

EuroBERT org

Hey @giuseppe-trimigno, I just fixed the flash attention path. Feel free to give it a try. You might need to clear your Hugging Face cache so the updated model files are redownloaded.
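One way to pick up the updated files without clearing the whole cache by hand is to force a redownload; a sketch, with the checkpoint name assumed for illustration:

```python
from transformers import AutoModel

# force_download=True bypasses the locally cached copy and re-fetches the
# updated remote code and weights from the Hub.
model = AutoModel.from_pretrained(
    "EuroBERT/EuroBERT-210m",  # checkpoint name assumed for illustration
    trust_remote_code=True,
    force_download=True,
)
```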

Nicolas-BZRD changed discussion status to closed
