FlashAttention-2 support?

#14
by t-albertge - opened

Hi there,

would it be possible to add FlashAttention-2 support to the model? I think the modeling code already uses torch's SDPA kernel in the forward call of LLaDABlock, but would it be possible to use FlashAttention-2 as well? Thanks!
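For context, here is a minimal sketch of the distinction being asked about (assumed tensor shapes, not the actual LLaDA code): PyTorch's `F.scaled_dot_product_attention` can already dispatch to a fused FlashAttention kernel on supported GPUs, while FlashAttention-2 support in `transformers` usually means loading with `attn_implementation="flash_attention_2"`, which routes attention through the separate `flash-attn` package instead.

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes for illustration: (batch, heads, seq_len, head_dim).
q = torch.randn(1, 8, 128, 64)
k = torch.randn(1, 8, 128, 64)
v = torch.randn(1, 8, 128, 64)

# This is the SDPA call the modeling code reportedly already uses; on a
# supported CUDA GPU PyTorch may pick a flash-style fused kernel for it
# automatically, whereas on CPU it falls back to the math backend.
out = F.scaled_dot_product_attention(q, k, v, is_causal=False)
print(tuple(out.shape))  # → (1, 8, 128, 64)
```

Explicit FlashAttention-2 support would typically look like `AutoModel.from_pretrained(..., attn_implementation="flash_attention_2")` in `transformers`, which requires the model's attention classes to opt in.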
