
Flash Attention 2 error, could you kindly provide a solution

#13
by k1-m - opened
from parler_tts import ParlerTTSForConditionalGeneration

model_name = "ai4bharat/indic-parler-tts"
model = ParlerTTSForConditionalGeneration.from_pretrained(
    model_name,
    attn_implementation="flash_attention_2"  # <-- enable Flash Attention 2
)

This gave the following error:

\default\Lib\site-packages\transformers\modeling_utils.py", line 1617, in _autoset_attn_implementation
cls._check_and_enable_flash_attn_2(
File "<..>\Lib\site-packages\transformers\modeling_utils.py", line 1736, in _check_and_enable_flash_attn_2
raise ValueError(
ValueError: T5EncoderModel does not support Flash Attention 2.0 yet.

Setup: Windows 11 with torch 2.6.0; transformers and the other dependencies are already installed, CUDA 12.6 is present, and the GPU works with transformers (verified with a different script).

Unfortunately, without Flash Attention 2 (which is expected to bring a significant speedup), even 3 lines of Telugu text take quite long on an RTX 3080 GPU.

I need Flash Attention 2 to get the maximum speedup so I can process a text file containing hundreds of lines, so resolving this Flash Attention 2 error would be very helpful.
Thank you in advance.

AI4Bharat org

T5 models do not support Flash Attention, so you need to specify the attention implementation per sub-module, enabling flash-attention only for the decoder.
E.g.:

attn_implementation={"decoder": model_args.attn_implementation, "text_encoder": "eager"}
AshwinSankar changed discussion status to closed

Then, could the misleading entry below be removed from the indic-parler-tts page (snippet pasted below), so that it helps others when choosing this model and planning to use the performance tips/guidelines:
"Tips:
We've set up an inference guide ( https://github.com/huggingface/parler-tts/blob/main/INFERENCE.md ) to make generation faster. Think SDPA, torch.compile, batching and streaming!"

Notes on above tips:

  1. SDPA as suggested does not work (per the reply above, it is unsupported)
  2. The flash-attention attention implementation is also not supported

As speed/time matters a lot, accurate documentation about performance would help many people, both when selecting a model and when estimating the time impact.
