How to bypass flash attention 2 requirement for Apple Silicon?

#63, opened by MC-QQ

I got an M4 Mac mini and tried to run this model.

I got the following error:

Library/Python/3.9/lib/python/site-packages/transformers/modeling_utils.py", line 1659, in _check_and_enable_flash_attn_2
raise ImportError(f"{preface} the package flash_attn seems to be not installed. {install_message}")

ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package flash_attn seems to be not installed. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2.

I tried the approach in https://huggingface.co/qnguyen3/nanoLLaVA-1.5/discussions/4, but it didn't work.
Any suggestions? Thanks a ton!
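
For reference, the usual way to ask transformers for a non-flash attention backend is to pass `attn_implementation` at load time. A minimal sketch is below; the model ID is a placeholder, and whether this repo's custom remote code actually honors the override is an assumption:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo ID -- substitute the actual model this thread is about.
model_id = "org/model-name"

# "eager" is the plain PyTorch attention path; "sdpa" uses
# torch.nn.functional.scaled_dot_product_attention, which runs on MPS.
# Neither requires the CUDA-only flash_attn package.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    attn_implementation="eager",  # or "sdpa"
    torch_dtype=torch.float16,
    trust_remote_code=True,
).to("mps")

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
```

If the repo's custom modeling code hardcodes flash_attention_2 instead of reading this argument, the override alone may not help, which could be why the linked workaround failed here.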

Hopefully it's available in mlx-vlm at some point...
