How to bypass flash attention 2 requirement for Apple Silicon?
#63 · opened by MC-QQ
I got an M4 Mac mini and am trying to run this model, but I get the following error:
Library/Python/3.9/lib/python/site-packages/transformers/modeling_utils.py", line 1659, in _check_and_enable_flash_attn_2
raise ImportError(f"{preface} the package flash_attn seems to be not installed. {install_message}")
ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package flash_attn seems to be not installed. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2.
I tried the approach in https://huggingface.co/qnguyen3/nanoLLaVA-1.5/discussions/4, but it didn't work.
Any suggestions? Thanks a ton!
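
Would overriding the attention backend at load time be a valid workaround? A rough sketch of what I have in mind (the model id below is a placeholder for this repo, and I'm aware this may not be enough if the repo's custom code imports flash_attn directly):

```python
# Sketch: force a non-FlashAttention backend when loading.
# "org/model-name" is a placeholder; replace with this repo's id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "org/model-name"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    attn_implementation="eager",  # or "sdpa"; overrides the flash_attention_2 default
    trust_remote_code=True,
)
```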
Hopefully it's available in mlx-vlm at some point...