Can we run inference without flash attention

#9
by VitoVikram - opened

Is there any way we can run inference on the model without having to install the flash attention package? I get the error below:

ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package flash_attn seems to be not installed. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2.

It also seems to run forever.


On the model card it says to set attn_implementation='eager', but this did not work out for me...

I am not able to use the model for inference at all because of this issue.
Are you able to use it?

  1. change "_attn_implementation" to "eager" in config.json
  2. remove attn_implementation='flash_attention_2' from the inference Python code

Then you don't have to use flash attention.
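For reference, a minimal loading sketch without flash attention. It passes attn_implementation="eager" explicitly (equivalent to removing the flash_attention_2 argument once the config already says eager); the model id and kwargs follow the usual transformers pattern, so adapt it to your own script:

```python
# Sketch: load Phi-4-multimodal-instruct with eager attention instead of flash attention.
from transformers import AutoModelForCausalLM, AutoProcessor

model_path = "microsoft/Phi-4-multimodal-instruct"

processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
    torch_dtype="auto",
    # "eager" instead of "flash_attention_2"; omitting the argument also works
    # once config.json has been edited as described above.
    attn_implementation="eager",
).cuda()
```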

There is an OOM issue if you use a large image as input, because "dynamic_hd": 36 in preprocessor_config will send up to 36 patches to the language model. Modify it to a smaller value if you also hit this issue.
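A rough sketch of what that edit could look like, assuming the model is already in your local Hugging Face cache. The snapshot hash is a placeholder you have to fill in yourself, and 16 is just an example value:

```python
# Sketch: lower "dynamic_hd" in the cached preprocessor_config.json to reduce
# the number of image patches sent to the language model.
import json
from pathlib import Path

config_path = (
    Path("~/.cache/huggingface/hub/models--microsoft--Phi-4-multimodal-instruct").expanduser()
    / "snapshots" / "<snapshot-hash>"  # placeholder: use your actual snapshot directory
    / "preprocessor_config.json"
)

cfg = json.loads(config_path.read_text())
cfg["dynamic_hd"] = 16  # example value; smaller means fewer patches and less memory
config_path.write_text(json.dumps(cfg, indent=2))
```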

I have tested it on my AMD RX 7900 XT in WSL2, but VQA in Chinese does not seem very good.

Hi,
I am kind of a noob, please bear with me. I am running this through a Python IDE. Where can I access this config.json?

When you download the model using the Hugging Face Python script, you need to locate the model cache (it depends on your system; on Linux it is usually at ~/.cache/huggingface/hub/), then modify the model config in your local cache.
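If you would rather find the cached files from Python instead of browsing the filesystem, something like this should print the snapshot directory (a sketch using huggingface_hub, which transformers already depends on):

```python
# Sketch: print the local snapshot directory of an already-downloaded model,
# so you can find and edit its config.json / preprocessor_config.json.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    "microsoft/Phi-4-multimodal-instruct",
    local_files_only=True,  # fails if the model has not been downloaded yet
)
print(local_dir)  # e.g. .../hub/models--microsoft--Phi-4-multimodal-instruct/snapshots/<hash>
```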

Hi, thanks for the suggestion. I was able to move towards inference, but now I get the error below.
Is it something to do with the way I am passing the images?

Code: (screenshot)
Error: (screenshot)
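For comparison, here is roughly the image-passing pattern from the model card (a sketch; the prompt markers, image path, and generation settings are placeholders to double-check against the card):

```python
# Sketch: single-image inference, roughly following the model-card example.
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_path = "microsoft/Phi-4-multimodal-instruct"
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path, trust_remote_code=True, torch_dtype="auto", attn_implementation="eager"
).cuda()

image = Image.open("example.jpg")  # placeholder path
prompt = "<|user|><|image_1|>What is shown in this image?<|end|><|assistant|>"

inputs = processor(text=prompt, images=image, return_tensors="pt").to("cuda")
output_ids = model.generate(**inputs, max_new_tokens=256)
# Strip the prompt tokens before decoding so only the answer is printed.
output_ids = output_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```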

I am getting the same issue as above, specifically with audio data.

Microsoft org

Can you check if your environment has the following packages as suggested in the model card?

flash_attn==2.7.4.post1
torch==2.6.0
transformers==4.48.2
accelerate==1.3.0
soundfile==0.13.1
pillow==11.1.0
scipy==1.15.2
torchvision==0.21.0
backoff==2.2.1
peft==0.13.2
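A small sketch for checking those versions from Python (flash_attn is left out here, since this thread is about running without it):

```python
# Sketch: compare installed package versions against the list above.
from importlib.metadata import version, PackageNotFoundError

expected = {
    "torch": "2.6.0",
    "transformers": "4.48.2",
    "accelerate": "1.3.0",
    "soundfile": "0.13.1",
    "pillow": "11.1.0",
    "scipy": "1.15.2",
    "torchvision": "0.21.0",
    "backoff": "2.2.1",
    "peft": "0.13.2",
}

for name, want in expected.items():
    try:
        have = version(name)
    except PackageNotFoundError:
        have = "not installed"
    marker = "" if have == want else "  <-- differs"
    print(f"{name}: {have} (expected {want}){marker}")
```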

Really bad hack under Windows:
change line 2137 in "modeling_phi4mm.py" to: logits = self.lm_head(hidden_states[:, -0:, :])
The file can be found under %USERPROFILE%\.cache\huggingface\modules\transformers_modules\microsoft\Phi-4-multimodal-instruct (in the subdirectory with the long name with numbers in it).
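In context, the edited line looks like the sketch below. What line 2137 originally contained depends on the snapshot, so the commented-out "before" line is an assumption on my part; only the slice changes:

```python
# logits = self.lm_head(hidden_states[:, -num_logits_to_keep:, :])  # original (assumed)
logits = self.lm_head(hidden_states[:, -0:, :])  # -0: is the same as 0:, i.e. all positions
```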

P.S.: unfortunately you will have to do this again every time a new snapshot is released (sort by "latest change date").

Thank you @junkstage, you have helped me with the solution. Thanks!

VitoVikram changed discussion status to closed