Can we run inference without flash attention

#9
by VitoVikram - opened

Is there any way we can run inference on the model without having to install the flash attention package? I get the error below:

ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package flash_attn seems to be not installed. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2.

It also seems to run forever.


On the model card it says to set attn_implementation='eager', but this did not work out for me...

I am not able to use the model for inference at all because of this issue.
Are you able to use it?

  1. change "_attn_implementation" to "eager" in config.json
  2. remove attn_implementation='flash_attention_2' from the inference Python code

Then you don't have to use flash attention.
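For reference, a minimal loading sketch without flash attention. It passes attn_implementation="eager" explicitly (equivalent to removing the flash_attention_2 argument once the config already says eager); the model id and kwargs follow the usual transformers pattern, so adapt it to your own script:

```python
# Sketch: load Phi-4-multimodal-instruct with eager attention instead of flash attention.
from transformers import AutoModelForCausalLM, AutoProcessor

model_path = "microsoft/Phi-4-multimodal-instruct"

processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
    torch_dtype="auto",
    # "eager" instead of "flash_attention_2"; omitting the argument also works
    # once config.json has been edited as described above.
    attn_implementation="eager",
).cuda()
```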

There is an OOM issue if you use a large image as input, because "dynamic_hd": 36 in preprocessor_config will send up to 36 patches to the language model. Modify it to a smaller value if you also hit this issue.
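A rough sketch of what that edit could look like, assuming the model is already in your local Hugging Face cache. The snapshot hash is a placeholder you have to fill in yourself, and 16 is just an example value:

```python
# Sketch: lower "dynamic_hd" in the cached preprocessor_config.json to reduce
# the number of image patches sent to the language model.
import json
from pathlib import Path

config_path = (
    Path("~/.cache/huggingface/hub/models--microsoft--Phi-4-multimodal-instruct").expanduser()
    / "snapshots" / "<snapshot-hash>"  # placeholder: use your actual snapshot directory
    / "preprocessor_config.json"
)

cfg = json.loads(config_path.read_text())
cfg["dynamic_hd"] = 16  # example value; smaller means fewer patches and less memory
config_path.write_text(json.dumps(cfg, indent=2))
```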

I have tested it on my AMD RX 7900 XT in WSL2, but VQA in Chinese does not seem very good.

Hi,
I am kind of a noob, please bear with me. I am running this through a Python IDE. Where can I access this config.json?

When you download the model using the Hugging Face Python script, you need to locate the model cache (it depends on your system; on Linux it is usually at ~/.cache/huggingface/hub/), then modify the model config in your local cache.
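If you would rather find the cached files from Python instead of browsing the filesystem, something like this should print the snapshot directory (a sketch using huggingface_hub, which transformers already depends on):

```python
# Sketch: print the local snapshot directory of an already-downloaded model,
# so you can find and edit its config.json / preprocessor_config.json.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    "microsoft/Phi-4-multimodal-instruct",
    local_files_only=True,  # fails if the model has not been downloaded yet
)
print(local_dir)  # e.g. .../hub/models--microsoft--Phi-4-multimodal-instruct/snapshots/<hash>
```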

Hi, thanks for the suggestion. I was able to move towards inference, but now I get the error below.
Is it something to do with the way I am passing the images?

Code: (screenshot)
Error: (screenshot)
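For comparison, here is roughly the image-passing pattern from the model card (a sketch; the prompt markers, image path, and generation settings are placeholders to double-check against the card):

```python
# Sketch: single-image inference, roughly following the model-card example.
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_path = "microsoft/Phi-4-multimodal-instruct"
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path, trust_remote_code=True, torch_dtype="auto", attn_implementation="eager"
).cuda()

image = Image.open("example.jpg")  # placeholder path
prompt = "<|user|><|image_1|>What is shown in this image?<|end|><|assistant|>"

inputs = processor(text=prompt, images=image, return_tensors="pt").to("cuda")
output_ids = model.generate(**inputs, max_new_tokens=256)
# Strip the prompt tokens before decoding so only the answer is printed.
output_ids = output_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```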

I am getting the same issue as above, specifically with audio data.

Microsoft org

Can you check if your environment has the following packages as suggested in the model card?

flash_attn==2.7.4.post1
torch==2.6.0
transformers==4.48.2
accelerate==1.3.0
soundfile==0.13.1
pillow==11.1.0
scipy==1.15.2
torchvision==0.21.0
backoff==2.2.1
peft==0.13.2
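A small sketch for checking those versions from Python (flash_attn is left out here, since this thread is about running without it):

```python
# Sketch: compare installed package versions against the list above.
from importlib.metadata import version, PackageNotFoundError

expected = {
    "torch": "2.6.0",
    "transformers": "4.48.2",
    "accelerate": "1.3.0",
    "soundfile": "0.13.1",
    "pillow": "11.1.0",
    "scipy": "1.15.2",
    "torchvision": "0.21.0",
    "backoff": "2.2.1",
    "peft": "0.13.2",
}

for name, want in expected.items():
    try:
        have = version(name)
    except PackageNotFoundError:
        have = "not installed"
    marker = "" if have == want else "  <-- differs"
    print(f"{name}: {have} (expected {want}){marker}")
```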

Really bad hack under Windows:
change line 2137 in "modeling_phi4mm.py" to: logits = self.lm_head(hidden_states[:, -0:, :])
The file can be found under %USERPROFILE%\.cache\huggingface\modules\transformers_modules\microsoft\Phi-4-multimodal-instruct (in the subdirectory with the long name with numbers in it).
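In context, the edited line looks like the sketch below. What line 2137 originally contained depends on the snapshot, so the commented-out "before" line is an assumption on my part; only the slice changes:

```python
# logits = self.lm_head(hidden_states[:, -num_logits_to_keep:, :])  # original (assumed)
logits = self.lm_head(hidden_states[:, -0:, :])  # -0: is the same as 0:, i.e. all positions
```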

P.S.: unfortunately you will have to do this again every time a new snapshot is released (sort by "latest change date").

Thank you @junkstage, you have helped me with the solution. Thanks!

VitoVikram changed discussion status to closed