Can we run inference without flash attention
Is there any way we can run inference on the model without having to install the flash attention package? I get the error below:
ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package flash_attn seems to be not installed. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2.
Because it seems to run forever
On the model card it says to set attn_implementation='eager', but that did not work for me...
I am not able to use the model for inference at all because of this issue.
Are you able to use it?
- Change "_attn_implementation": "eager" in config.json.
- Remove attn_implementation='flash_attention_2' from the inference Python code.

Then you don't have to use flash attention; see the loading sketch below.
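For reference, a minimal sketch of loading the model with the standard attention implementation instead of flash_attention_2 (the dtype and device_map settings are just common defaults, adjust for your hardware):

```python
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-4-multimodal-instruct"

# Processor handles text, image, and audio inputs for this model
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# attn_implementation="eager" avoids the flash_attn dependency entirely
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,   # or torch.float16, depending on your GPU
    attn_implementation="eager",
    device_map="cuda",
)
```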
There is an OOM issue if you use a large image as input, because "dynamic_hd": 36 in preprocessor_config will send up to 36 image patches to the language model. Modify it to a smaller value if you also hit this issue (see the sketch below).
I have tested it on my AMD RX 7900 XT under WSL2, but VQA in Chinese does not seem to work well.
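A rough sketch of lowering dynamic_hd in the cached preprocessor_config.json; the cache path and the value 12 are only examples, adjust them for your system:

```python
import glob
import json
import os

# Standard HF hub cache layout; adjust if you use a custom HF_HOME
cache_dir = os.path.expanduser(
    "~/.cache/huggingface/hub/models--microsoft--Phi-4-multimodal-instruct"
)

for path in glob.glob(os.path.join(cache_dir, "snapshots", "*", "preprocessor_config.json")):
    with open(path) as f:
        config = json.load(f)
    config["dynamic_hd"] = 12  # example value; fewer patches, less memory
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
    print(f"Updated {path}")
```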
Hi,
I am kind of a noob, please bear with me. I am running this through a Python IDE. Where can I access this config.json?
When you download the model using the HF Python script, you need to locate the model cache (it depends on your system; on Linux it is usually at ~/.cache/huggingface/hub/), then modify the model config in your local cache.
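If you are unsure where that is, a small sketch like this (using huggingface_hub, which transformers already depends on) prints the local snapshot directory that holds config.json and preprocessor_config.json:

```python
from huggingface_hub import snapshot_download

# Returns the local snapshot directory; files already in the cache are reused,
# so this does not re-download the model
local_dir = snapshot_download("microsoft/Phi-4-multimodal-instruct")
print(local_dir)  # config.json and preprocessor_config.json live here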
I am getting the same issue as above, specifically with audio data.
Can you check if your environment has the following packages, as suggested on the model card? (A quick version-check sketch follows the list.)
flash_attn==2.7.4.post1
torch==2.6.0
transformers==4.48.2
accelerate==1.3.0
soundfile==0.13.1
pillow==11.1.0
scipy==1.15.2
torchvision==0.21.0
backoff==2.2.1
peft==0.13.2
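A minimal sketch of that check using importlib.metadata; note that older Python versions can be picky about the exact pip distribution name (e.g. flash-attn vs flash_attn):

```python
from importlib.metadata import PackageNotFoundError, version

packages = [
    "flash_attn", "torch", "transformers", "accelerate", "soundfile",
    "pillow", "scipy", "torchvision", "backoff", "peft",
]

# Print the installed version of each package, or note that it is missing
for name in packages:
    try:
        print(f"{name}=={version(name)}")
    except PackageNotFoundError:
        print(f"{name} is not installed")
```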
Really bad hack under Windows:
change line 2137 in "modeling_phi4mm.py" to: logits = self.lm_head(hidden_states[:, -0:, :])
The file can be found under: %USERPROFILE%\.cache\huggingface\modules\transformers_modules\microsoft\Phi-4-multimodal-instruct (in the subdirectory with the long name and numbers in it).
P.S.: unfortunately you will have to do this again every time a new snapshot is released (sort by "latest change date").
@junkstage I wonder if this Dockerfile could help: https://huggingface.co/microsoft/Phi-4-multimodal-instruct/discussions/14#67c4f3d7f1dee1be83b06bae