docs: README updated for optimized usage with transformers library
Python code for transformers usage updated to use flash-attn as the attention implementation, boosting performance and reducing memory usage.
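A minimal sketch of the kind of change described, assuming a standard transformers loading snippet; the model ID, classes, and dtype below are placeholders rather than the model card's actual code:

# Hypothetical sketch: request flash-attn via attn_implementation when loading.
# Model ID and classes are placeholders, not the model card's actual names.
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "org/model-name"  # placeholder
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # needs the flash-attn package and a supported GPU
    device_map="auto",  # needs accelerate
)
processor = AutoProcessor.from_pretrained(model_id)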
@sayed99 Great work on this, and thank you for your contribution!
To ensure a smooth out-of-the-box experience for all users, we think it’s better to make flash-attn optional instead of the default. To save you some time, I’ll go ahead and push a small commit to this PR to make that change.
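A minimal sketch of one way "optional instead of the default" could look, reusing the same placeholder model ID: probe for the flash-attn package and only pass attn_implementation when it is available, otherwise fall back to the transformers default.

# Hypothetical sketch: enable flash-attn only when it is installed, otherwise
# let transformers pick its default attention implementation.
import importlib.util

import torch
from transformers import AutoModelForCausalLM

model_id = "org/model-name"  # placeholder
load_kwargs = {"torch_dtype": torch.bfloat16, "device_map": "auto"}
if importlib.util.find_spec("flash_attn") is not None:
    load_kwargs["attn_implementation"] = "flash_attention_2"

model = AutoModelForCausalLM.from_pretrained(model_id, **load_kwargs)

This keeps the README example runnable on machines without flash-attn or a supported GPU.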
Also, I suggest removing these two steps. They don’t seem necessary for a code example and removing them would simplify it.
from google.colab import files
...
# 2- Upload image (drag & drop any PNG/JPG)
...
# 3. Resize max-2048 preserving aspect ratio
...
Thanks again for the excellent work!
@xiaohei66
Thank you for the suggestions and for helping improve the PR! I agree that making flash-attn optional and removing the extra Colab steps will simplify the example and make it more user-friendly. I appreciate your help in pushing the small commit, and I look forward to reviewing the changes.
@xiaohei66
Hello, thanks for your efforts!
Will this PR be merged into the main model card automatically soon?