Update pipeline tag, add library_name and links to paper/code
This PR updates the model card to use the correct `pipeline_tag` (`image-text-to-text`), since the model takes both image and text inputs and generates a text response. It also adds `library_name: transformers`, as the model is compatible with the Hugging Face Transformers library.
Additionally, explicit links to the paper and the GitHub repository are added for improved discoverability.
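For context on the `pipeline_tag` change: with `image-text-to-text` set, the model can be driven through the corresponding Transformers pipeline. Below is a minimal sketch, assuming a recent `transformers` release that includes the `image-text-to-text` pipeline; the Hub id `WikiChao/DRIFT` and the image URL are illustrative placeholders, not confirmed by this PR.

```python
from transformers import pipeline

# Hypothetical Hub id for illustration; substitute the model's actual repo id.
pipe = pipeline("image-text-to-text", model="WikiChao/DRIFT")

# Chat-style input: one user turn containing an image and a text prompt.
messages = [
    {
        "role": "user",
        "content": [
            # Any reachable image URL (or a local path) works here.
            {"type": "image", "url": "https://example.com/demo.jpg"},
            {"type": "text", "text": "Describe the scene, then reason about what happens next."},
        ],
    }
]

outputs = pipe(text=messages, max_new_tokens=128)
print(outputs[0]["generated_text"])
```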
README.md CHANGED

```diff
@@ -1,18 +1,21 @@
 ---
-license: mit
 base_model: qwen2.5-vl
+license: mit
+pipeline_tag: image-text-to-text
 tags:
 - vision-language-model
 - multimodal
 - reasoning
 - fine-tuned
 - qwen
-
+library_name: transformers
 ---
 
 # DRIFT
 
 This is a fine-tuned version of Qwen2.5-VL for enhanced reasoning capabilities, specifically optimized for multimodal reasoning tasks.
+The model is presented in the paper [Directional Reasoning Injection for Fine-Tuning MLLMs](https://huggingface.co/papers/2510.15050).
+The code and further details can be found on the GitHub repository: https://github.com/WikiChao/DRIFT
 
 ## Usage
 
@@ -78,4 +81,4 @@ If you use this model, please cite our paper.
 
 ## License
 
-This model is released under the MIT license.
+This model is released under the MIT license.
```