Update pipeline tag, add library_name and links to paper/code
This PR updates the model card to use the correct `pipeline_tag` (`image-text-to-text`), since the model takes both image and text inputs and generates a text response. It also adds `library_name: transformers`, as the model is compatible with the Hugging Face Transformers library.
Additionally, explicit links to the paper and the GitHub repository are added for improved discoverability.
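For context on the `pipeline_tag` change: with `image-text-to-text` set, the model can be driven through the corresponding Transformers pipeline. Below is a minimal sketch, assuming a recent `transformers` release that includes the `image-text-to-text` pipeline; the Hub id `WikiChao/DRIFT` and the image URL are illustrative placeholders, not confirmed by this PR.

```python
from transformers import pipeline

# Hypothetical Hub id for illustration; substitute the model's actual repo id.
pipe = pipeline("image-text-to-text", model="WikiChao/DRIFT")

# Chat-style input: one user turn containing an image and a text prompt.
messages = [
    {
        "role": "user",
        "content": [
            # Any reachable image URL (or a local path) works here.
            {"type": "image", "url": "https://example.com/demo.jpg"},
            {"type": "text", "text": "Describe the scene, then reason about what happens next."},
        ],
    }
]

outputs = pipe(text=messages, max_new_tokens=128)
print(outputs[0]["generated_text"])
```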
README.md CHANGED

```diff
@@ -1,18 +1,21 @@
 ---
-license: mit
 base_model: qwen2.5-vl
+license: mit
+pipeline_tag: image-text-to-text
 tags:
 - vision-language-model
 - multimodal
 - reasoning
 - fine-tuned
 - qwen
-
+library_name: transformers
 ---
 
 # DRIFT
 
 This is a fine-tuned version of Qwen2.5-VL for enhanced reasoning capabilities, specifically optimized for multimodal reasoning tasks.
+The model is presented in the paper [Directional Reasoning Injection for Fine-Tuning MLLMs](https://huggingface.co/papers/2510.15050).
+The code and further details can be found on the GitHub repository: https://github.com/WikiChao/DRIFT
 
 ## Usage
 
@@ -78,4 +81,4 @@ If you use this model, please cite our paper.
 
 ## License
 
-This model is released under the MIT license.
+This model is released under the MIT license.
```