Improve model card for Vision-SR1-7B with metadata and links #1
by nielsr
This PR significantly improves the model card for the Osilly/Vision-R1-7B
model, which corresponds to the Vision-SR1-7B model introduced in the paper "Self-Rewarding Vision-Language Model via Reasoning Decomposition".
The changes include:
- Adding `library_name: transformers` to the metadata. This enables the automated "Use in Transformers" code snippet, leveraging the detected `Qwen2_5_VLForConditionalGeneration` architecture and the `transformers` dependency mentioned in the `config.json` and GitHub README.
- Adding `pipeline_tag: image-text-to-text` to the metadata, which ensures the model is discoverable under the appropriate task on the Hugging Face Hub (e.g., https://huggingface.co/models?pipeline_tag=image-text-to-text). This aligns with the model's functionality as a Vision-Language Model. A sketch of the resulting metadata and of the usage it enables is included after this list.
- Integrating the paper's abstract and key information from the GitHub README to provide a comprehensive "About Vision-SR1" section.
- Including direct links to the official paper (https://huggingface.co/papers/2508.19652) and the GitHub repository (https://github.com/zli12321/Vision-SR1).
- Embedding relevant figures from the GitHub repository to visually explain the method and dataset.
- Adding links to associated models and datasets on the Hugging Face Hub, as found in the GitHub README.
- Providing installation and training setup instructions based on the GitHub repository.
- Noting the incorrect BibTeX citation in the original GitHub README for the Vision-SR1 paper and including the correct citation for the EasyR1 codebase.
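For reference, a minimal sketch of the front-matter block the PR adds to the model card; only the two fields named above are shown, and any existing metadata in the card is kept as-is:

```yaml
---
# Metadata added by this PR (other existing fields unchanged)
library_name: transformers
pipeline_tag: image-text-to-text
---
```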
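And to illustrate what the automated "Use in Transformers" snippet enables, a minimal inference sketch assuming the repository loads with `Qwen2_5_VLForConditionalGeneration` and `AutoProcessor` as indicated by `config.json`; the image URL and prompt are placeholders, not part of the official usage example:

```python
from PIL import Image
import requests
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "Osilly/Vision-R1-7B"  # repository this PR targets

# Load model and processor (architecture detected from config.json).
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Placeholder image; replace with any local or remote image.
image = Image.open(requests.get("https://example.com/demo.jpg", stream=True).raw)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image and explain your reasoning."},
        ],
    }
]
prompt = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)

# Generate and decode only the newly produced tokens.
output_ids = model.generate(**inputs, max_new_tokens=256)
response = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(response)
```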
Please review these updates.