ReT-2 Collection
Models and data for the paper "Recurrence Meets Transformers for Universal Multimodal Retrieval" (arXiv:2509.08897).
Official implementation of ReT-2: Recurrence Meets Transformers for Universal Multimodal Retrieval.
This model features visual and textual backbones based on laion/CLIP-ViT-H-14-laion2B-s32B-b79K.
The backbones have been fine-tuned on the M2KR dataset.
Citation
@article{caffagni2025recurrencemeetstransformers,
  title={{Recurrence Meets Transformers for Universal Multimodal Retrieval}},
  author={Davide Caffagni and Sara Sarto and Marcella Cornia and Lorenzo Baraldi and Rita Cucchiara},
  journal={arXiv preprint arXiv:2509.08897},
  year={2025}
}
Base model: laion/CLIP-ViT-H-14-laion2B-s32B-b79K