ColSmolVLM-500M-Instruct: Visual Retriever based on SmolVLM-500M-Instruct with ColBERT strategy

ColSmolVLM extends SmolVLM with a novel architecture and training strategy, built on Vision Language Models (VLMs), to efficiently index documents from their visual features. It generates ColBERT-style multi-vector representations of text and images. The approach was introduced in the paper ColPali: Efficient Document Retrieval with Vision Language Models and first released in this repository.
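The ColBERT strategy scores a query against a document by late interaction: each query-token embedding is matched against all document embeddings (here, page-patch embeddings), the best match per query token is kept, and the maxima are summed. A minimal, model-independent sketch of this MaxSim scoring (function and variable names are illustrative, not part of the ColSmolVLM API):

```python
import numpy as np

def maxsim_score(query_embs: np.ndarray, doc_embs: np.ndarray) -> float:
    """ColBERT-style late-interaction score.

    For each query-token vector, take the maximum cosine similarity over all
    document vectors, then sum those maxima over the query tokens.
    """
    # L2-normalize so the dot product equals cosine similarity
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    sim = q @ d.T                      # shape: (num_query_tokens, num_doc_tokens)
    return float(sim.max(axis=1).sum())

# Toy example with random vectors standing in for model outputs
rng = np.random.default_rng(0)
query = rng.normal(size=(8, 128))      # 8 query-token embeddings, dim 128
page = rng.normal(size=(64, 128))      # 64 patch embeddings for one page image
score = maxsim_score(query, page)
```

Because each per-token cosine similarity is at most 1, the score for an 8-token query is bounded above by 8; higher scores indicate pages whose patches match the query tokens more closely.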

This is the untrained base version, released to guarantee deterministic initialization of the projection layer.

License

ColSmolVLM's vision-language backbone model (SmolVLM) is under the Apache 2.0 license. The adapters attached to the model are under the MIT license.

Citation

If you use any datasets or models from this organization in your research, please cite the original work as follows:

@misc{faysse2024colpaliefficientdocumentretrieval,
  title={ColPali: Efficient Document Retrieval with Vision Language Models}, 
  author={Manuel Faysse and Hugues Sibille and Tony Wu and Bilel Omrani and Gautier Viaud and Céline Hudelot and Pierre Colombo},
  year={2024},
  eprint={2407.01449},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2407.01449}, 
}
Model repository: vidore/ColSmolVLM-Instruct-500M-base
Model size: 460M parameters (BF16, Safetensors)