ColQwen2.5-Omni: Visual+Audio Retriever based on Qwen2.5-Omni-3B-Instruct with ColBERT strategy

ColQwen is a model built on a novel architecture and training strategy that uses Vision/Audio Language Models (VLMs) to efficiently index documents from their visual and audio features. It extends Qwen2.5-Omni-3B to generate ColBERT-style multi-vector representations of text, images, and audio. It was introduced in the paper ColPali: Efficient Document Retrieval with Vision Language Models and first released in this repository.
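To make "ColBERT-style multi-vector representations" concrete, the sketch below shows the late-interaction (MaxSim) scoring usually associated with ColBERT: each query vector is matched to its most similar document vector, and the per-vector maxima are summed. This is a minimal pure-Python illustration under assumptions, not this model's actual scoring code; the function names and the toy 2-dimensional embeddings are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction score: for every query vector, keep the best
    dot product against any document vector, then sum those maxima."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy embeddings (hypothetical): one vector per query token,
# one vector per document patch or audio frame.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
doc_b = [[0.5, 0.5], [0.5, 0.5]]

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)

# Rank documents by descending score.
ranking = sorted([("doc_a", score_a), ("doc_b", score_b)],
                 key=lambda pair: pair[1], reverse=True)
```

Here `doc_a` ranks first because each query vector finds an exact match among its vectors, whereas `doc_b` only offers partial matches.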

This version is the untrained base version, published to guarantee deterministic initialization of the projection layer.
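The reasoning behind shipping an untrained base can be illustrated with a toy example: a freshly initialized projection layer differs from run to run, while a stored copy of one initialization gives every training run the same starting point. The sketch below is only an illustration of that idea in pure Python; `init_projection`, the dimensions, and the standard deviation are hypothetical and do not describe how this checkpoint was produced.

```python
import random


def init_projection(in_dim, out_dim, seed=None):
    """Randomly initialize an out_dim x in_dim projection matrix.
    With seed=None the generator is seeded from OS entropy, so
    repeated calls produce different weights."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(in_dim)]
            for _ in range(out_dim)]


# Two fresh initializations: the weights will not match.
fresh_a = init_projection(4, 2)
fresh_b = init_projection(4, 2)

# A stored base: initialized once, then reused by every training run.
base = init_projection(4, 2, seed=0)
run_1_start = [row[:] for row in base]  # copy, as if loaded from disk
run_2_start = [row[:] for row in base]

same_start = run_1_start == run_2_start
```

Starting every fine-tuning run from the same stored weights makes results comparable across runs, since differences no longer come from the random projection layer.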

Usage

This version should not be used directly: it is solely the base version, useful for deterministic LoRA initialization.

Contact

Citation

If you use any datasets or models from this organization in your research, please cite the original work as follows:

@misc{faysse2024colpaliefficientdocumentretrieval,
  title={ColPali: Efficient Document Retrieval with Vision Language Models}, 
  author={Manuel Faysse and Hugues Sibille and Tony Wu and Bilel Omrani and Gautier Viaud and Céline Hudelot and Pierre Colombo},
  year={2024},
  eprint={2407.01449},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2407.01449}, 
}

Model: vidore/colqwen2.5omni-base. Model size: 4.39B params (Safetensors). Tensor type: BF16.