Compact VLM Filter: a filtration-oriented Qwen2-VL model for image-caption evaluation

This model is a fine-tuned version of Qwen/Qwen2-VL-2B-Instruct, trained on our custom dataset to perform filtration-oriented image-text evaluation.

πŸ” Intended Use

The model is designed to:

  • Evaluate the alignment between an image and its caption
  • Provide image/caption alignment scores with a textual justification for noisy web-scale data
  • Support local deployment for cost-efficient filtration of training data (see the inference sketch below)
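
A minimal local-inference sketch using the 🤗 Transformers Qwen2-VL classes. The exact evaluation prompt and the format of the model's reply (score plus justification) are assumptions for illustration; adapt them to the prompt format used during fine-tuning.

```python
# Sketch: score an image-caption pair locally with the fine-tuned checkpoint.
# The prompt wording and expected output format below are illustrative assumptions.
import torch
from PIL import Image
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor

model_id = "Dauka-transformers/Compact_VLM_filter"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("example.jpg")  # hypothetical local image
caption = "A brown dog catching a frisbee in a park."

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {
                "type": "text",
                "text": (
                    "Rate how well this caption matches the image "
                    f"and give a brief justification.\nCaption: {caption}"
                ),
            },
        ],
    }
]

# Build the chat prompt, then pack text + image into model inputs.
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

# Generate the evaluation and decode only the newly generated tokens.
output_ids = model.generate(**inputs, max_new_tokens=128)
generated = output_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```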

πŸ‹οΈ Training Details

  • Base model: Qwen/Qwen2-VL-2B-Instruct
  • Fine-tuning objective: in-context evaluation of alignment, quality, and safety
  • Dataset: ~4.8K samples, each with a score, justification, caption, and image (an illustrative record is sketched below)
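
To make the dataset structure concrete, here is an illustrative shape of one training sample. The field names and score scale are assumptions, not the dataset's actual schema.

```python
# Hypothetical example of a single training record (field names assumed).
sample = {
    "image": "images/000123.jpg",          # path or reference to the image
    "caption": "A brown dog catching a frisbee in a park.",
    "score": 4,                            # alignment/quality score for the pair
    "justification": "The caption accurately describes the subject and action.",
}
```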

🀝 Acknowledgements

Thanks to the Qwen team for open-sourcing their vision-language models, which serve as the foundation for our filtration-oriented model.

πŸ“œ License

Licensed under the Apache License 2.0.
