Compact VLM Filter: Image-caption filtration-oriented Qwen2VL model
This model is a fine-tuned version of Qwen/Qwen2-VL-2B-Instruct trained to perform filtration-oriented image-text evaluation, based on our custom dataset.
π Intended Use
The model is designed to:
- Evaluate alignment of image and caption
- Provide image/caption alignment scores and textual justification for noisy web-scale data
- Supports local deployment for cost-efficient training data filtration
ποΈ Training Details
- Base model: Qwen/Qwen2-VL-2B-Instruct
- Fine-tuning objective: in-context evaluation of aligment, quality and safety
- Dataset: ~4.8K samples with score, justification, caption, and image
π€ Acknowledgements
Thanks to the Qwen team for open-sourcing their VLM models, which serve as the foundation for our filtration-oriented model.
π License
Licensed under the Apache License 2.0.
- Downloads last month
- 4
	Inference Providers
	NEW
	
	
	This model isn't deployed by any Inference Provider.
	π
			
		Ask for provider support