ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models
Paper • 2502.09696