Problem reproducing CIRCO results
We were able to reproduce the CIRR results, but not the CIRCO results. We used all 123,403 images as retrieval candidates, yet our numbers differ substantially from those reported in the paper.
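In case it helps pinpoint the discrepancy, below is a minimal sketch of how we compute the metrics. It assumes precomputed, L2-normalized query and candidate embeddings; the function and variable names are our own, and the exact Recall@K / mAP@K definitions (especially for multiple ground truths per query) may differ from the official CIRCO evaluation script.

```python
# Minimal sketch of a CIRCO-style evaluation, assuming query/candidate features
# are already L2-normalized numpy arrays and each query has a list of
# ground-truth candidate indices (multiple targets per query).
# All names here are illustrative, not taken from the official repo.
import numpy as np

def evaluate_circo_style(query_feats, cand_feats, gt_indices_per_query, ks=(5, 10, 25, 50)):
    """Compute Recall@K and mAP@K over a multi-ground-truth retrieval task."""
    sims = query_feats @ cand_feats.T        # cosine similarity (features pre-normalized)
    rankings = np.argsort(-sims, axis=1)     # candidate indices sorted by descending similarity

    recalls = {k: [] for k in ks}
    maps = {k: [] for k in ks}
    for ranked, gts in zip(rankings, gt_indices_per_query):
        gts = set(gts)
        hits = np.isin(ranked, list(gts)).astype(np.float64)  # 1 where a retrieved item is a ground truth
        for k in ks:
            top = hits[:k]
            # Recall@K: fraction of ground truths retrieved in the top K
            # (one common multi-GT definition; the official script may differ).
            recalls[k].append(top.sum() / min(len(gts), k))
            # AP@K: precision at each relevant rank, averaged over min(|GT|, K).
            precisions = np.cumsum(top) / np.arange(1, k + 1)
            maps[k].append((precisions * top).sum() / min(len(gts), k))

    return ({k: 100 * np.mean(v) for k, v in recalls.items()},
            {k: 100 * np.mean(v) for k, v in maps.items()})
```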
For BGE-VL-Base, we get:

| Metric | @5 | @10 | @25 | @50 |
|---|---|---|---|---|
| Recall | 35.62 | 45.88 | 58.13 | 67.12 |
| mAP | 20.05 | 20.93 | 22.61 | 23.48 |

mAP@10 by semantic aspect:

| Semantic aspect | mAP@10 |
|---|---|
| Cardinality | 21.17 |
| Addition | 20.48 |
| Negation | 18.28 |
| Direct Addressing | 22.64 |
| Compare & Change | 19.05 |
| Comparative Statement | 21.97 |
| Statement with Conjunction | 21.11 |
| Spatial Relations & Background | 22.47 |
| Viewpoint | 16.58 |
For BGE-VL-Large, we get:

| Metric | @5 | @10 | @25 | @50 |
|---|---|---|---|---|
| Recall | 40.75 | 50.62 | 63.62 | 74.12 |
| mAP | 24.18 | 25.08 | 27.28 | 28.26 |

mAP@10 by semantic aspect:

| Semantic aspect | mAP@10 |
|---|---|
| Cardinality | 24.91 |
| Addition | 26.95 |
| Negation | 24.28 |
| Direct Addressing | 25.88 |
| Compare & Change | 22.32 |
| Comparative Statement | 24.52 |
| Statement with Conjunction | 24.90 |
| Spatial Relations & Background | 27.26 |
| Viewpoint | 20.61 |
We are not sure whether this is due to an incorrect evaluation procedure, so could you please share the corresponding evaluation scripts? Thank you!
Please refer to this issue: https://github.com/VectorSpaceLab/MegaPairs/issues/15#issuecomment-2726556605, and use the evaluation scripts provided in our GitHub repo.