Problem reproducing CIRCO results
We were able to reproduce the CIRR results, but not the CIRCO results. We used all 123,403 images as retrieval candidates, yet our numbers differ substantially from those reported in the paper.
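In case it helps pinpoint the discrepancy, below is a minimal sketch of how we compute the metrics. It assumes precomputed, L2-normalized query and candidate embeddings; the function and variable names are our own, and the exact Recall@K / mAP@K definitions (especially for multiple ground truths per query) may differ from the official CIRCO evaluation script.

```python
# Minimal sketch of a CIRCO-style evaluation, assuming query/candidate features
# are already L2-normalized numpy arrays and each query has a list of
# ground-truth candidate indices (multiple targets per query).
# All names here are illustrative, not taken from the official repo.
import numpy as np

def evaluate_circo_style(query_feats, cand_feats, gt_indices_per_query, ks=(5, 10, 25, 50)):
    """Compute Recall@K and mAP@K over a multi-ground-truth retrieval task."""
    sims = query_feats @ cand_feats.T        # cosine similarity (features pre-normalized)
    rankings = np.argsort(-sims, axis=1)     # candidate indices sorted by descending similarity

    recalls = {k: [] for k in ks}
    maps = {k: [] for k in ks}
    for ranked, gts in zip(rankings, gt_indices_per_query):
        gts = set(gts)
        hits = np.isin(ranked, list(gts)).astype(np.float64)  # 1 where a retrieved item is a ground truth
        for k in ks:
            top = hits[:k]
            # Recall@K: fraction of ground truths retrieved in the top K
            # (one common multi-GT definition; the official script may differ).
            recalls[k].append(top.sum() / min(len(gts), k))
            # AP@K: precision at each relevant rank, averaged over min(|GT|, K).
            precisions = np.cumsum(top) / np.arange(1, k + 1)
            maps[k].append((precisions * top).sum() / min(len(gts), k))

    return ({k: 100 * np.mean(v) for k, v in recalls.items()},
            {k: 100 * np.mean(v) for k, v in maps.items()})
```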
For BGE-VL-Base, we get:

| Metric | @5 | @10 | @25 | @50 |
|---|---|---|---|---|
| Recall | 35.62 | 45.88 | 58.13 | 67.12 |
| mAP | 20.05 | 20.93 | 22.61 | 23.48 |

mAP@10 by semantic aspect:

| Semantic aspect | mAP@10 |
|---|---|
| Cardinality | 21.17 |
| Addition | 20.48 |
| Negation | 18.28 |
| Direct Addressing | 22.64 |
| Compare & Change | 19.05 |
| Comparative Statement | 21.97 |
| Statement with Conjunction | 21.11 |
| Spatial Relations & Background | 22.47 |
| Viewpoint | 16.58 |
For BGE-VL-Large, we get:

| Metric | @5 | @10 | @25 | @50 |
|---|---|---|---|---|
| Recall | 40.75 | 50.62 | 63.62 | 74.12 |
| mAP | 24.18 | 25.08 | 27.28 | 28.26 |

mAP@10 by semantic aspect:

| Semantic aspect | mAP@10 |
|---|---|
| Cardinality | 24.91 |
| Addition | 26.95 |
| Negation | 24.28 |
| Direct Addressing | 25.88 |
| Compare & Change | 22.32 |
| Comparative Statement | 24.52 |
| Statement with Conjunction | 24.90 |
| Spatial Relations & Background | 27.26 |
| Viewpoint | 20.61 |
We are not sure whether this is due to an incorrect evaluation procedure, so could you please share the corresponding evaluation scripts? Thank you!
Please refer to this issue: https://github.com/VectorSpaceLab/MegaPairs/issues/15#issuecomment-2726556605, and use the evaluation scripts provided in our GitHub repo.