MeissonFlow Research

non-profit

Enxin posted an update about 2 months ago
🎉 Introducing Video-MMLU, a new benchmark for evaluating large multimodal models on classroom-style lectures in math, physics, and chemistry!

πŸ§‘β€πŸ«πŸ“šVideo-MMLU requires strong reasoning capabilities and world knowledge compared to the previous benchmarks for video LMMs.

Each video comes with two tasks:
πŸ“ Take Notes β€” detailed captioning of multi-discipline lectures
🧠 Do Quiz β€” open-ended QA to test reasoning over visuals & proofs

We evaluated 90+ models, including vision-blind baselines, open-source models and proprietary ones.
📉 We find that existing models generally perform poorly, with accuracy ranging from only 10% to 50%.
📉 We also explore how the number of visual tokens and the base LLMs influence performance, offering insights into the interplay between multimodal perception and reasoning in lecture comprehension.

For more details, see:
📄 Paper: https://arxiv.org/abs/2504.14693
💻 Code: https://github.com/Espere-1119-Song/Video-MMLU
🧠 Data: Enxin/Video-MMLU
🌐 Website: https://enxinsong.com/Video-MMLU-web/