andito posted an update 2 days ago
Many VLMs claim to process hours of video. But can they follow the story?πŸ€”
Today, we introduce TimeScope: The benchmark that separates true temporal understanding from marketing hype. Let's see how much VLMs really understand!⏳

We test three skills that matter for real-world use:
πŸ”Ž Localized Retrieval: Find a specific action.
🧩 Information Synthesis: Piece together scattered clues.
πŸƒ Fine-Grained Perception: Analyze detailed motion (e.g., count how many times a person swings an axe).

The results are in, and they're revealing: only Gemini 2.5 Pro handles hour-long videos.
Performance drops sharply with duration, showing that long-video understanding is still an open challenge. We've found the breaking points—now the community can start fixing them.πŸ“ˆ

Want to learn more? TimeScope is 100% open-source. Benchmark your model and help us build the next generation of video AI.

πŸ“– Blog:
https://huggingface.co/blog/timescope-video-lmm-benchmark
πŸ‘©β€πŸ’» Leaderboard & Demo: Apollo-LMMs/TimeScope
πŸ“Š Dataset: Apollo-LMMs/TimeScope
βš™οΈ Eval Code: https://github.com/EvolvingLMMs-Lab/lmms-eval