Khalil Guetari

KhalilGuetari

AI & ML interests

None yet

Recent Activity

liked a dataset about 1 month ago

rghermi/sf20k

reacted to andito's post with 🔥 2 months ago

Many VLMs claim to process hours of video. But can they follow the story?🤔 Today, we introduce TimeScope: The benchmark that separates true temporal understanding from marketing hype. Let's see how much VLMs really understand!⏳ We test three skills that matter for real-world use: 🔎 Localized Retrieval: Find a specific action. 🧩 Information Synthesis: Piece together scattered clues. 🏃 Fine-Grained Perception: Analyze detailed motion (e.g., count how many times a person swings an axe). The results are in, and they're revealing. Only Gemini 2.5 pro handles 1-hour-long videos. Performance drops sharply with duration, proving that long video understanding is still challenging. We've found the breaking points—now the community can start fixing them.📈 Want to learn more? TimeScope is 100% open-source. Benchmark your model and help us build the next generation of video AI. 📖 Blog: https://huggingface.co/blog/timescope-video-lmm-benchmark 👩‍💻 Leaderboard & Demo: https://huggingface.co/spaces/Apollo-LMMs/TimeScope 📊 Dataset: https://huggingface.co/datasets/Apollo-LMMs/TimeScope ⚙️ Eval Code: https://github.com/EvolvingLMMs-Lab/lmms-eval

liked a model 3 months ago

google/siglip2-base-patch16-512

authored a paper about 1 year ago

Multimodal Chaptering for Long-Form TV Newscast Video

Paper • 2406.17590 • Published Mar 20, 2024 • 2

authored 2 papers over 1 year ago

Towards Retrieval Augmented Generation over Large Video Libraries

Paper • 2406.14938 • Published Jun 21, 2024 • 22

Inserting Faces inside Captions: Image Captioning with Attention Guided Merging

Paper • 2405.02305 • Published Mar 20, 2024 • 2