E.T. Bench Collection [NeurIPS 2024] Towards Open-Ended Event-Level Video-Language Understanding • 8 items • Updated 10 days ago
Affordance-Aware Object Insertion via Mask-Aware Dual Diffusion Paper • 2412.14462 • Published Dec 19, 2024 • 15
R2-Tuning Collection Efficient Image-to-Video Transfer Learning for Video Temporal Grounding • 3 items • Updated Nov 6, 2024
ConsNet: Learning Consistency Graph for Zero-Shot Human-Object Interaction Detection Paper • 2008.06254 • Published Aug 14, 2020
Learning to Aggregate Multi-Scale Context for Instance Segmentation in Remote Sensing Images Paper • 2111.11057 • Published Nov 22, 2021
UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection Paper • 2203.12745 • Published Mar 23, 2022
E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding Paper • 2409.18111 • Published Sep 26, 2024 • 6
E.T. Bench Collection Towards Open-Ended Event-Level Video-Language Understanding • 6 items • Updated Nov 5, 2024