COMPACT: COMPositional Atomic-to-Complex Visual Capability Tuning Paper • 2504.21850 • Published 3 days ago • 24
Tokenize Image Patches: Global Context Fusion for Effective Haze Removal in Large Images Paper • 2504.09621 • Published 21 days ago • 11
Attention IoU: Examining Biases in CelebA using Attention Maps Paper • 2503.19846 • Published Mar 25 • 7
Unifying Specialized Visual Encoders for Video Language Models Paper • 2501.01426 • Published Jan 2 • 21
xT: Nested Tokenization for Larger Context in Large Images Paper • 2403.01915 • Published Mar 4, 2024
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark Paper • 2410.03051 • Published Oct 4, 2024 • 6