Emerging Properties in Unified Multimodal Pretraining Paper • 2505.14683 • Published May 20 • 130
Unmasked Teacher: Towards Training-Efficient Video Foundation Models Paper • 2303.16058 • Published Mar 28, 2023
Harvest Video Foundation Models via Efficient Post-Pretraining Paper • 2310.19554 • Published Oct 30, 2023
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark Paper • 2311.17005 • Published Nov 28, 2023 • 2
Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks Paper • 2401.14159 • Published Jan 25, 2024 • 3
UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning Paper • 2201.04676 • Published Jan 12, 2022
UniFormer: Unifying Convolution and Self-attention for Visual Recognition Paper • 2201.09450 • Published Jan 24, 2022
You Only Need 90K Parameters to Adapt Light: A Light Weight Transformer for Image Enhancement and Exposure Correction Paper • 2205.14871 • Published May 30, 2022
UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer Paper • 2211.09552 • Published Nov 17, 2022
InternVideo: General Video Foundation Models via Generative and Discriminative Learning Paper • 2212.03191 • Published Dec 6, 2022
MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration Paper • 2408.10605 • Published Aug 20, 2024 • 1
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning Paper • 2410.19702 • Published Oct 25, 2024
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling Paper • 2501.00574 • Published Dec 31, 2024 • 6
Make Your Training Flexible: Towards Deployment-Efficient Video Models Paper • 2503.14237 • Published Mar 18 • 5
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment Paper • 2412.19326 • Published Dec 26, 2024 • 18
Post 21911: Google drops Gemini 2.0 Flash Thinking, a new experimental model that unlocks stronger reasoning capabilities and shows its thoughts. The model plans (with thoughts visible), can solve complex problems with Flash speeds, and more. Now available in anychat, try it out: https://huggingface.co/spaces/akhaliq/anychat
Causal Diffusion Transformers for Generative Modeling Paper • 2412.12095 • Published Dec 16, 2024 • 23