Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation Paper • 2502.14846 • Published Feb 20, 2025 • 13
Articulate-Anything: Automatic Modeling of Articulated Objects via a Vision-Language Foundation Model Paper • 2410.13882 • Published Oct 3, 2024
MiRAGeNews: Multimodal Realistic AI-Generated News Detection Paper • 2410.09045 • Published Oct 11, 2024 • 4
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published Sep 25, 2024 • 121
Interpretable-by-Design Text Understanding with Iteratively Generated Concept Bottleneck Paper • 2310.19660 • Published Oct 30, 2023
CoMo: Controllable Motion Generation through Language Guided Pose Code Editing Paper • 2403.13900 • Published Mar 20, 2024
A Textbook Remedy for Domain Shifts: Knowledge Priors for Medical Image Analysis Paper • 2405.14839 • Published May 23, 2024
Z-LaVI: Zero-Shot Language Solver Fueled by Visual Imagination Paper • 2210.12261 • Published Oct 21, 2022
Visualizing the Obvious: A Concreteness-based Ensemble Model for Noun Property Prediction Paper • 2210.12905 • Published Oct 24, 2022
I Spy a Metaphor: Large Language Models and Diffusion Models Co-Create Visual Metaphors Paper • 2305.14724 • Published May 24, 2023
Show Me More Details: Discovering Hierarchies of Procedures from Semi-structured Web Data Paper • 2203.07264 • Published Mar 14, 2022
Induce, Edit, Retrieve: Language Grounded Multimodal Schema for Instructional Video Retrieval Paper • 2111.09276 • Published Nov 17, 2021
Causal Reasoning of Entities and Events in Procedural Texts Paper • 2301.10896 • Published Jan 26, 2023
Language in a Bottle: Language Model Guided Concept Bottlenecks for Interpretable Image Classification Paper • 2211.11158 • Published Nov 21, 2022
Holodeck: Language Guided Generation of 3D Embodied AI Environments Paper • 2312.09067 • Published Dec 14, 2023 • 16