EmbeddingGemma: Powerful and Lightweight Text Representations Paper • 2509.20354 • Published Sep 24, 2025 • 48
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics Paper • 2506.01844 • Published Jun 2, 2025 • 153
HandsOnVLM: Vision-Language Models for Hand-Object Interaction Prediction Paper • 2412.13187 • Published Dec 17, 2024
DataComp: In search of the next generation of multimodal datasets Paper • 2304.14108 • Published Apr 27, 2023 • 2
Scalable Extraction of Training Data from (Production) Language Models Paper • 2311.17035 • Published Nov 28, 2023 • 3
Git Re-Basin: Merging Models modulo Permutation Symmetries Paper • 2209.04836 • Published Sep 11, 2022 • 2
PLeaS -- Merging Models with Permutations and Least Squares Paper • 2407.02447 • Published Jul 2, 2024
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models Paper • 2503.09573 • Published Mar 12, 2025 • 76
Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation Paper • 2502.14846 • Published Feb 20, 2025 • 14
Articulate-Anything: Automatic Modeling of Articulated Objects via a Vision-Language Foundation Model Paper • 2410.13882 • Published Oct 3, 2024
MiRAGeNews: Multimodal Realistic AI-Generated News Detection Paper • 2410.09045 • Published Oct 11, 2024 • 4
Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming Paper • 2501.18837 • Published Jan 31, 2025 • 10