SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20, 2025
PaliGemma 2: A Family of Versatile VLMs for Transfer Paper • 2412.03555 • Published Dec 4, 2024
A Large-scale Study of Representation Learning with the Visual Task Adaptation Benchmark Paper • 1910.04867 • Published Oct 1, 2019
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale Paper • 2010.11929 • Published Oct 22, 2020
Big Transfer (BiT): General Visual Representation Learning Paper • 1912.11370 • Published Dec 24, 2019
Knowledge distillation: A good teacher is patient and consistent Paper • 2106.05237 • Published Jun 9, 2021
PaLI: A Jointly-Scaled Multilingual Language-Image Model Paper • 2209.06794 • Published Sep 14, 2022
Revisiting Self-Supervised Visual Representation Learning Paper • 1901.09005 • Published Jan 25, 2019
PaLI-3 Vision Language Models: Smaller, Faster, Stronger Paper • 2310.09199 • Published Oct 13, 2023