FasterViT: Fast Vision Transformers with Hierarchical Attention Paper • 2306.06189 • Published Jun 9, 2023 • 30
Adaptive Sharpness-Aware Pruning for Robust Sparse Networks Paper • 2306.14306 • Published Jun 25, 2023
Global Vision Transformer Pruning with Hessian-Aware Saliency Paper • 2110.04869 • Published Oct 10, 2021
RegionGPT: Towards Region Understanding Vision Language Model Paper • 2403.02330 • Published Mar 4, 2024 • 2
LITA: Language Instructed Temporal-Localization Assistant Paper • 2403.19046 • Published Mar 27, 2024 • 19
X-VILA: Cross-Modality Alignment for Large Language Model Paper • 2405.19335 • Published May 29, 2024
Flextron: Many-in-One Flexible Large Language Model Paper • 2406.10260 • Published Jun 11, 2024 • 2
LongVILA: Scaling Long-Context Visual Language Models for Long Videos Paper • 2408.10188 • Published Aug 19, 2024 • 52
VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation Paper • 2409.04429 • Published Sep 6, 2024
EoRA: Training-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation Paper • 2410.21271 • Published Oct 28, 2024 • 6
NVILA: Efficient Frontier Visual Language Models Paper • 2412.04468 • Published Dec 5, 2024 • 59
VILA-M3: Enhancing Vision-Language Models with Medical Expert Knowledge Paper • 2411.12915 • Published Nov 19, 2024