LongVILA: Scaling Long-Context Visual Language Models for Long Videos Paper • 2408.10188 • Published Aug 19, 2024 • 53
VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation Paper • 2409.04429 • Published Sep 6, 2024
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers Paper • 2410.10629 • Published Oct 14, 2024 • 12
COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training Paper • 2410.19313 • Published Oct 25, 2024 • 19
TinyTL: Reduce Activations, Not Trainable Parameters for Efficient On-Device Learning Paper • 2007.11622 • Published Jul 22, 2020
NVILA: Efficient Frontier Visual Language Models Paper • 2412.04468 • Published Dec 5, 2024 • 60
Wolf: Captioning Everything with a World Summarization Framework Paper • 2407.18908 • Published Jul 26, 2024 • 33
HAT: Hardware-Aware Transformers for Efficient Natural Language Processing Paper • 2005.14187 • Published May 28, 2020 • 2
ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware Paper • 1812.00332 • Published Dec 2, 2018
PockEngine: Sparse and Efficient Fine-tuning in a Pocket Paper • 2310.17752 • Published Oct 26, 2023 • 14