An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models Paper • 2403.06764 • Published Mar 11, 2024
TokenPacker: Efficient Visual Projector for Multimodal LLM Paper • 2407.02392 • Published Jul 2, 2024
Efficient Inference of Vision Instruction-Following Models with Elastic Cache Paper • 2407.18121 • Published Jul 25, 2024
Don't Look Twice: Faster Video Transformers with Run-Length Tokenization Paper • 2411.05222 • Published Nov 7, 2024