- LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference — arXiv:2407.14057, published Jul 19, 2024
- CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data — arXiv:2404.15653, published Apr 24, 2024
- OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework — arXiv:2404.14619, published Apr 22, 2024
- GINA-3D: Learning to Generate Implicit Neural Assets in the Wild — arXiv:2304.02163, published Apr 4, 2023
- Speculative Streaming: Fast LLM Inference without Auxiliary Models — arXiv:2402.11131, published Feb 16, 2024
- Superposition Prompting: Improving and Accelerating Retrieval-Augmented Generation — arXiv:2404.06910, published Apr 10, 2024