SQuARE: Sequential Question Answering Reasoning Engine for Enhanced Chain-of-Thought in Large Language Models • Paper • 2502.09390 • Published Feb 13, 2025
RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation • Paper • 2408.02545 • Published Aug 5, 2024
Distributed Speculative Inference of Large Language Models • Paper • 2405.14105 • Published May 23, 2024
An Efficient Sparse Inference Software Accelerator for Transformer-based Language Models on CPUs • Paper • 2306.16601 • Published Jun 28, 2023
Dynamic-TinyBERT: Boost TinyBERT's Inference Efficiency by Dynamic Sequence Length • Paper • 2111.09645 • Published Nov 18, 2021
Prune Once for All: Sparse Pre-Trained Language Models • Paper • 2111.05754 • Published Nov 10, 2021