SCBench: A KV Cache-Centric Analysis of Long-Context Methods • Paper • 2412.10319 • Published Dec 13, 2024
Wolf: Captioning Everything with a World Summarization Framework • Paper • 2407.18908 • Published Jul 26, 2024
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher • Paper • 2407.20183 • Published Jul 29, 2024
MInference 1.0: 10x Faster Million Context Inference with a Single GPU • Article • By liyucheng • Jul 11, 2024
How to Optimize TTFT of 8B LLMs with 1M Tokens to 20s • Article • By iofu728 • Jul 21, 2024
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention • Paper • 2407.02490 • Published Jul 2, 2024
LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression • Paper • 2310.06839 • Published Oct 10, 2023
LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models • Paper • 2310.05736 • Published Oct 9, 2023
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression • Paper • 2403.12968 • Published Mar 19, 2024