Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem Paper • 2506.03295 • Published 7 days ago • 17
MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly Paper • 2505.10610 • Published 26 days ago • 53
Article 🦸🏻#1: Open-endedness and AI Agents – A Path from Generative to Creative AI? By Kseniase • Dec 25, 2024 • 13
Q-Filters: Leveraging QK Geometry for Efficient KV Cache Compression Paper • 2503.02812 • Published Mar 4 • 10
Q-Filters Collection Pre-computed Q-Filters for efficient KV cache compression. • 15 items • Updated Mar 3 • 7
DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations Paper • 2410.18860 • Published Oct 24, 2024 • 11
Analysing the Residual Stream of Language Models Under Knowledge Conflicts Paper • 2410.16090 • Published Oct 21, 2024 • 7
Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering Paper • 2410.15999 • Published Oct 21, 2024 • 20
Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2 Paper • 2408.05147 • Published Aug 9, 2024 • 40
Article Improving Hugging Face Training Efficiency Through Packing with Flash Attention By lwtr and 5 others • Aug 21, 2024 • 35
Interpretability & Analysis of LMs Collection Outstanding research in LM interpretability and evaluation, summarized • 116 items • Updated 5 days ago • 105
Article Introducing RWKV – An RNN with the advantages of a transformer By BlinkDL and 3 others • May 15, 2023 • 21
Model Internals-based Answer Attribution for Trustworthy Retrieval-Augmented Generation Paper • 2406.13663 • Published Jun 19, 2024 • 7
A Simple and Effective L_2 Norm-Based Strategy for KV Cache Compression Paper • 2406.11430 • Published Jun 17, 2024 • 24
Article The Hallucinations Leaderboard, an Open Effort to Measure Hallucinations in Large Language Models By pminervini and 5 others • Jan 29, 2024 • 28