One-Step Residual Shifting Diffusion for Image Super-Resolution via Distillation • Paper • 2503.13358 • Published 6 days ago • 82
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models • Paper • 2503.16419 • Published 3 days ago • 53
Optimizing Decomposition for Optimal Claim Verification • Paper • 2503.15354 • Published 4 days ago • 18
φ-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation • Paper • 2503.13288 • Published 6 days ago • 46
Implicit Reasoning in Transformers is Reasoning through Shortcuts • Paper • 2503.07604 • Published 13 days ago • 21
WritingBench: A Comprehensive Benchmark for Generative Writing • Paper • 2503.05244 • Published 16 days ago • 17
DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs • Paper • 2503.07067 • Published 13 days ago • 29
SEAP: Training-free Sparse Expert Activation Pruning Unlock the Brainpower of Large Language Models • Paper • 2503.07605 • Published 13 days ago • 65
Feature-Level Insights into Artificial Text Detection with Sparse Autoencoders • Paper • 2503.03601 • Published 18 days ago • 215
HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization • Paper • 2503.04598 • Published 17 days ago • 18
EuroBERT: Scaling Multilingual Encoders for European Languages • Paper • 2503.05500 • Published 16 days ago • 75
L^2M: Mutual Information Scaling Law for Long-Context Language Modeling • Paper • 2503.04725 • Published 17 days ago • 19
LLM as a Broken Telephone: Iterative Generation Distorts Information • Paper • 2502.20258 • Published 24 days ago • 24
RedPajama: an Open Dataset for Training Large Language Models • Paper • 2411.12372 • Published Nov 19, 2024 • 53
HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs • Paper • 2503.02003 • Published 20 days ago • 44
When an LLM is apprehensive about its answers -- and when its uncertainty is justified • Paper • 2503.01688 • Published 20 days ago • 19
From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens • Paper • 2502.18890 • Published 25 days ago • 27