Feature-Level Insights into Artificial Text Detection with Sparse Autoencoders Paper • 2503.03601 • Published 7 days ago • 193
GHOST 2.0: generative high-fidelity one shot transfer of heads Paper • 2502.18417 • Published 15 days ago • 63
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers Paper • 2502.15007 • Published 20 days ago • 162
Minitron Collection A family of compressed models obtained via pruning and knowledge distillation • 12 items • Updated Jan 17 • 60
CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases Paper • 2408.03910 • Published Aug 7, 2024 • 18
AQLM+PV Collection Official AQLM quantizations for "PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression": https://arxiv.org/abs/2405.14852 • 26 items • Updated 12 days ago • 20
NuminaMath Collection Datasets and models for training SOTA math LLMs. See our GitHub for training & inference code: https://github.com/project-numina/aimo-progress-prize • 7 items • Updated about 1 month ago • 76
Edit Your Image! Collection Find all the trending and useful Gradio demos that you can use to edit your images. • 21 items • Updated Apr 26, 2024 • 31
MambaByte: Token-free Selective State Space Model Paper • 2401.13660 • Published Jan 24, 2024 • 56
DiLoCo: Distributed Low-Communication Training of Language Models Paper • 2311.08105 • Published Nov 14, 2023 • 15
Prompt Cache: Modular Attention Reuse for Low-Latency Inference Paper • 2311.04934 • Published Nov 7, 2023 • 32
AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn Paper • 2306.08640 • Published Jun 14, 2023 • 26