CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction Paper • 2502.07316 • Published Feb 11 • 49
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published Feb 7 • 140
Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch Paper • 2501.18512 • Published Jan 30 • 30
Structured 3D Latents for Scalable and Versatile 3D Generation Paper • 2412.01506 • Published Dec 2, 2024 • 75
On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes Paper • 2306.13649 • Published Jun 23, 2023 • 21
Cautious Optimizers: Improving Training with One Line of Code Paper • 2411.16085 • Published Nov 25, 2024 • 21
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency Paper • 2409.02634 • Published Sep 4, 2024 • 98
Memory-Efficient LLM Training with Online Subspace Descent Paper • 2408.12857 • Published Aug 23, 2024 • 14
Introducing Idefics2: A Powerful 8B Vision-Language Model for the community Article • Apr 15, 2024 • 177
Longhorn: State Space Models are Amortized Online Learners Paper • 2407.14207 • Published Jul 19, 2024 • 18
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks Paper • 2311.06242 • Published Nov 10, 2023 • 93
The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry Paper • 2402.04347 • Published Feb 6, 2024 • 15
Towards Modular LLMs by Building and Reusing a Library of LoRAs Paper • 2405.11157 • Published May 18, 2024 • 31
SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts Paper • 2405.07518 • Published May 13, 2024 • 28
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22, 2024 • 259