Towards Automated Kernel Generation in the Era of LLMs Paper • 2601.15727 • Published 3 days ago • 13 • 2
Rethinking Composed Image Retrieval Evaluation: A Fine-Grained Benchmark from Image Editing Paper • 2601.16125 • Published 3 days ago • 13 • 2
From Passive Metric to Active Signal: The Evolving Role of Uncertainty Quantification in Large Language Models Paper • 2601.15690 • Published 3 days ago • 4 • 2
Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces Paper • 2601.11868 • Published 8 days ago • 23 • 1
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders Paper • 2601.16208 • Published 3 days ago • 46 • 2
BayesianVLA: Bayesian Decomposition of Vision Language Action Models via Latent Action Queries Paper • 2601.15197 • Published 4 days ago • 52 • 4
VIOLA: Towards Video In-Context Learning with Minimal Annotations Paper • 2601.15549 • Published 3 days ago • 4 • 2
The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models Paper • 2601.15165 • Published 4 days ago • 59 • 5
MirrorBench: An Extensible Framework to Evaluate User-Proxy Agents for Human-Likeness Paper • 2601.08118 • Published 12 days ago • 1 • 3
LLM-in-Sandbox Elicits General Agentic Intelligence Paper • 2601.16206 • Published 3 days ago • 59 • 4
OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation Paper • 2601.15369 • Published 4 days ago • 16 • 3
Wigner's Friend as a Circuit: Inter-Branch Communication Witness Benchmarks on Superconducting Quantum Hardware Paper • 2601.16004 • Published 3 days ago • 1 • 2