Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation Paper • 2506.04614 • Published 22 days ago • 16
RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference Paper • 2505.02922 • Published May 5 • 27
LLMs for Engineering: Teaching Models to Design High Powered Rockets Paper • 2504.19394 • Published Apr 27 • 13
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation Paper • 2504.08736 • Published Apr 11 • 47
SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement Paper • 2504.07934 • Published Apr 10 • 19
TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes Paper • 2503.23461 • Published Mar 30 • 95
VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control Paper • 2503.05639 • Published Mar 7 • 24
EuroBERT: Scaling Multilingual Encoders for European Languages Paper • 2503.05500 • Published Mar 7 • 81
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20 • 145
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22 • 404
Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models Paper • 2412.18605 • Published Dec 24, 2024 • 21
TransPixar: Advancing Text-to-Video Generation with Transparency Paper • 2501.03006 • Published Jan 6 • 27
LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis Paper • 2412.15214 • Published Dec 19, 2024 • 15
AnySat: An Earth Observation Model for Any Resolutions, Scales, and Modalities Paper • 2412.14123 • Published Dec 18, 2024 • 11
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published Dec 13, 2024 • 146