3DIS-FLUX: simple and efficient multi-instance generation with DiT rendering Paper • 2501.05131 • Published 9 days ago • 32
Beyond Sight: Finetuning Generalist Robot Policies with Heterogeneous Sensors via Language Grounding Paper • 2501.04693 • Published 10 days ago • 2
XMusic: Towards a Generalized and Controllable Symbolic Music Generation Framework Paper • 2501.08809 • Published 3 days ago • 9
Towards Best Practices for Open Datasets for LLM Training Paper • 2501.08365 • Published 4 days ago • 40
Do generative video models learn physical principles from watching videos? Paper • 2501.09038 • Published 4 days ago • 11
Exploring the Inquiry-Diagnosis Relationship with Advanced Patient Simulators Paper • 2501.09484 • Published 2 days ago • 16
OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking Paper • 2501.09751 • Published 1 day ago • 29
CaPa: Carve-n-Paint Synthesis for Efficient 4K Textured Mesh Generation Paper • 2501.09433 • Published 2 days ago • 10
AnyStory: Towards Unified Single and Multiple Subject Personalization in Text-to-Image Generation Paper • 2501.09503 • Published 2 days ago • 6
FAST: Efficient Action Tokenization for Vision-Language-Action Models Paper • 2501.09747 • Published 1 day ago • 12
SynthLight: Portrait Relighting with Diffusion Model by Learning to Re-render Synthetic Faces Paper • 2501.09756 • Published 1 day ago • 14
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models Paper • 2501.09686 • Published 1 day ago • 16
Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps Paper • 2501.09732 • Published 1 day ago • 40
Learnings from Scaling Visual Tokenizers for Reconstruction and Generation Paper • 2501.09755 • Published 1 day ago • 21
RLHS: Mitigating Misalignment in RLHF with Hindsight Simulation Paper • 2501.08617 • Published 3 days ago • 7
Ouroboros-Diffusion: Exploring Consistent Content Generation in Tuning-free Long Video Diffusion Paper • 2501.09019 • Published 3 days ago • 10
Trusted Machine Learning Models Unlock Private Inference for Problems Currently Infeasible with Cryptography Paper • 2501.08970 • Published 3 days ago • 5
Multimodal LLMs Can Reason about Aesthetics in Zero-Shot Paper • 2501.09012 • Published 3 days ago • 9
Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding Paper • 2501.07783 • Published 4 days ago • 7
CityDreamer4D: Compositional Generative Model of Unbounded 4D Cities Paper • 2501.08983 • Published 3 days ago • 16