JAM: A Tiny Flow-based Song Generator with Fine-grained Controllability and Aesthetic Alignment Paper • 2507.20880 • Published 24 days ago • 10
NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks Paper • 2504.19854 • Published Apr 28 • 7
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization Paper • 2412.21037 • Published Dec 30, 2024 • 24
Reward Steering with Evolutionary Heuristics for Decoding-time Alignment Paper • 2406.15193 • Published Jun 21, 2024 • 15
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization Paper • 2404.09956 • Published Apr 15, 2024 • 12