Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets Paper • 2505.07747 • Published 9 days ago • 60
Step1X-Edit: A Practical Framework for General Image Editing Paper • 2504.17761 • Published 27 days ago • 88
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning Paper • 2504.07128 • Published Apr 2 • 84
Kimi-VL-A3B Collection Moonshot's efficient MoE VLMs, exceptional on agent, long-context, and thinking • 6 items • Updated Apr 12 • 65
Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation Paper • 2504.02542 • Published Apr 3 • 45
Vidu: a Highly Consistent, Dynamic and Skilled Text-to-Video Generator with Diffusion Models Paper • 2405.04233 • Published May 7, 2024 • 2
Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis Paper • 2411.01156 • Published Nov 2, 2024 • 7
Wan: Open and Advanced Large-Scale Video Generative Models Paper • 2503.20314 • Published Mar 26 • 51
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models Paper • 2406.02430 • Published Jun 4, 2024 • 37
WritingBench: A Comprehensive Benchmark for Generative Writing Paper • 2503.05244 • Published Mar 7 • 18
view article Article Manus AI: The Best Autonomous AI Agent Redefining Automation and Productivity By LLMhacker • Mar 6 • 167
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models Paper • 2306.07691 • Published Jun 13, 2023 • 9