Conan: A Chunkwise Online Network for Zero-Shot Adaptive Voice Conversion • arXiv:2507.14534 • Published Jul 19, 2025
Leveraging Pretrained Diffusion Models for Zero-Shot Part Assembly • arXiv:2505.00426 • Published May 1, 2025
TCSinger 2: Customizable Multilingual Zero-shot Singing Voice Synthesis • arXiv:2505.14910 • Published May 20, 2025
STARS: A Unified Framework for Singing Transcription, Alignment, and Refined Style Annotation • arXiv:2507.06670 • Published Jul 9, 2025
Multimodal DeepResearcher: Generating Text-Chart Interleaved Reports From Scratch with Agentic Framework • arXiv:2506.02454 • Published Jun 3, 2025
DMind Benchmark: The First Comprehensive Benchmark for LLM Evaluation in the Web3 Domain • arXiv:2504.16116 • Published Apr 18, 2025
TechSinger: Technique Controllable Multilingual Singing Voice Synthesis via Flow Matching • arXiv:2502.12572 • Published Feb 18, 2025
MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis • arXiv:2502.18924 • Published Feb 26, 2025
Versatile Framework for Song Generation with Prompt-based Control • arXiv:2504.19062 • Published Apr 27, 2025
ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting • arXiv:2504.20630 • Published Apr 29, 2025
Robust Singing Voice Transcription Serves Synthesis • arXiv:2405.09940 • Published May 16, 2024
Self-Distillation Bridges Distribution Gap in Language Model Fine-Tuning • arXiv:2402.13669 • Published Feb 21, 2024
GTSinger: A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks • arXiv:2409.13832 • Published Sep 20, 2024
TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control • arXiv:2409.15977 • Published Sep 24, 2024
StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis • arXiv:2312.10741 • Published Dec 17, 2023
VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters • arXiv:2408.17253 • Published Aug 30, 2024
Bridging Cross-task Protocol Inconsistency for Distillation in Dense Object Detection • arXiv:2308.14286 • Published Aug 28, 2023
Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model • arXiv:2406.19905 • Published Jun 28, 2024
Searching Priors Makes Text-to-Video Synthesis Better • arXiv:2406.03215 • Published Jun 5, 2024