ControlText: Unlocking Controllable Fonts in Multilingual Text Rendering without Font Annotations Paper • 2502.10999 • Published Feb 16
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities Paper • 2507.06261 • Published Jul 7 • 60
MMMT-IF: A Challenging Multimodal Multi-Turn Instruction Following Benchmark Paper • 2409.18216 • Published Sep 26, 2024 • 1
Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm Paper • 2409.07226 • Published Sep 11, 2024 • 1
Towards Rationality in Language and Multimodal Agents: A Survey Paper • 2406.00252 • Published Jun 1, 2024
Bio-Inspired Night Image Enhancement Based on Contrast Enhancement and Denoising Paper • 2307.05447 • Published Jul 11, 2023 • 2
Faithful Persona-based Conversational Dataset Generation with Large Language Models Paper • 2312.10007 • Published Dec 15, 2023 • 9
Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and KiSing-v2 Paper • 2401.17619 • Published Jan 31, 2024 • 1