WavTokenizer-Medium-Large Collection https://arxiv.org/abs/2408.16532 • 4 items • Updated 28 days ago • 9
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling Paper • 2408.16532 • Published Aug 29, 2024 • 50
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM Paper • 2503.04724 • Published 19 days ago • 66
PERSE: Personalized 3D Generative Avatars from A Single Portrait Paper • 2412.21206 • Published Dec 30, 2024 • 19
Article Transformers.js v3: WebGPU support, new models & tasks, and more… Oct 22, 2024 • 72
Phi-4 Collection Phi-4 family of small language and multi-modal models. • 7 items • Updated 22 days ago • 112
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching Paper • 2410.06885 • Published Oct 9, 2024 • 45
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning Paper • 2502.06781 • Published Feb 10 • 60
SYNTHETIC-1 Collection A collection of tasks & verifiers for reasoning datasets • 9 items • Updated Feb 20 • 49
GeoPixel Collection Pixel Grounding Large Multimodal Model in Remote Sensing • 5 items • Updated 27 days ago • 1
ArTST - Arabic Text Speech Transformer Collection Open source project for Arabic Speech Recognition and Generation • 13 items • Updated 24 days ago • 10