maitrix-org
/

Voila-base

@@ -26,7 +26,7 @@ tags:
     💜 <a href="https://voila.maitrix.org"><b>Project Page</b></a> &nbsp&nbsp ｜ &nbsp&nbsp 🖥️ <a href="https://github.com/maitrix-org/Voila">GitHub</a> &nbsp&nbsp  | &nbsp&nbsp🤗 <a href="https://huggingface.co/collections/maitrix-org/voila-67e0d96962c19f221fc73fa5">Hugging Face</a>&nbsp&nbsp | &nbsp&nbsp 📑 <a href="http://arxiv.org/abs/2505.02707">Paper</a> &nbsp&nbsp | &nbsp&nbsp 🌐 <a href="https://huggingface.co/spaces/maitrix-org/Voila-demo">Online Demo</a> &nbsp&nbsp| &nbsp&nbsp 🏠<a href="https://maitrix.org">Maitrix.org</a>
 </p>
-Voila is a groundbreaking family of large audio-language foundation models that revolutionizes human-AI interactions. Breaking away from the constraints of traditional voice AI systems—high latency, loss of vocal nuances, and mechanical responses, Voila employs an innovative end-to-end model design and a novel hierarchical Transformer architecture. This approach enables real-time, autonomous, and rich voice interactions, with latency as low as 195 ms, surpassing average human response times. Combining advanced voice and language modeling, Voila offers customizable, persona-driven engagements and excels in a range of audio tasks from ASR and TTS to speech translation across six languages. With the online [web demo](https://huggingface.co/spaces/maitrix-org/Voila-demo), Voila invites you to explore a transformative, natural dialogue experience between human and AI.
 # ✨ Highlights
 - ⭐ High-fidelity, low-latency, real-time streaming audio processing
@@ -140,7 +140,7 @@ If you find our work helpful, please cite us.
 @article{voila2025,
   author    = {Yemin Shi, Yu Shu, Siwei Dong, Guangyi Liu, Jaward Sesay, Jingwen Li, Zhiting Hu},
   title     = {Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Roleplay},
-  eprint={},
   archivePrefix={arXiv},
   primaryClass={cs.CL},
   year      = {2025}

     💜 <a href="https://voila.maitrix.org"><b>Project Page</b></a> &nbsp&nbsp ｜ &nbsp&nbsp 🖥️ <a href="https://github.com/maitrix-org/Voila">GitHub</a> &nbsp&nbsp  | &nbsp&nbsp🤗 <a href="https://huggingface.co/collections/maitrix-org/voila-67e0d96962c19f221fc73fa5">Hugging Face</a>&nbsp&nbsp | &nbsp&nbsp 📑 <a href="http://arxiv.org/abs/2505.02707">Paper</a> &nbsp&nbsp | &nbsp&nbsp 🌐 <a href="https://huggingface.co/spaces/maitrix-org/Voila-demo">Online Demo</a> &nbsp&nbsp| &nbsp&nbsp 🏠<a href="https://maitrix.org">Maitrix.org</a>
 </p>
+Voila is a new family of large voice-language foundation models aiming to lift human-AI interaction experiences to the next level. Breaking away from the constraints of traditional voice AI systems—high latency, loss of vocal nuances, and mechanical responses—Voila employs an innovative end-to-end model design and a novel hierarchical Transformer architecture. This approach enables real-time, autonomous, and rich voice interactions, with latency as low as 195 ms, surpassing average human response times. Combining advanced voice and language modeling, Voila offers customizable, persona-driven engagements and excels in a range of audio tasks from ASR and TTS to speech translation across six languages. With the online [web demo](https://huggingface.co/spaces/maitrix-org/Voila-demo), Voila invites you to explore a transformative, natural dialogue experience between human and AI.
 # ✨ Highlights
 - ⭐ High-fidelity, low-latency, real-time streaming audio processing
 @article{voila2025,
   author    = {Yemin Shi, Yu Shu, Siwei Dong, Guangyi Liu, Jaward Sesay, Jingwen Li, Zhiting Hu},
   title     = {Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Roleplay},
+  eprint={2505.02707},
   archivePrefix={arXiv},
   primaryClass={cs.CL},
   year      = {2025}