Update README.md
README.md
CHANGED
@@ -22,7 +22,7 @@ pipeline_tag: audio-to-audio
 💜 <a href="https://voila.maitrix.org"><b>Project Page</b></a>    |    🖥️ <a href="https://github.com/maitrix-org/Voila">GitHub</a>    |   🤗 <a href="https://huggingface.co/collections/maitrix-org/voila-67e0d96962c19f221fc73fa5">Hugging Face</a>   |    📑 <a href="http://arxiv.org/abs/2505.02707">Paper</a>    |    🌐 <a href="https://huggingface.co/spaces/maitrix-org/Voila-demo">Online Demo</a>   |    🏠<a href="https://maitrix.org">Maitrix.org</a>
 </p>
 
-Voila is a
+Voila is a new family of large voice-language foundation models aiming to lift human-AI interaction experiences to the next level. Breaking away from the constraints of traditional voice AI systems—high latency, loss of vocal nuances, and mechanical responses—Voila employs an innovative end-to-end model design and a novel hierarchical Transformer architecture. This approach enables real-time, autonomous, and rich voice interactions, with latency as low as 195 ms, surpassing average human response times. Combining advanced voice and language modeling, Voila offers customizable, persona-driven engagements and excels in a range of audio tasks from ASR and TTS to speech translation across six languages. With the online [web demo](https://huggingface.co/spaces/maitrix-org/Voila-demo), Voila invites you to explore a transformative, natural dialogue experience between human and AI.
 
 # ✨ Highlights
 - ⭐ High-fidelity, low-latency, real-time streaming audio processing
@@ -136,7 +136,7 @@ If you find our work helpful, please cite us.
 @article{voila2025,
 author = {Yemin Shi, Yu Shu, Siwei Dong, Guangyi Liu, Jaward Sesay, Jingwen Li, Zhiting Hu},
 title = {Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Roleplay},
-eprint={},
+eprint={2505.02707},
 archivePrefix={arXiv},
 primaryClass={cs.CL},
 year = {2025}