Apply for community grant: Academic project (gpu)
Hi Hugging Face team and community,
We are excited to introduce EchoX (the arXiv paper will be made public soon), an end-to-end speech-to-speech (S2S) large language model designed for more natural and intelligent real-time voice interaction. Existing S2S models often suffer from degraded reasoning ability. EchoX addresses this Acoustic–Semantic Gap by integrating semantic representations with dynamic speech token generation, which lets it preserve the strong reasoning ability of text-based LLMs while delivering smooth speech output.
Key highlights:
Efficiency: EchoX achieves competitive performance with only ~10K hours of training data, while many existing models require millions of hours.
Innovation: Our Echo training strategy and use of compact unit language tokens significantly reduce speech sequence length and improve accuracy.
Real-time readiness: With streaming generation, EchoX can produce speech in a low-latency, interactive manner, ideal for live demos.
Open-source commitment: We are committed to making EchoX fully open-source, including not only the weights and datasets but also a hands-on demo, so the community can directly experience and build upon our work.
To achieve this, we are seeking GPU resources on Hugging Face Spaces for hosting the interactive demo. With your support, EchoX can become openly accessible, enabling the community to explore next-generation S2S models and inspire further research.
Thank you for considering our request!
@tzzte
Ah, OK. We saw that request earlier, but it said only "An end-to-end S2S LLM.", which didn't give us enough information to approve the grant, so we rejected it.
Grant requests are reviewed by HF staff members, so please try to include enough details for us to make a proper decision.
Anyway, I've just assigned ZeroGPU to this Space. Hope this helps!