TTS AGI

Activity Feed

AI & ML interests

Decentralized yet unified efforts to accelerate research for Open Text to Speech (TTS) systems!

Recent Activity

TTS-AGI's activity

mrfakenameย 
posted an update 6 days ago
view post
Post
777
Iโ€™m excited to introduce a new leaderboard UI + keyboard shortcuts on the TTS Arena!

The refreshed UI for the leaderboard is smoother and (hopefully) more intuitive. You can now view models based on a simpler win-rate percentage and exclude closed models.

In addition, the TTS Arena now supports keyboard shortcuts. This should make voting much more efficient as you can now vote without clicking anything!

In both the normal Arena and Battle Mode, press "r" to select a random text, Cmd/Ctrl + Enter to synthesize, and "a"/"b" to vote! View more details about keyboard shortcuts by pressing "?" (Shift + /) on the Arena.

Check out all the new updates on the TTS Arena:

TTS-AGI/TTS-Arena
akhaliqย 
posted an update about 1 month ago
view post
Post
7349
Google drops Gemini 2.0 Flash Thinking

a new experimental model that unlocks stronger reasoning capabilities and shows its thoughts. The model plans (with thoughts visible), can solve complex problems with Flash speeds, and more

now available in anychat, try it out: akhaliq/anychat
  • 2 replies
ยท
thomwolfย 
posted an update about 2 months ago
view post
Post
5065
We are proud to announce HuggingFaceFW/fineweb-2: A sparkling update to HuggingFaceFW/fineweb with 1000s of ๐Ÿ—ฃ๏ธlanguages.

We applied the same data-driven approach that led to SOTA English performance in๐Ÿท FineWeb to thousands of languages.

๐Ÿฅ‚ FineWeb2 has 8TB of compressed text data and outperforms other multilingual datasets in our experiments.

The dataset is released under the permissive ๐Ÿ“œ ODC-By 1.0 license, and the ๐Ÿ’ป code to reproduce it and our evaluations is public.

We will very soon announce a big community project, and are working on a ๐Ÿ“ blogpost walking you through the entire dataset creation process. Stay tuned!

In the mean time come ask us question on our chat place: HuggingFaceFW/discussion

H/t @guipenedo @hynky @lvwerra as well as @vsabolcec Bettina Messmer @negar-foroutan and @mjaggi
  • 2 replies
ยท
reach-vbย 
posted an update about 2 months ago
view post
Post
4580
VLMs are going through quite an open revolution AND on-device friendly sizes:

1. Google DeepMind w/ PaliGemma2 - 3B, 10B & 28B: google/paligemma-2-release-67500e1e1dbfdd4dee27ba48

2. OpenGVLabs w/ InternVL 2.5 - 1B, 2B, 4B, 8B, 26B, 38B & 78B: https://huggingface.co/collections/OpenGVLab/internvl-25-673e1019b66e2218f68d7c1c

3. Qwen w/ Qwen 2 VL - 2B, 7B & 72B: Qwen/qwen2-vl-66cee7455501d7126940800d

4. Microsoft w/ FlorenceVL - 3B & 8B: https://huggingface.co/jiuhai

5. Moondream2 w/ 0.5B: https://huggingface.co/vikhyatk/

What a time to be alive! ๐Ÿ”ฅ
thomwolfย 
posted an update about 2 months ago
thomwolfย 
posted an update about 2 months ago
akhaliqย 
posted an update 2 months ago
view post
Post
8362
QwQ-32B-Preview is now available in anychat

A reasoning model that is competitive with OpenAI o1-mini and o1-preview

try it out: akhaliq/anychat
  • 1 reply
ยท
akhaliqย 
posted an update 2 months ago
view post
Post
3901
New model drop in anychat

allenai/Llama-3.1-Tulu-3-8B is now available

try it here: akhaliq/anychat
reach-vbย 
posted an update 2 months ago
view post
Post
4433
Massive week for Open AI/ ML:

Mistral Pixtral & Instruct Large - ~123B, 128K context, multilingual, json + function calling & open weights
mistralai/Pixtral-Large-Instruct-2411
mistralai/Mistral-Large-Instruct-2411

Allen AI Tรผlu 70B & 8B - competive with claude 3.5 haiku, beats all major open models like llama 3.1 70B, qwen 2.5 and nemotron
allenai/tulu-3-models-673b8e0dc3512e30e7dc54f5
allenai/tulu-3-datasets-673b8df14442393f7213f372

Llava o1 - vlm capable of spontaneous, systematic reasoning, similar to GPT-o1, 11B model outperforms gemini-1.5-pro, gpt-4o-mini, and llama-3.2-90B-vision
Xkev/Llama-3.2V-11B-cot

Black Forest Labs Flux.1 tools - four new state of the art model checkpoints & 2 adapters for fill, depth, canny & redux, open weights
reach-vb/black-forest-labs-flux1-6743847bde9997dd26609817

Jina AI Jina CLIP v2 - general purpose multilingual and multimodal (text & image) embedding model, 900M params, 512 x 512 resolution, matroyoshka representations (1024 to 64)
jinaai/jina-clip-v2

Apple AIM v2 & CoreML MobileCLIP - large scale vision encoders outperform CLIP and SigLIP. CoreML optimised MobileCLIP models
apple/aimv2-6720fe1558d94c7805f7688c
apple/coreml-mobileclip

A lot more got released like, OpenScholar (https://huggingface.co/collections/OpenScholar/openscholar-v1-67376a89f6a80f448da411a6), smoltalk ( HuggingFaceTB/smoltalk), Hymba ( nvidia/hymba-673c35516c12c4b98b5e845f), Open ASR Leaderboard ( hf-audio/open_asr_leaderboard) and much more..

Can't wait for the next week! ๐Ÿค—
akhaliqย 
posted an update 2 months ago
view post
Post
2866
anychat

supports chatgpt, gemini, perplexity, claude, meta llama, grok all in one app

try it out there: akhaliq/anychat
thomwolfย 
posted an update 2 months ago
thomwolfย 
posted an update 2 months ago
reach-vbย 
posted an update 2 months ago
view post
Post
4414
What a brilliant week for Open Source AI!

Qwen 2.5 Coder by Alibaba - 0.5B / 1.5B / 3B / 7B / 14B/ 32B (Base + Instruct) Code generation LLMs, with 32B tackling giants like Gemnini 1.5 Pro, Claude Sonnet
Qwen/qwen25-coder-66eaa22e6f99801bf65b0c2f

LLM2CLIP from Microsoft - Leverage LLMs to train ultra-powerful CLIP models! Boosts performance over the previous SOTA by ~17%
microsoft/llm2clip-672323a266173cfa40b32d4c

Athene v2 Chat & Agent by NexusFlow - SoTA general LLM fine-tuned from Qwen 2.5 72B excels at Chat + Function Calling/ JSON/ Agents
Nexusflow/athene-v2-6735b85e505981a794fb02cc

Orca Agent Instruct by Microsoft - 1 million instruct pairs covering text editing, creative writing, coding, reading comprehension, etc - permissively licensed
microsoft/orca-agentinstruct-1M-v1

Ultravox by FixieAI - 70B/ 8B model approaching GPT4o level, pick any LLM, train an adapter with Whisper as Audio Encoder
reach-vb/ultravox-audio-language-model-release-67373b602af0a52b2a88ae71

JanusFlow 1.3 by DeepSeek - Next iteration of their Unified MultiModal LLM Janus with RectifiedFlow
deepseek-ai/JanusFlow-1.3B

Common Corpus by Pleais - 2,003,039,184,047 multilingual, commercially permissive and high quality tokens!
PleIAs/common_corpus

I'm sure I missed a lot, can't wait for the next week!

Put down in comments what I missed! ๐Ÿค—
reach-vbย 
posted an update 3 months ago
view post
Post
1671
Smol TTS models are here! OuteTTS-0.1-350M - Zero shot voice cloning, built on LLaMa architecture, CC-BY license! ๐Ÿ”ฅ

> Pure language modeling approach to TTS
> Zero-shot voice cloning
> LLaMa architecture w/ Audio tokens (WavTokenizer)
> BONUS: Works on-device w/ llama.cpp โšก

Three-step approach to TTS:

> Audio tokenization using WavTokenizer (75 tok per second)
> CTC forced alignment for word-to-audio token mapping
> Structured prompt creation w/ transcription, duration, audio tokens

The model is extremely impressive for 350M parameters! Kudos to the
OuteAI team on such a brilliant feat - I'd love to see this be applied on larger data and smarter backbones like SmolLM ๐Ÿค—

Check out the models here: OuteAI/outetts-6728aa71a53a076e4ba4817c
reach-vbย 
posted an update 3 months ago
view post
Post
3014
Smol models ftw! AMD released AMD OLMo 1B - beats OpenELM, tiny llama on MT Bench, Alpaca Eval - Apache 2.0 licensed ๐Ÿ”ฅ

> Trained with 1.3 trillion (dolma 1.7) tokens on 16 nodes, each with 4 MI250 GPUs

> Three checkpoints:

- AMD OLMo 1B: Pre-trained model
- AMD OLMo 1B SFT: Supervised fine-tuned on Tulu V2, OpenHermes-2.5, WebInstructSub, and Code-Feedback datasets
- AMD OLMo 1B SFT DPO: Aligned with human preferences using Direct Preference Optimization (DPO) on UltraFeedback dataset

Key Insights:
> Pre-trained with less than half the tokens of OLMo-1B
> Post-training steps include two-phase SFT and DPO alignment
> Data for SFT:
- Phase 1: Tulu V2
- Phase 2: OpenHermes-2.5, WebInstructSub, and Code-Feedback

> Model checkpoints on the Hub & Integrated with Transformers โšก๏ธ

Congratulations & kudos to AMD on a brilliant smol model release! ๐Ÿค—

amd/amd-olmo-6723e7d04a49116d8ec95070
thomwolfย 
posted an update 3 months ago
view post
Post
4172
Parents in the 1990: Teach the kids to code
Parents now: Teach the kids to fix the code when it starts walking around ๐Ÿค–โœจ
  • 2 replies
ยท