Text to Speech (TTS) Collection • Text to Speech (TTS) models compatible with txtai's TextToSpeech pipeline • 7 items • Updated 4 days ago • 6
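Since this collection targets txtai's TextToSpeech pipeline, a minimal usage sketch may help. This is an illustration based on txtai's public pipeline API, not part of the collection itself; the default model and the return format vary by txtai version, so the code handles both shapes defensively, and the 22050 Hz fallback rate is an assumption to check against the model card.

    from txtai.pipeline import TextToSpeech
    import soundfile as sf

    # Load the text-to-speech pipeline; with no arguments txtai picks its
    # default TTS model. A model from this collection can be passed by Hub id.
    tts = TextToSpeech()

    # Synthesize speech. Depending on the txtai version, this returns raw
    # audio samples or an (audio, sample rate) tuple.
    result = tts("Text to speech with txtai")
    audio, rate = result if isinstance(result, tuple) else (result, 22050)

    # Write the audio to disk.
    sf.write("speech.wav", audio, rate)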
Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation Paper • 2501.15907 • Published 3 days ago • 14
Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps Paper • 2501.09732 • Published 14 days ago • 66
Sound Datasets Collection • Sound datasets for ASR/ASV and other audio tasks • 12 items • Updated Aug 28, 2024 • 1
Tackling the Generative Learning Trilemma with Denoising Diffusion GANs Paper • 2112.07804 • Published Dec 15, 2021 • 1
ModernBERT Collection • Bringing BERT into modernity via both architecture changes and scaling • 3 items • Updated Dec 19, 2024 • 129
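As a quick orientation for the ModernBERT collection, here is a hedged sketch of loading the base checkpoint as a masked language model with Hugging Face transformers. It assumes a transformers release recent enough to include the ModernBERT architecture (late 2024 or newer); answerdotai/ModernBERT-base is the base model from this collection.

    from transformers import pipeline

    # Fill-mask with ModernBERT; requires a transformers version that
    # ships ModernBERT support.
    fill = pipeline("fill-mask", model="answerdotai/ModernBERT-base")

    # ModernBERT keeps the BERT-style [MASK] token.
    for pred in fill("The capital of France is [MASK]."):
        print(pred["token_str"], pred["score"])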
Continuous Autoregressive Models with Noise Augmentation Avoid Error Accumulation Paper • 2411.18447 • Published Nov 27, 2024 • 2
Scaling Transformers for Low-Bitrate High-Quality Speech Coding Paper • 2411.19842 • Published Nov 29, 2024 • 11
Cosmos Tokenizer Collection • A suite of image and video tokenizers • 13 items • Updated 14 days ago • 37
Molmo Collection • Artifacts for open multimodal language models • 5 items • Updated 24 days ago • 293
Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis Paper • 2410.23320 • Published Oct 30, 2024 • 8
Jetfire: Efficient and Accurate Transformer Pretraining with INT8 Data Flow and Per-Block Quantization Paper • 2403.12422 • Published Mar 19, 2024 • 1
Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models Paper • 2410.11081 • Published Oct 14, 2024 • 19
BigVGAN: A Universal Neural Vocoder with Large-Scale Training Paper • 2206.04658 • Published Jun 9, 2022 • 3
Moshi v0.1 Release Collection • MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via https://github.com/kyutai-labs/moshi • 13 items • Updated Sep 18, 2024 • 225
Parallelizing Linear Transformers with the Delta Rule over Sequence Length Paper • 2406.06484 • Published Jun 10, 2024 • 3
Gated Linear Attention Transformers with Hardware-Efficient Training Paper • 2312.06635 • Published Dec 11, 2023 • 6