Wan-S2V: Audio-Driven Cinematic Video Generation • Paper • 2508.18621 • Published 14 days ago • 16
TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Language Modeling • Paper • 2508.16790 • Published 18 days ago • 7
Waver: Wave Your Way to Lifelike Video Generation • Paper • 2508.15761 • Published 19 days ago • 33
Post: Introducing MGM-Omni, an omni-chatbot capable of processing text, image, video, and speech inputs and generating both text and speech responses. MGM-Omni supports hour-level audio understanding, as well as 10-minute speech generation and voice cloning. For more details, please check:
Blog: https://mgm-omni.notion.site/MGM-Omni-An-Open-source-Omni-Chatbot-2395728e0b0180149ac9f24683fc9907
Code: https://github.com/dvlab-research/MGM-Omni
Model: wcy1122/mgm-omni-6896075e97317a88825032e1
Demo: wcy1122/MGM-Omni
Post: gpt-oss-120B scored 28 on the AHA leaderboard, one of the lowest scores, so it is not a very human-aligned model. These kinds of models are not really "free": they cost you your freedom, if you know what I mean.