Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM Article • 13 days ago • 342
A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality Article • 21 days ago • 69
Moshi v0.1 Release Collection MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 13 items • Updated Sep 18, 2024 • 226
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper • 2402.13753 • Published Feb 21, 2024 • 115
ChatAnything: Facetime Chat with LLM-Enhanced Personas Paper • 2311.06772 • Published Nov 12, 2023 • 35
Music ControlNet: Multiple Time-varying Controls for Music Generation Paper • 2311.07069 • Published Nov 13, 2023 • 45
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models Paper • 2311.06783 • Published Nov 12, 2023 • 28
I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models Paper • 2311.04145 • Published Nov 7, 2023 • 35
Learning From Mistakes Makes LLM Better Reasoner Paper • 2310.20689 • Published Oct 31, 2023 • 29
CapsFusion: Rethinking Image-Text Data at Scale Paper • 2310.20550 • Published Oct 31, 2023 • 26