Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model Aug 22, 2023 • 27
Article Releasing the largest multilingual open pretraining dataset By Pclanglais • 4 days ago • 88
SmolLM2 Collection State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 8 items • Updated 13 days ago • 167
2024 Interconnects Artifacts Collection Models & datasets mentioned in the bottom section of posts! • 242 items • Updated 3 days ago • 3
FLAIR models : landcover semantic segmentation Collection The FLAIR models are a collection of semantic segmentation models initially developed to classify land cover on very high resolution aerial imagery. • 9 items • Updated Jun 19 • 10
Pangea Collection A Fully Open Multilingual Multimodal LLM for 39 Languages • 18 items • Updated 16 days ago • 17
Article Democratization of AI, Open Source, and AI Auditing: Thoughts from the DisinfoCon Panel in Berlin By frimelle • Oct 8 • 5
Moshi v0.1 Release Collection MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 13 items • Updated Sep 18 • 217
Article Getty Images Brings High-Quality, Commercially Safe Dataset to Hugging Face By andreagagliano • Sep 6 • 16
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22 • 118
LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs Paper • 2408.13467 • Published Aug 24 • 24
Article 🔥 Argilla 2.0: the data-centric tool for AI makers 🤗 By dvilasuero • Jul 30 • 37
🍃 MINT-1T Collection Data for "MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens" • 13 items • Updated Jul 24 • 54
Article Querying Datasets with the Datasets Explorer Chrome Extension By cfahlgren1 • Jul 19 • 6