MobileLLM-R1
MobileLLM-R1, a series of sub-billion parameter reasoning models.
facebook/MobileLLM-R1-950M • Text Generation • 0.9B • Updated about 20 hours ago • 4.97k • 327
facebook/MobileLLM-R1-360M • Text Generation • 0.4B • Updated about 20 hours ago • 642 • 15
facebook/MobileLLM-R1-140M • Text Generation • 0.1B • Updated about 20 hours ago • 1.16k • 27
facebook/MobileLLM-R1-950M-base • Text Generation • 0.9B • Updated about 20 hours ago • 554 • 13
Physics of Language Models: Part 4.2
facebook/PhysicsLM4.2__LlamaCanon-1B-Nemo-1T-lr0.002 • Updated Jul 29 • 3
facebook/PhysicsLM4.2__LlamaCanon-1B-Nemo-1T-lr0.003 • Updated Jul 29 • 1
facebook/PhysicsLM4.2__LlamaCanon-1B-Nemo-2T-lr0.003 • Updated Jul 29 • 2 • 2
facebook/PhysicsLM4.2__LlamaCanon-1B-Nemo-2T-lr0.005 • Updated Jul 29 • 17 • 3
V-JEPA 2
A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of https://ai.meta.com/blog/v-jepa-yann
facebook/vjepa2-vitl-fpc64-256 • Video Classification • 0.3B • Updated Aug 11 • 93.8k • 153
facebook/vjepa2-vith-fpc64-256 • Video Classification • 0.7B • Updated Aug 11 • 2.29k • 12
facebook/vjepa2-vitg-fpc64-256 • Video Classification • 1B • Updated Aug 11 • 4.86k • 18
facebook/vjepa2-vitg-fpc64-384 • Video Classification • 1B • Updated Aug 11 • 8.07k • 30
blt
facebook/blt • Updated Apr 30 • 91 • 73
facebook/blt-1b • 5B • Updated May 1 • 378 • 19
facebook/blt-7b • Updated May 1 • 52 • 61
Byte Latent Transformer: Patches Scale Better Than Tokens • Paper • 2412.09871 • Published Dec 13, 2024 • 108
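The BLT paper replaces a fixed tokenizer with dynamically sized byte patches. One segmentation rule it describes starts a new patch wherever a small byte-level language model is uncertain, that is, where its next-byte entropy exceeds a global threshold. The sketch below implements only that boundary rule, under stated assumptions: the per-byte entropies are passed in as a toy list rather than produced by a trained byte LM, and `entropy_patches` is a hypothetical name, not part of the released code.

```python
from typing import List, Sequence


def entropy_patches(data: bytes, entropies: Sequence[float], threshold: float) -> List[bytes]:
    """Split `data` into patches using a global entropy threshold.

    entropies[t] is assumed to be the entropy of a small byte LM's prediction
    for byte t given the bytes before it (hypothetical input, supplied directly).
    A new patch starts at every position t > 0 whose entropy exceeds `threshold`.
    """
    if len(data) != len(entropies):
        raise ValueError("need exactly one entropy value per byte")
    patches: List[bytes] = []
    start = 0
    for t in range(1, len(data)):
        # A hard-to-predict byte opens a new patch.
        if entropies[t] > threshold:
            patches.append(data[start:t])
            start = t
    if data:
        patches.append(data[start:])
    return patches


# Toy entropies: pretend the byte LM is uncertain at the start of each word.
text = b"the cat sat"
ents = [2.5, 0.3, 0.2, 0.1, 2.8, 0.4, 0.2, 0.1, 2.9, 0.3, 0.2]
print(entropy_patches(text, ents, threshold=1.0))  # → [b'the ', b'cat ', b'sat']
```

Predictable stretches end up in long patches and uncertain positions trigger new ones, which is how the paper allocates more compute where the data is harder to predict.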
Perception Encoder
facebook/PE-Core-L14-336 • Zero-Shot Image Classification • Updated Apr 30 • 89.2k • 46
facebook/PE-Core-G14-448 • Zero-Shot Image Classification • Updated Apr 30 • 25.3k • 15
facebook/PE-Lang-L14-448 • Image Feature Extraction • Updated Apr 30 • 368 • 7
facebook/PE-Lang-G14-448 • Image Feature Extraction • Updated Apr 30 • 305 • 13
DRAMA
A collection of small (sub-1B) multilingual dense retrievers that generalize well across a number of tasks and languages.
facebook/drama-base • Sentence Similarity • 0.2B • Updated Jul 21 • 208 • 20
facebook/drama-large • Sentence Similarity • 0.4B • Updated Mar 4 • 101 • 7
facebook/drama-1b • Sentence Similarity • 1B • Updated Mar 4 • 762 • 12
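A dense retriever encodes queries and documents into vectors separately and ranks documents by vector similarity. The sketch below shows only that ranking step, with hand-written 3-d vectors standing in for encoder outputs. It assumes cosine similarity as the score (check a checkpoint's model card for the scoring function it expects), and `cosine` and `retrieve` are hypothetical helpers, not a DRAMA API.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: Sequence[float],
             doc_vecs: Sequence[Sequence[float]],
             k: int = 2) -> List[Tuple[int, float]]:
    """Return (doc_index, score) pairs for the k documents most similar to the query."""
    scored = [(i, cosine(query_vec, d)) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings" standing in for the retriever's encoder outputs.
docs = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
query = [0.9, 0.1, 0.0]
print([i for i, _ in retrieve(query, docs, k=2)])  # → [0, 1]
```

In practice the document vectors are computed once and indexed, so only the query has to be encoded at search time.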
MobileLLM
Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024): https://arxiv.org/abs/2402.14905
facebook/MobileLLM-R1-950M • Text Generation • 0.9B • Updated about 20 hours ago • 4.97k • 327
facebook/MobileLLM-R1-360M • Text Generation • 0.4B • Updated about 20 hours ago • 642 • 15
facebook/MobileLLM-R1-140M • Text Generation • 0.1B • Updated about 20 hours ago • 1.16k • 27
facebook/MobileLLM-R1-950M-base • Text Generation • 0.9B • Updated about 20 hours ago • 554 • 13
LayerSkip
Models continually pretrained using LayerSkip (https://arxiv.org/abs/2404.16710).
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding • Paper • 2404.16710 • Published Apr 25, 2024 • 79
facebook/layerskip-llama2-7B • Text Generation • 7B • Updated Oct 19, 2024 • 383 • 15
facebook/layerskip-llama2-13B • Text Generation • 13B • Updated Oct 19, 2024 • 365 • 5
facebook/layerskip-llama2-70B • Text Generation • 69B • Updated Nov 3, 2024 • 136 • 5
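Self-speculative decoding, as named in the LayerSkip paper title, drafts tokens cheaply by exiting early from the model's first layers and then verifies them with the full model. A minimal greedy sketch of that draft-then-verify loop follows. `draft_next` and `full_next` are hypothetical toy functions standing in for the early-exit and full-depth predictions; a real implementation verifies all drafted tokens in a single forward pass and reuses the draft's computation, which this sketch does not model.

```python
from typing import Callable, List, Sequence

# A "model" here is just a function from the token sequence so far to the next token.
NextToken = Callable[[Sequence[int]], int]


def greedy(full_next: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Reference output: plain greedy decoding with the full model."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(full_next(seq))
    return seq


def self_speculative(draft_next: NextToken, full_next: NextToken,
                     prompt: List[int], n_new: int, k: int = 3) -> List[int]:
    """Greedy draft-then-verify decoding.

    draft_next stands in for an early-exit pass (first layers only);
    full_next stands in for the full-depth pass that checks the draft.
    """
    seq = list(prompt)
    target = len(prompt) + n_new
    while len(seq) < target:
        # 1) Draft up to k tokens with the cheap predictor.
        draft: List[int] = []
        for _ in range(min(k, target - len(seq))):
            draft.append(draft_next(seq + draft))
        # 2) Verify left to right: keep drafted tokens while the full model agrees;
        #    at the first disagreement, take the full model's token and redraft.
        for tok in draft:
            expected = full_next(seq)
            seq.append(expected)
            if expected != tok:
                break
    return seq


# Toy models: the "early exit" agrees with the full model except after token 3.
def full_next(seq: Sequence[int]) -> int:
    return (seq[-1] * 3 + 1) % 10


def draft_next(seq: Sequence[int]) -> int:
    return 9 if seq[-1] == 3 else (seq[-1] * 3 + 1) % 10


print(self_speculative(draft_next, full_next, [1], n_new=6))  # → [1, 4, 3, 0, 1, 4, 3]
```

Every appended token is the full model's greedy choice, so the result matches `greedy(full_next, [1], 6)`. The speed-up in a real system comes from checking several drafted tokens per full-depth pass.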
Seamless Communication
A significant step towards removing language barriers through expressive, fast and high-quality AI translation.
Seamless: Multilingual Expressive and Streaming Speech Translation • Paper • 2312.05187 • Published Dec 8, 2023 • 14
facebook/seamless-m4t-v2-large • Automatic Speech Recognition • 2B • Updated Jan 4, 2024 • 53.9k • 893
Seamless M4T v2 📞 • Space • 516
facebook/seamless-expressive • Text-to-Speech • Updated Jan 4, 2024 • 186
Wav2Vec 2.0
A collection from the first release of Wav2Vec 2.0, a speech encoder that learns powerful representations from unlabelled audio data.
facebook/wav2vec2-large-960h-lv60-self • Automatic Speech Recognition • Updated May 23, 2022 • 40.7k • 151
facebook/wav2vec2-large-960h • Automatic Speech Recognition • Updated Apr 5, 2022 • 64.6k • 31
facebook/wav2vec2-base-960h • Automatic Speech Recognition • 0.1B • Updated Nov 14, 2022 • 2.33M • 380
facebook/wav2vec2-base-100h • Automatic Speech Recognition • Updated May 27, 2022 • 1.17k • 6
XLSR
A collection of multilingual Wav2Vec 2.0 checkpoints pre-trained on 53 languages and fine-tuned for CTC speech recognition.
facebook/wav2vec2-large-xlsr-53 • Updated Mar 18, 2022 • 148k • 146
facebook/wav2vec2-xlsr-53-espeak-cv-ft • Automatic Speech Recognition • Updated Dec 10, 2021 • 463k • 39
facebook/wav2vec2-large-xlsr-53-dutch • Automatic Speech Recognition • Updated Jul 6, 2021 • 1.35k • 3
facebook/wav2vec2-large-xlsr-53-french • Automatic Speech Recognition • Updated Jul 6, 2021 • 3.36k • 13
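A checkpoint fine-tuned for CTC outputs, for every audio frame, a score for each vocabulary symbol plus a special blank symbol. The simplest decoder (greedy, or best-path, decoding) takes the top symbol per frame, merges consecutive repeats, and then drops blanks. A plain-Python sketch over toy scores follows; the three-symbol vocabulary and the blank at index 0 are assumptions for illustration, since a real checkpoint's processor defines its own vocabulary and blank/pad token.

```python
from typing import List, Optional, Sequence


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      vocab: Sequence[str], blank: int = 0) -> str:
    """Best-path CTC decoding: argmax per frame, collapse repeats, drop blanks."""
    # Most likely symbol index for every frame.
    best = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    out: List[str] = []
    prev: Optional[int] = None
    for idx in best:
        # Emit only when the symbol changes and it is not the blank.
        # A blank between two identical symbols lets them both survive ("a _ a" -> "aa").
        if idx != prev and idx != blank:
            out.append(vocab[idx])
        prev = idx
    return "".join(out)


# Toy vocabulary: index 0 is the CTC blank.
vocab = ["<blank>", "a", "b"]
# Frame-wise argmax path: a a <blank> a b b
scores = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.6, 0.3],
    [0.1, 0.2, 0.7],
    [0.2, 0.1, 0.7],
]
print(ctc_greedy_decode(scores, vocab))  # → aab
```

Beam search with a language model usually improves accuracy over this greedy path, but the collapse-then-drop-blanks rule stays the same.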
Robust Wav2Vec 2.0
A collection of "robust" Wav2Vec 2.0 checkpoints pre-trained on datasets from multiple domains.
facebook/wav2vec2-large-robust • Updated Nov 5, 2021 • 1.41k • 37
facebook/wav2vec2-large-robust-ft-libri-960h • Automatic Speech Recognition • 0.3B • Updated Jun 23, 2023 • 286k • 15
facebook/wav2vec2-large-robust-ft-swbd-300h • Automatic Speech Recognition • Updated Apr 5, 2022 • 8.72k • 20
Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training • Paper • 2104.01027 • Published Apr 2, 2021 • 1
VoxPopuli v2
A collection of checkpoints from the second VoxPopuli release.
VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation • Paper • 2101.00390 • Published Jan 2, 2021 • 1
facebook/wav2vec2-base-bg-voxpopuli-v2 • Automatic Speech Recognition • Updated Feb 27, 2022 • 32
facebook/wav2vec2-base-cs-voxpopuli-v2 • Automatic Speech Recognition • Updated Feb 27, 2022 • 7 • 1
facebook/wav2vec2-base-da-voxpopuli-v2 • Automatic Speech Recognition • Updated Feb 27, 2022 • 7
Fairseq S^2 TTS
Text-to-speech models from fairseq S^2.
facebook/fastspeech2-en-ljspeech • Text-to-Speech • Updated Jan 28, 2022 • 46 • 273
facebook/fastspeech2-en-200_speaker-cv4 • Text-to-Speech • Updated Jan 28, 2022 • 9 • 6
facebook/tts_transformer-ar-cv7 • Text-to-Speech • Updated Jan 28, 2022 • 8 • 8
facebook/tts_transformer-vi-cv7 • Text-to-Speech • Updated Jan 28, 2022 • 10 • 11
MusicGen Stereo
A collection of stereo music generation models as part of the v2 MusicGen release.
facebook/musicgen-stereo-small • Text-to-Audio • 0.6B • Updated Mar 6, 2024 • 1.94k • 34
facebook/musicgen-stereo-medium • Text-to-Audio • 2B • Updated Mar 6, 2024 • 380 • 32
facebook/musicgen-stereo-large • Text-to-Audio • 3B • Updated Mar 6, 2024 • 651 • 81
facebook/musicgen-stereo-melody-large • Text-to-Audio • 3B • Updated Apr 24, 2024 • 264 • 60
Chameleon
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
facebook/chameleon-7b • Image-Text-to-Text • 7B • Updated Jul 23, 2024 • 82k • 189
facebook/chameleon-30b • Image-Text-to-Text • 34B • Updated Jul 30, 2024 • 34 • 88
OPT
OPT (Open Pretrained Transformer) is a series of open-source large causal language models whose performance is similar to that of GPT-3.
facebook/opt-125m • Text Generation • Updated Sep 15, 2023 • 9.74M • 218
facebook/opt-350m • Text Generation • Updated Sep 15, 2023 • 157k • 148
facebook/opt-1.3b • Text Generation • Updated Sep 15, 2023 • 181k • 178
facebook/opt-2.7b • Text Generation • Updated Sep 15, 2023 • 59.6k • 86
DINOv3
DINOv3: foundation models producing excellent dense features, outperforming the state of the art without fine-tuning (https://arxiv.org/abs/2508.10104).
facebook/dinov3-vit7b16-pretrain-lvd1689m • Image Feature Extraction • 7B • Updated Aug 19 • 30.4k • 151
facebook/dinov3-vits16-pretrain-lvd1689m • Image Feature Extraction • 0.0B • Updated Aug 19 • 107k • 29
facebook/dinov3-convnext-small-pretrain-lvd1689m • Image Feature Extraction • 0.0B • Updated Aug 19 • 215k • 20
facebook/dinov3-vitb16-pretrain-lvd1689m • Image Feature Extraction • 0.1B • Updated Aug 19 • 107k • 63
Meta CLIP 1/2
Scaling CLIP data with transparent training distribution from an end-to-end pipeline.
facebook/metaclip-h14-fullcc2.5b • Zero-Shot Image Classification • 1.0B • Updated Jan 11, 2024 • 39.7k • 44
facebook/metaclip-l14-fullcc2.5b • Zero-Shot Image Classification • Updated Oct 14, 2023 • 4.01k • 6
facebook/metaclip-b16-fullcc2.5b • Zero-Shot Image Classification • Updated Oct 14, 2023 • 13.6k • 10
facebook/metaclip-b32-fullcc2.5b • Zero-Shot Image Classification • Updated Oct 8, 2023 • 411 • 8
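The MetaCLIP checkpoints above are tagged Zero-Shot Image Classification. In CLIP-style models this works by embedding the image and one text prompt per candidate label (for example "a photo of a cat"), then applying a softmax over the scaled cosine similarities. A minimal sketch with toy 3-d vectors in place of real encoder outputs; the `logit_scale` of 100 is an assumed value (CLIP learns this temperature during training), and `normalize` and `zero_shot_probs` are hypothetical helpers.

```python
import math
from typing import Dict, List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def zero_shot_probs(image_emb: Sequence[float],
                    text_embs: Dict[str, Sequence[float]],
                    logit_scale: float = 100.0) -> Dict[str, float]:
    """Softmax over scaled cosine similarities between one image and each label prompt."""
    img = normalize(image_emb)
    logits = {}
    for label, emb in text_embs.items():
        txt = normalize(emb)
        logits[label] = logit_scale * sum(a * b for a, b in zip(img, txt))
    # Numerically stable softmax: subtract the max logit before exponentiating.
    m = max(logits.values())
    exps = {label: math.exp(z - m) for label, z in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Toy embeddings standing in for the image encoder's output and the text encoder's
# outputs for prompts such as "a photo of a cat" / "a photo of a dog".
image = [0.9, 0.1, 0.2]
prompts = {"cat": [1.0, 0.0, 0.1], "dog": [0.0, 1.0, 0.1]}
probs = zero_shot_probs(image, prompts)
print(max(probs, key=probs.get))  # → cat
```

Because the label set is just a list of prompts, new classes can be added without retraining by embedding another prompt.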
Web-SSL
facebook/webssl-dino300m-full2b-224 • Image Feature Extraction • 0.3B • Updated Apr 24 • 1.27k • 10
facebook/webssl-dino1b-full2b-224 • Image Feature Extraction • 1B • Updated Apr 24 • 5.89k • 3
facebook/webssl-dino2b-full2b-224 • Image Feature Extraction • 2B • Updated Apr 24 • 587
facebook/webssl-dino3b-full2b-224 • Image Feature Extraction • 3B • Updated Apr 24 • 645
Perception LM
facebook/Perception-LM-1B • Image-Text-to-Text • 2B • Updated Aug 13 • 1.72k • 36
facebook/Perception-LM-3B • Image-Text-to-Text • 4B • Updated Aug 13 • 24.8k • 19
facebook/Perception-LM-8B • Image-Text-to-Text • 10B • Updated Jul 14 • 14k • 54
facebook/PLM-VideoBench • Viewer • Updated May 21 • 44k • 539 • 11
FAIR Chemistry
facebook/OMAT24 • Updated about 5 hours ago • 83
facebook/OMAT24 • Preview • Updated 4 days ago • 288 • 59
facebook/OMol25 • Updated 8 days ago • 152
facebook/UMA • Updated Jul 2 • 145
Meta Motivo
A first-of-its-kind behavioral foundation model to control a virtual physics-based humanoid agent for a wide range of whole-body tasks.
facebook/metamotivo-S-1 • 0.0B • Updated Dec 12, 2024 • 2.69k • 9
facebook/metamotivo-S-2 • 0.0B • Updated Dec 12, 2024 • 6 • 2
facebook/metamotivo-S-3 • 0.0B • Updated Dec 12, 2024 • 5 • 2
facebook/metamotivo-S-4 • 0.0B • Updated Dec 12, 2024 • 5 • 2
Sparsh
Models and datasets for Sparsh: Self-supervised touch representations for vision-based tactile sensing.
facebook/sparsh-dino-base • Updated Oct 21, 2024 • 5
facebook/sparsh-dino-small • Updated Oct 21, 2024 • 1
facebook/sparsh-mae-base • Updated Oct 21, 2024 • 1
facebook/sparsh-mae-small • Updated Oct 21, 2024 • 1
MelodyFlow
MelodyFlow: High Fidelity Text-Guided Music Generation and Editing via Single-Stage Flow Matching
High Fidelity Text-Guided Music Generation and Editing via Single-Stage Flow Matching • Paper • 2407.03648 • Published Jul 4, 2024 • 19
facebook/melodyflow-t24-30secs • Updated Oct 23, 2024 • 27
MelodyFlow 🎵 • Space • 121 • Generate music from text descriptions
MAGNeT
Masked Audio Generation using a Single Non-Autoregressive Transformer
Masked Audio Generation using a Single Non-Autoregressive Transformer • Paper • 2401.04577 • Published Jan 9, 2024 • 43
facebook/magnet-small-10secs • Text-to-Audio • Updated Jan 16, 2024 • 740 • 25
facebook/magnet-medium-10secs • Text-to-Audio • Updated Jan 16, 2024 • 268 • 9
facebook/magnet-small-30secs • Text-to-Audio • Updated Jan 16, 2024 • 141 • 8
SeamlessM4T
SeamlessM4T is designed to provide high-quality translation, allowing people from different linguistic communities to communicate effortlessly.
Seamless M4T 📞 • Space • 951
facebook/hf-seamless-m4t-large • Text-to-Speech • Updated Dec 8, 2023 • 1.32k • 58
facebook/hf-seamless-m4t-medium • Text-to-Speech • Updated Dec 8, 2023 • 12.3k • 31
facebook/seamless-m4t-large • Automatic Speech Recognition • Updated Dec 14, 2023 • 512
XLS-R
First release checkpoints for XLS-R, a large-scale model for cross-lingual speech representation learning based on wav2vec 2.0.
facebook/wav2vec2-xls-r-300m • Updated Aug 10, 2022 • 96.7k • 100
facebook/wav2vec2-xls-r-1b • Updated Aug 10, 2022 • 20.3k • 29
facebook/wav2vec2-xls-r-2b • Updated Aug 10, 2022 • 4.61k • 41
facebook/wav2vec2-xls-r-300m-en-to-15 • Automatic Speech Recognition • Updated Jan 26, 2023 • 18 • 6
VoxPopuli
A collection of open-source artefacts (datasets + checkpoints) from the first VoxPopuli release.
facebook/voxpopuli • Updated Oct 14, 2022 • 7.15k • 131
VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation • Paper • 2101.00390 • Published Jan 2, 2021 • 1
facebook/wav2vec2-base-100k-voxpopuli • Automatic Speech Recognition • Updated Nov 5, 2021 • 42 • 4
facebook/wav2vec2-base-10k-voxpopuli-ft-cs • Automatic Speech Recognition • Updated Jul 6, 2021 • 10
HuBERT
A collection of checkpoints from the HuBERT release, a speech encoder that learns powerful representations from unlabelled audio data.
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units • Paper • 2106.07447 • Published Jun 14, 2021 • 4
facebook/hubert-base-ls960 • Feature Extraction • Updated Nov 5, 2021 • 594k • 64
facebook/hubert-large-ll60k • Feature Extraction • Updated Nov 5, 2021 • 39.5k • 30
facebook/hubert-large-ls960-ft • Automatic Speech Recognition • Updated May 24, 2022 • 291k • 73
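The "hidden units" in the HuBERT paper title are discrete targets produced by an offline clustering step: k-means assigns each audio frame to a cluster, and the model learns to predict those cluster IDs at frames that were masked in its input. The sketch below covers only target construction, namely nearest-centroid assignment and selecting the masked frames where the prediction loss is applied. The frame features, centroids, and mask are toy values, and `assign_units` and `masked_targets` are hypothetical names, not part of any released HuBERT code.

```python
from typing import List, Sequence, Tuple


def assign_units(frames: Sequence[Sequence[float]],
                 centroids: Sequence[Sequence[float]]) -> List[int]:
    """Label each frame with the index of its nearest centroid (squared L2 distance)."""
    def dist2(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [min(range(len(centroids)), key=lambda c: dist2(f, centroids[c])) for f in frames]


def masked_targets(units: Sequence[int], mask: Sequence[bool]) -> List[Tuple[int, int]]:
    """(frame_index, unit) pairs for the masked frames, where the prediction loss applies."""
    return [(i, u) for i, (u, m) in enumerate(zip(units, mask)) if m]


# Toy 2-d frame features and two cluster centroids.
frames = [[0.0, 0.1], [0.1, 0.0], [1.0, 0.9], [0.9, 1.1], [0.05, 0.05]]
centroids = [[0.0, 0.0], [1.0, 1.0]]
units = assign_units(frames, centroids)   # → [0, 0, 1, 1, 0]
mask = [False, True, True, False, False]  # frames hidden from the model's input
print(masked_targets(units, mask))        # → [(1, 0), (2, 1)]
```

The model itself never sees the masked frames' features, so it has to infer their cluster IDs from the surrounding context.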
DINOv2
DINOv2: foundation models producing robust visual features suitable for image-level and pixel-level visual tasks (https://arxiv.org/abs/2304.07193).
facebook/dinov2-small • Image Feature Extraction • 0.0B • Updated Sep 6, 2023 • 1.37M • 45
facebook/dinov2-base • Image Feature Extraction • 0.1B • Updated Jan 17, 2024 • 2.21M • 149
facebook/dinov2-large • Image Feature Extraction • 0.3B • Updated Sep 6, 2023 • 332k • 93
facebook/dinov2-giant • Image Feature Extraction • 1B • Updated Sep 6, 2023 • 361k • 50
LLM Compiler
Meta LLM Compiler is a state-of-the-art LLM that builds upon Code Llama with improved performance for code optimization and compiler reasoning.
facebook/llm-compiler-7b • Text Generation • Updated Jun 27, 2024 • 38 • 136
facebook/llm-compiler-7b-ftd • Text Generation • Updated Jun 27, 2024 • 19 • 27
facebook/llm-compiler-13b • Text Generation • Updated Jun 27, 2024 • 7 • 86
facebook/llm-compiler-13b-ftd • Text Generation • Updated Jun 27, 2024 • 242 • 56
Sapiens
Foundation models for human tasks. Code: https://github.com/facebookresearch/sapiens
Sapiens: Foundation for Human Vision Models • Paper • 2408.12569 • Published Aug 22, 2024 • 94
facebook/sapiens • Updated Sep 20, 2024 • 241
Sapiens Pose 📊 • Space • 58 • Detect and estimate poses in images
Sapiens Segmentation 🌍 • Space • 121 • Segment body parts in images
FAIR's LayerSkip Llama models
facebook/layerskip-llama2-7B • Text Generation • 7B • Updated Oct 19, 2024 • 383 • 15
facebook/layerskip-llama2-13B • Text Generation • 13B • Updated Oct 19, 2024 • 365 • 5
facebook/layerskip-codellama-7B • Text Generation • 7B • Updated Oct 19, 2024 • 166 • 6
facebook/layerskip-codellama-34B • Text Generation • 34B • Updated Oct 19, 2024 • 689 • 4