sapiens

Model Details

Sapiens is a family of models for four fundamental human-centric vision tasks: 2D pose estimation, body-part segmentation, depth estimation, and surface normal prediction. Our models natively support high-resolution (1K) inference and are easy to adapt to individual tasks by fine-tuning models pretrained on over 300 million in-the-wild human images. The resulting models generalize remarkably well to in-the-wild data, even when labeled data is scarce or entirely synthetic. Our simple model design also scales: performance across tasks improves as the parameter count grows from 0.3 to 2 billion. Sapiens consistently surpasses existing baselines across various human-centric benchmarks.
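To give a rough sense of what 1K inference means for a Vision Transformer, the sketch below counts the patch tokens the backbone would process. It assumes a square 1024×1024 input and 16×16 patches; both values are illustrative assumptions rather than figures stated on this card, so check the configuration of the checkpoint you use. The function name is hypothetical.

```python
def num_patch_tokens(height: int, width: int, patch: int = 16) -> int:
    """Number of non-overlapping patch tokens a ViT produces for one image.

    The 16-pixel patch size is an assumption for illustration; the image
    size must be an exact multiple of it.
    """
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    return (height // patch) * (width // patch)


# Hypothetical 1K input: 1024 x 1024 pixels split into 16 x 16 patches.
tokens_1k = num_patch_tokens(1024, 1024)
# A common lower-resolution ViT setting, for comparison.
tokens_224 = num_patch_tokens(224, 224)

print(tokens_1k)   # -> 4096
print(tokens_224)  # -> 196
# Global self-attention cost grows with the square of the token count.
print(f"attention cost ratio: {(tokens_1k / tokens_224) ** 2:.0f}x")
```

The quadratic ratio is why native high-resolution support matters: the token count, and therefore attention cost, grows quickly with input size.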

Model Description

  • Developed by: Meta
  • Model type: Vision Transformers
  • License: Creative Commons Attribution-NonCommercial 4.0

Uses

  • 2D pose estimation (17, 133, or 308 keypoints)
  • body-part segmentation (28 classes)
  • depth estimation
  • surface normal estimation

Model Zoo

Note: This repository does not host any checkpoints but contains links to all the model repositories.

We provide checkpoints in three formats:

  • original: weights that can be fine-tuned for your use case or used directly for inference.
  • torchscript: (inference only) weights ported to TorchScript.
  • bfloat16: (inference only) weights ported to bfloat16 for large-scale processing (A100 GPUs only, with PyTorch 2.3).

Model Name            | Original | TorchScript | BFloat16
sapiens-pretrain-0.3b | link     | link        | link
sapiens-pretrain-0.6b | link     | link        | link
sapiens-pretrain-1b   | link     | link        | link
sapiens-pretrain-2b   | link     | link        | link

sapiens-pose-0.3b     | link     | link        | link
sapiens-pose-0.6b     | link     | link        | link
sapiens-pose-1b       | link     | link        | link

sapiens-seg-0.3b      | link     | link        | link
sapiens-seg-0.6b      | link     | link        | link
sapiens-seg-1b        | link     | link        | link

sapiens-depth-0.3b    | link     | link        | link
sapiens-depth-0.6b    | link     | link        | link
sapiens-depth-1b      | link     | link        | link
sapiens-depth-2b      | link     | link        | link

sapiens-normal-0.3b   | link     | link        | link
sapiens-normal-0.6b   | link     | link        | link
sapiens-normal-1b     | link     | link        | link
sapiens-normal-2b     | link     | link        | link
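The model names follow a regular sapiens-{task}-{size} scheme. The sketch below records which task/size combinations the table lists and gives a back-of-the-envelope weight size for each checkpoint format. It assumes the original checkpoints store float32 weights (4 bytes per parameter) and takes the nominal parameter count from the name (0.3b as 0.3 billion); real files differ because of task heads, exact parameter counts, and serialization overhead, and the estimate covers weights only, not activations. The names ZOO, checkpoint_name, and approx_weight_gb are hypothetical helpers, not part of any Sapiens API.

```python
# Task -> sizes, transcribed from the model zoo table above.
ZOO = {
    "pretrain": ["0.3b", "0.6b", "1b", "2b"],
    "pose": ["0.3b", "0.6b", "1b"],
    "seg": ["0.3b", "0.6b", "1b"],
    "depth": ["0.3b", "0.6b", "1b", "2b"],
    "normal": ["0.3b", "0.6b", "1b", "2b"],
}

# Bytes per parameter. Treating the original weights as float32 is an assumption.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def checkpoint_name(task: str, size: str) -> str:
    """Build a name such as 'sapiens-depth-2b', rejecting combinations not in the table."""
    if size not in ZOO.get(task, []):
        raise KeyError(f"no sapiens-{task}-{size} checkpoint in the table")
    return f"sapiens-{task}-{size}"


def approx_weight_gb(size: str, dtype: str) -> float:
    """Rough weight size in GB (1e9 bytes) from the nominal parameter count in the name."""
    params = float(size.rstrip("b")) * 1e9
    return params * BYTES_PER_PARAM[dtype] / 1e9


for task, sizes in ZOO.items():
    for size in sizes:
        name = checkpoint_name(task, size)
        fp32 = approx_weight_gb(size, "float32")
        bf16 = approx_weight_gb(size, "bfloat16")
        print(f"{name:22s} ~{fp32:.1f} GB float32, ~{bf16:.1f} GB bfloat16")
```

Halving the bytes per parameter is the main reason the bfloat16 exports suit large-scale processing: a 2b model drops from roughly 8 GB to roughly 4 GB of weights under these assumptions.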

Helper models for bounding-box detection and background removal:

Model Name                 | Original | TorchScript | BFloat16
sapiens-pose-bbox-detector | link     | -           | -
sapiens-seg-foreground-1b  | -        | link        | -
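The bounding-box detector suggests a top-down pose workflow: detect each person, crop the box, run the pose model on the crop, then map the predicted keypoints back to image coordinates. The sketch below covers only that last step in plain Python. It assumes, hypothetically, that the pose model returns one heatmap per keypoint (17, 133, or 308 of them) covering the whole crop, and decodes each keypoint with a plain argmax; the box format and helper names are illustrative, not the Sapiens API. Real pipelines typically also pad the box to the model's aspect ratio and refine peaks to sub-pixel accuracy, which this sketch omits.

```python
from typing import List, Sequence, Tuple

Heatmap = Sequence[Sequence[float]]      # rows x cols of scores for one keypoint
Box = Tuple[float, float, float, float]  # hypothetical (x0, y0, x1, y1) in image pixels


def argmax_2d(heatmap: Heatmap) -> Tuple[int, int, float]:
    """Return (col, row, score) of the highest-scoring heatmap cell."""
    best = (0, 0, float("-inf"))
    for row, values in enumerate(heatmap):
        for col, score in enumerate(values):
            if score > best[2]:
                best = (col, row, score)
    return best


def decode_keypoints(heatmaps: Sequence[Heatmap], box: Box) -> List[Tuple[float, float, float]]:
    """Map each heatmap peak back to image coordinates inside `box`."""
    x0, y0, x1, y1 = box
    decoded = []
    for heatmap in heatmaps:
        rows, cols = len(heatmap), len(heatmap[0])
        col, row, score = argmax_2d(heatmap)
        # Take the cell centre, then scale from heatmap cells to box pixels.
        x = x0 + (col + 0.5) * (x1 - x0) / cols
        y = y0 + (row + 0.5) * (y1 - y0) / rows
        decoded.append((x, y, score))
    return decoded


# Toy example: two 4x4 heatmaps for a person box spanning x 100..180, y 50..210.
hm_a = [[0.0] * 4 for _ in range(4)]
hm_a[1][2] = 0.9  # peak at row 1, col 2
hm_b = [[0.0] * 4 for _ in range(4)]
hm_b[3][0] = 0.7  # peak at row 3, col 0
for x, y, score in decode_keypoints([hm_a, hm_b], (100.0, 50.0, 180.0, 210.0)):
    print(f"x={x:.1f} y={y:.1f} score={score:.2f}")
# -> x=150.0 y=110.0 score=0.90
# -> x=110.0 y=190.0 score=0.70
```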

Other fine-tuned models (pose-133 and pose-17): here
