Web-SSL DINO ViT-1B: 2B MetaCLIP data, 224 Resolution

A 1 billion parameter Vision Transformer (ViT) trained with DINOv2 self-supervised learning on web-scale image data without language supervision. Introduced in "Scaling Language-Free Visual Representation Learning" (Fan et al., 2025).

Model Details

  • Architecture: ViT (1536 width, 40 depth, 24 heads)
  • Parameters: 1B
  • Resolution: 224×224 pixels
  • Training: Self-supervised Web-DINO on 2B image samples from MetaCLIP web data

Model Descriptions

Web-SSL DINO 1B is a 1 billion parameter Vision Transformer model trained using self-supervised learning on 2 billion web images without language supervision. This model demonstrates that pure visual learning, when scaled appropriately, can match or exceed the performance of language-supervised models like CLIP across various vision tasks.

WebSSL Model Overview

Usage

from transformers import AutoImageProcessor, Dinov2Model
import torch
from PIL import Image

processor = AutoImageProcessor.from_pretrained('facebook/webssl-dino1b-full2b-224')
# 'eager' and 'sdpa' attn_implementation supported
model = Dinov2Model.from_pretrained('facebook/webssl-dino1b-full2b-224')

# Process an image
image = Image.open('path/to/image.jpg')
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

cls_features = outputs.last_hidden_state[:, 0]  # CLS token features
patch_features = outputs.last_hidden_state[:, 1:] # patch-wise token features

Citation

@article{fan2025scaling,
  title={Scaling Language-Free Visual Representation Learning}, 
  author={David Fan and Shengbang Tong and Jiachen Zhu and Koustuv Sinha and Zhuang Liu and Xinlei Chen and Michael Rabbat and Nicolas Ballas and Yann LeCun and Amir Bar and Saining Xie},
  year={2025},
  eprint={2504.01017},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
Downloads last month
1,124
Safetensors
Model size
1.13B params
Tensor type
F32
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Collection including facebook/webssl-dino1b-full2b-224