A CLIP (Contrastive Language-Image Pre-training) model trained from scratch on EntityNet-33M.
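
For readers unfamiliar with the contrastive objective behind CLIP, here is a minimal pure-Python sketch of the symmetric image-text loss. It is an illustration only, not the training code used for this model. The function names are hypothetical, and the temperature of 0.07 is an assumption borrowed from the original CLIP setup, where it is a learnable parameter.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def clip_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    image_embs[i] and text_embs[i] are assumed to describe the same sample.
    temperature=0.07 is an illustrative assumption, not this model's setting.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    # logits[i][j]: scaled cosine similarity between image i and text j
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]
    # Transpose to score each text against all images
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))

# Toy batch: correctly paired embeddings versus deliberately swapped ones
aligned = clip_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = clip_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch the correctly paired embeddings yield a loss close to zero, while the swapped pairing is penalized heavily, which is the behavior the objective is designed to produce.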

See the project page for the paper, code, usage examples, metrics, etc.

The model has seen ~0.6B images at a batch size of 8k.
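
As a rough sanity check, the figures above can be turned into an approximate step count and number of passes over the data. This sketch assumes that "8k" means 8192 and that the dataset holds about 33M samples (inferred from the name EntityNet-33M); neither value is confirmed here, and the helper name is hypothetical.

```python
def training_stats(samples_seen, batch_size, dataset_size):
    """Return (optimizer steps, passes over the dataset) as floats."""
    steps = samples_seen / batch_size
    epochs = samples_seen / dataset_size
    return steps, epochs

SAMPLES_SEEN = 0.6e9   # "~0.6B images" from the model card
BATCH_SIZE = 8192      # assumption: "8k" read as 8192
DATASET_SIZE = 33e6    # assumption: inferred from the dataset name

steps, epochs = training_stats(SAMPLES_SEEN, BATCH_SIZE, DATASET_SIZE)
print(f"approx. steps: {steps:,.0f}, approx. epochs: {epochs:.1f}")
```

Under these assumptions the run corresponds to roughly 73k optimizer steps and about 18 passes over the dataset.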


Collection including lmb-freiburg/CLIP-ViT-B-32-EntityNet-33M