A CLIP (Contrastive Language-Image Pre-training) model trained from scratch on EntityNet-33M.
See the project page for the paper, code, usage examples, metrics, etc.
During training, the model saw roughly 0.6B images at a batch size of 8k.
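
Below is a minimal zero-shot classification sketch using OpenCLIP, assuming the checkpoint is loadable in OpenCLIP format; the architecture name (`ViT-B-32`) and the checkpoint path are placeholders, not the actual values — refer to the project page for the exact loading code and usage examples.

```python
import torch
import open_clip
from PIL import Image

# Placeholders: substitute the actual architecture and checkpoint from the project page.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="/path/to/entitynet_clip_checkpoint.pt"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

# Encode one image and a few candidate captions.
image = preprocess(Image.open("example.jpg")).unsqueeze(0)
text = tokenizer(["a photo of a cat", "a photo of a dog"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize embeddings and compute image-to-text similarity probabilities.
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(probs)
```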