Zero-Shot Image Classification · OpenCLIP · Safetensors
mehdidc committed · Commit 4afec35 · verified · 1 Parent(s): 7842cb0

Update README.md

Files changed (1)
  1. README.md +34 -30
README.md CHANGED
@@ -18,9 +18,9 @@ pipeline_tag: zero-shot-image-classification
  2. [Uses](#uses)
  3. [Training Details](#training-details)
  4. [Evaluation](#evaluation)
- 5. [Acknowledgements](#acknowledgements)
- 6. [Citation](#citation)
- 7. [How To Get Started With the Model](#how-to-get-started-with-the-model)
+ 5. [How To Get Started With the Model](#how-to-get-started-with-the-model)
+ 6. [Acknowledgements](#acknowledgements)
+ 7. [Citation](#citation)


  # Model Details
@@ -118,6 +118,37 @@ The testing is performed on a suite of 38 datasets. See our paper for more detai

  The model achieves a 72.7% zero-shot top-1 accuracy on ImageNet-1k, 64.4% image retrieval recall@5 and 80.7% text retrieval recall@5 on COCO captions.

+ # How to Get Started with the Model
+
+ Zero-shot classification example:
+
+ ```python
+ import torch
+ from PIL import Image
+ import open_clip
+
+ model, _, preprocess = open_clip.create_model_and_transforms('hf-hub:laion/CLIP-ViT-B-32-256x256-DataComp-s34B-b86K')
+ model.eval() # model in train mode by default, impacts some models with BatchNorm or stochastic depth active
+ tokenizer = open_clip.get_tokenizer('hf-hub:laion/CLIP-ViT-B-32-256x256-DataComp-s34B-b86K')
+
+ image = preprocess(Image.open("docs/CLIP.png")).unsqueeze(0)
+ text = tokenizer(["a diagram", "a dog", "a cat"])
+
+ with torch.no_grad(), torch.autocast("cuda"):
+     image_features = model.encode_image(image)
+     text_features = model.encode_text(text)
+     image_features /= image_features.norm(dim=-1, keepdim=True)
+     text_features /= text_features.norm(dim=-1, keepdim=True)
+
+     text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
+
+ print("Label probs:", text_probs) # prints: [[1., 0., 0.]]
+ ```
+
+ # Acknowledgements
+
+ We gratefully acknowledge the computing time granted by the John von Neumann Institute for Computing (NIC)
+ and provided on the supercomputer JURECA at Jülich Supercomputing Centre (JSC).

  # Citation

@@ -190,30 +221,3 @@ CLIP benchmark software
  },
  }

- # How to Get Started with the Model
-
- Zero-shot classification example:
-
- ```python
- import torch
- from PIL import Image
- import open_clip
-
- model, _, preprocess = open_clip.create_model_and_transforms('hf-hub:laion/CLIP-ViT-B-32-256x256-DataComp-s34B-b86K')
- model.eval() # model in train mode by default, impacts some models with BatchNorm or stochastic depth active
- tokenizer = open_clip.get_tokenizer('hf-hub:laion/CLIP-ViT-B-32-256x256-DataComp-s34B-b86K')
-
- image = preprocess(Image.open("docs/CLIP.png")).unsqueeze(0)
- text = tokenizer(["a diagram", "a dog", "a cat"])
-
- with torch.no_grad(), torch.autocast("cuda"):
-     image_features = model.encode_image(image)
-     text_features = model.encode_text(text)
-     image_features /= image_features.norm(dim=-1, keepdim=True)
-     text_features /= text_features.norm(dim=-1, keepdim=True)
-
-     text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
-
- print("Label probs:", text_probs) # prints: [[1., 0., 0.]]
-
- ```
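
The quick-start snippet added in this commit classifies a single image against three toy prompts. A small illustrative sketch (not part of the model card) shows how the same open_clip calls extend to a batch of images with custom label prompts; the image paths and labels below are placeholders, and only APIs already used in the README snippet are assumed.

```python
# Illustrative sketch, not part of the commit: batched zero-shot classification
# with the same open_clip API used in the README snippet above.
# image_paths and labels are hypothetical placeholders.
import torch
from PIL import Image
import open_clip

MODEL_ID = 'hf-hub:laion/CLIP-ViT-B-32-256x256-DataComp-s34B-b86K'
device = "cuda" if torch.cuda.is_available() else "cpu"

model, _, preprocess = open_clip.create_model_and_transforms(MODEL_ID)
tokenizer = open_clip.get_tokenizer(MODEL_ID)
model = model.to(device).eval()  # eval mode, as in the README snippet

image_paths = ["cat.jpg", "dog.jpg"]  # hypothetical files
labels = ["a photo of a cat", "a photo of a dog", "a photo of a bird"]

images = torch.stack([preprocess(Image.open(p).convert("RGB")) for p in image_paths]).to(device)
text = tokenizer(labels).to(device)

with torch.no_grad():
    image_features = model.encode_image(images)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

for path, p in zip(image_paths, probs):
    print(f"{path}: {labels[p.argmax().item()]} ({p.max().item():.3f})")
```

Prompt phrasing (e.g. "a photo of a …") typically affects CLIP-style zero-shot accuracy; the benchmark figures in the card are produced with the CLIP Benchmark suite rather than single hand-written prompts.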
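The diff context above reports 72.7% zero-shot top-1 on ImageNet-1k, measured with the CLIP Benchmark suite cited later in the card. As a rough sketch of how such a number is computed: encode one text prompt per class, encode each validation image, and count how often the nearest text embedding matches the label. The code below is a simplified illustration only (single prompt template, no prompt ensembling, hypothetical dataset path, folder names used as class names), so it will not reproduce the reported figure exactly.

```python
# Simplified sketch of zero-shot top-1 evaluation; NOT the exact protocol behind
# the reported 72.7%. The dataset path is a hypothetical placeholder.
import torch
from torchvision import datasets
import open_clip

MODEL_ID = 'hf-hub:laion/CLIP-ViT-B-32-256x256-DataComp-s34B-b86K'
device = "cuda" if torch.cuda.is_available() else "cpu"

model, _, preprocess = open_clip.create_model_and_transforms(MODEL_ID)
tokenizer = open_clip.get_tokenizer(MODEL_ID)
model = model.to(device).eval()

dataset = datasets.ImageFolder("/path/to/imagenet/val", transform=preprocess)
loader = torch.utils.data.DataLoader(dataset, batch_size=64, num_workers=4)
class_names = dataset.classes  # caveat: folder names (wnids), not human-readable labels

with torch.no_grad():
    # One prompt per class; the official evaluation ensembles many templates.
    text = tokenizer([f"a photo of a {c}" for c in class_names]).to(device)
    text_features = model.encode_text(text)
    text_features /= text_features.norm(dim=-1, keepdim=True)

    correct = total = 0
    for images, target in loader:
        image_features = model.encode_image(images.to(device))
        image_features /= image_features.norm(dim=-1, keepdim=True)
        pred = (image_features @ text_features.T).argmax(dim=-1).cpu()
        correct += (pred == target).sum().item()
        total += target.numel()

print(f"zero-shot top-1: {correct / total:.4f}")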
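The pipeline_tag shown in the first hunk (zero-shot-image-classification) suggests the checkpoint can also be driven through the Hugging Face pipeline API. A minimal sketch, assuming the repository ships transformers-compatible CLIP weights alongside the OpenCLIP ones, which this diff does not confirm:

```python
# Sketch only: assumes laion/CLIP-ViT-B-32-256x256-DataComp-s34B-b86K also exposes
# a transformers-compatible CLIP checkpoint; check the repo files before relying on this.
from transformers import pipeline

classifier = pipeline(
    "zero-shot-image-classification",
    model="laion/CLIP-ViT-B-32-256x256-DataComp-s34B-b86K",
)
# "docs/CLIP.png" is the same example image used in the README snippet.
print(classifier("docs/CLIP.png", candidate_labels=["a diagram", "a dog", "a cat"]))
```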