Update README.md
README.md CHANGED
@@ -18,9 +18,9 @@ pipeline_tag: zero-shot-image-classification
 2. [Uses](#uses)
 3. [Training Details](#training-details)
 4. [Evaluation](#evaluation)
-5. [
-6. [
-7. [
+5. [How To Get Started With the Model](#how-to-get-started-with-the-model)
+6. [Acknowledgements](#acknowledgements)
+7. [Citation](#citation)


 # Model Details
@@ -118,6 +118,37 @@ The testing is performed on a suite of 38 datasets. See our paper for more details.

 The model achieves a 72.7% zero-shot top-1 accuracy on ImageNet-1k, 64.4% image retrieval recall@5 and 80.7% text retrieval recall@5 on COCO captions.

+# How to Get Started with the Model
+
+Zero-shot classification example:
+
+```python
+import torch
+from PIL import Image
+import open_clip
+
+model, _, preprocess = open_clip.create_model_and_transforms('hf-hub:laion/CLIP-ViT-B-32-256x256-DataComp-s34B-b86K')
+model.eval() # model in train mode by default, impacts some models with BatchNorm or stochastic depth active
+tokenizer = open_clip.get_tokenizer('hf-hub:laion/CLIP-ViT-B-32-256x256-DataComp-s34B-b86K')
+
+image = preprocess(Image.open("docs/CLIP.png")).unsqueeze(0)
+text = tokenizer(["a diagram", "a dog", "a cat"])
+
+with torch.no_grad(), torch.autocast("cuda"):
+    image_features = model.encode_image(image)
+    text_features = model.encode_text(text)
+    image_features /= image_features.norm(dim=-1, keepdim=True)
+    text_features /= text_features.norm(dim=-1, keepdim=True)
+
+    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
+
+print("Label probs:", text_probs) # prints: [[1., 0., 0.]]
+```
+
+# Acknowledgements
+
+We gratefully acknowledge the computing time granted by the John von Neumann Institute for Computing (NIC)
+and provided on the supercomputer JURECA at Jülich Supercomputing Centre (JSC).

 # Citation

@@ -190,30 +221,3 @@ CLIP benchmark software
 },
 }

-# How to Get Started with the Model
-
-Zero-shot classification example:
-
-```python
-import torch
-from PIL import Image
-import open_clip
-
-model, _, preprocess = open_clip.create_model_and_transforms('hf-hub:laion/CLIP-ViT-B-32-256x256-DataComp-s34B-b86K')
-model.eval() # model in train mode by default, impacts some models with BatchNorm or stochastic depth active
-tokenizer = open_clip.get_tokenizer('hf-hub:laion/CLIP-ViT-B-32-256x256-DataComp-s34B-b86K')
-
-image = preprocess(Image.open("docs/CLIP.png")).unsqueeze(0)
-text = tokenizer(["a diagram", "a dog", "a cat"])
-
-with torch.no_grad(), torch.autocast("cuda"):
-    image_features = model.encode_image(image)
-    text_features = model.encode_text(text)
-    image_features /= image_features.norm(dim=-1, keepdim=True)
-    text_features /= text_features.norm(dim=-1, keepdim=True)
-
-    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
-
-print("Label probs:", text_probs) # prints: [[1., 0., 0.]]
-
-```
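A note on the snippet added in the second hunk: it scores three free-form captions against a single image, while the 72.7% ImageNet-1k top-1 figure quoted in the model card is conventionally measured by expanding each class name into prompt templates, averaging the resulting text embeddings into per-class weights, and picking the class with the highest cosine similarity. The sketch below is a minimal illustration of that recipe using only the `open_clip` calls that appear in the diff; the class names, the templates, and the reuse of the `docs/CLIP.png` example path are placeholders, not the actual evaluation setup.

```python
import torch
from PIL import Image
import open_clip

MODEL_ID = 'hf-hub:laion/CLIP-ViT-B-32-256x256-DataComp-s34B-b86K'
model, _, preprocess = open_clip.create_model_and_transforms(MODEL_ID)
model.eval()
tokenizer = open_clip.get_tokenizer(MODEL_ID)

# Placeholder label set and prompt templates (illustrative, not ImageNet-1k).
class_names = ["dog", "cat", "diagram"]
templates = ["a photo of a {}.", "a drawing of a {}."]

with torch.no_grad():
    # Build one averaged, re-normalized text embedding per class.
    class_weights = []
    for name in class_names:
        tokens = tokenizer([t.format(name) for t in templates])
        emb = model.encode_text(tokens)
        emb = emb / emb.norm(dim=-1, keepdim=True)
        emb = emb.mean(dim=0)
        emb = emb / emb.norm()
        class_weights.append(emb)
    class_weights = torch.stack(class_weights, dim=1)  # (embed_dim, num_classes)

    # Classify one image: highest cosine similarity wins.
    image = preprocess(Image.open("docs/CLIP.png")).unsqueeze(0)  # example image path from the snippet above
    image_features = model.encode_image(image)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    logits = 100.0 * image_features @ class_weights  # (1, num_classes)
    pred = logits.argmax(dim=-1).item()

print("Predicted class:", class_names[pred])
```

Averaging normalized template embeddings and re-normalizing is the usual way prompt ensembling is done for CLIP-style models; the exact template set behind the reported number is not specified in this model card.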
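Similarly, the COCO recall@5 figures come from the CLIP benchmark suite cited in the model card. As a rough illustration of the metric itself, the sketch below computes recall@k from pre-normalized image and text embeddings under the simplifying assumption of one matching caption per image (real COCO evaluation handles five captions per image); the `recall_at_k` helper and the random stand-in embeddings are ours, not part of any benchmark API.

```python
import torch
import torch.nn.functional as F

def recall_at_k(query_emb: torch.Tensor, gallery_emb: torch.Tensor, k: int = 5) -> float:
    """Fraction of queries whose true match (same row index) appears in the top-k results."""
    sims = query_emb @ gallery_emb.T                # cosine similarities (inputs are L2-normalized)
    topk = sims.topk(k, dim=-1).indices             # (num_queries, k)
    targets = torch.arange(query_emb.shape[0]).unsqueeze(-1)
    return (topk == targets).any(dim=-1).float().mean().item()

# Random stand-ins for COCO features; in practice these would come from
# model.encode_image / model.encode_text, normalized as in the README example.
g = torch.Generator().manual_seed(0)
image_emb = F.normalize(torch.randn(1000, 512, generator=g), dim=-1)
text_emb = F.normalize(torch.randn(1000, 512, generator=g), dim=-1)

print("image retrieval recall@5 (text query -> image gallery):", recall_at_k(text_emb, image_emb))
print("text retrieval recall@5 (image query -> text gallery):", recall_at_k(image_emb, text_emb))
```

With real features computed over the full COCO validation split and normalized exactly as in the diff's example, this is the quantity the 64.4% and 80.7% figures refer to.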