Image Feature Extraction
Birder
PyTorch
hassonofer committed on
Commit
5783d46
·
verified ·
1 Parent(s): c325ccf

Update README.md

Files changed (1)
  1. README.md +135 -3
README.md CHANGED
@@ -1,3 +1,135 @@
1
- ---
2
- license: mit
3
- ---
1
+ ---
2
+ tags:
3
+ - image-feature-extraction
4
+ - birder
5
+ - pytorch
6
+ library_name: birder
7
+ license: mit
8
+ ---
9
+
10
+ # Model Card for sscd_resnext_101_c1
11
+
12
+ An SSCD ResNeXt model designed for image copy detection, converted to the Birder format for image feature extraction. This version retains the original model weights. The model produces a 1024-dimensional, L2-normalized descriptor for each input image.
13
+
14
+ The similarity between two images, represented by their descriptors `a` and `b`, can be measured with cosine similarity. Because the descriptors are L2-normalized, this is simply the dot product `a.dot(b)`, where higher values indicate greater similarity.
15
+ Alternatively, Euclidean distance `torch.linalg.vector_norm(a-b)` can be used, with lower values indicating greater similarity.
16
+ For reference, descriptor cosine similarity greater than 0.75 indicates copies with 90% precision.
17
+
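As a small worked example of how the two measures relate: for L2-normalized descriptors, the squared Euclidean distance equals `2 - 2 * a.dot(b)`, so the 0.75 cosine threshold corresponds to a distance of roughly 0.707. The sketch below checks this with NumPy on random stand-in vectors (not real model outputs). The helper `l2_normalize` and the constant `COPY_THRESHOLD` are illustrative names and are not part of Birder.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit L2 norm."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


# Random stand-ins for two 1024-dimensional descriptors (assumption: real
# descriptors would come from the model and are already L2-normalized)
rng = np.random.default_rng(0)
a = l2_normalize(rng.standard_normal(1024))
b = l2_normalize(rng.standard_normal(1024))

cosine = float(a.dot(b))
distance = float(np.linalg.norm(a - b))

# For unit vectors: ||a - b||^2 = 2 - 2 * (a . b)
identity_gap = abs(distance**2 - (2.0 - 2.0 * cosine))

# Cosine threshold quoted in the model card and its distance equivalent
COPY_THRESHOLD = 0.75
distance_threshold = float(np.sqrt(2.0 - 2.0 * COPY_THRESHOLD))

is_copy = cosine > COPY_THRESHOLD
print(f"cosine={cosine:.4f} distance={distance:.4f} is_copy={is_copy}")
```

Since the two measures are monotonically related for unit vectors, thresholding on either one gives the same decision.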
18
+ For optimal performance, particularly when sample images from the target distribution are available, additional descriptor post-processing is recommended.
19
+ This includes techniques such as centering (subtracting the mean) followed by L2 normalization, or whitening followed by L2 normalization, both of which can enhance accuracy.
20
+ Furthermore, applying score normalization can lead to more consistent similarity measurements and improve global accuracy metrics, although it does not impact ranking metrics.
21
+
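A minimal sketch of the centering and whitening steps described above, assuming a matrix `reference` of descriptors sampled from the target distribution. All function names here are hypothetical, and the PCA-style whitening shown is one common variant, not necessarily the exact procedure used by the SSCD authors. Synthetic 8-dimensional data stands in for real descriptors.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit L2 norm."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def fit_centering(reference):
    """Estimate the mean descriptor from target-distribution samples."""
    return reference.mean(axis=0)


def apply_centering(descriptors, mean):
    """Subtract the mean, then L2-normalize again."""
    return l2_normalize(descriptors - mean)


def fit_whitening(reference, eps=1e-6):
    """Estimate the mean and a PCA whitening matrix (assumed variant)."""
    mean = reference.mean(axis=0)
    cov = np.cov(reference - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negatives
    # Scale each eigenvector (column) by 1 / sqrt(eigenvalue)
    whiten = eigvecs / np.sqrt(eigvals + eps)
    return mean, whiten


def apply_whitening(descriptors, mean, whiten):
    """Center, whiten, then L2-normalize again."""
    return l2_normalize((descriptors - mean) @ whiten)


# Synthetic stand-ins for descriptors (assumption: not real model outputs)
rng = np.random.default_rng(0)
scales = np.linspace(0.5, 2.0, 8)
reference = l2_normalize(rng.standard_normal((500, 8)) * scales + 0.5)
queries = l2_normalize(rng.standard_normal((4, 8)) * scales + 0.5)

mean = fit_centering(reference)
centered = apply_centering(queries, mean)

w_mean, whiten = fit_whitening(reference)
whitened = apply_whitening(queries, w_mean, whiten)
```

The statistics are fitted once on the reference sample and then applied unchanged to both query and database descriptors, so that the resulting similarities remain comparable.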
22
+ For further information see: <https://github.com/facebookresearch/sscd-copy-detection>
23
+
24
+ ## Model Details
25
+
26
+ - **Model Type:** Image copy detection
27
+ - **Model Stats:**
28
+     - Params (M): 24.6
29
+     - Input image size: 320 x 320
30
+ - **Dataset:** DISC21: Dataset for the Image Similarity Challenge 2021
31
+
32
+ - **Papers:**
33
+     - Aggregated Residual Transformations for Deep Neural Networks: <https://arxiv.org/abs/1611.05431>
34
+     - A Self-Supervised Descriptor for Image Copy Detection: <https://arxiv.org/abs/2202.10261>
35
+
36
+ ## Model Usage
37
+
38
+ ### Image Copy Detection
39
+
40
+ ```python
41
+ import torch
42
+ import torch.nn.functional as F
43
+ from PIL import Image
44
+
45
+ import birder
46
+ from birder.inference.classification import infer_image
47
+
48
+ (net, model_info) = birder.load_pretrained_model("sscd_resnext_101_c1", file_format="pts", inference=True)
49
+
50
+ # Get the image size the model was trained on
51
+ size = birder.get_size_from_signature(model_info.signature)
52
+
53
+ # Create an inference transform
54
+ transform = birder.classification_transform(size, model_info.rgb_stats)
55
+
56
+ image1 = Image.open("path/to/image1.jpeg")
57
+ image2 = Image.open("path/to/image2.jpeg")
58
+ out1 = net(transform(image1).unsqueeze(dim=0))
59
+ out2 = net(transform(image2).unsqueeze(dim=0))
60
+ # Both out1 and out2 have torch.Size([1, 1024]) (the descriptor dimension stated above)
61
+
62
+ # Calculate cosine similarity (higher = more similar, range: -1 to 1)
63
+ F.cosine_similarity(out1, out2, dim=1)
64
+
65
+ # Calculate Euclidean distance (lower = more similar)
66
+ torch.linalg.vector_norm(out1 - out2, dim=1)
67
+ ```
68
+
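The pairwise comparison above extends naturally to matching many query descriptors against a reference set. The following is a minimal sketch, assuming the descriptors have already been extracted, L2-normalized, and stacked into NumPy arrays. The function `find_copies` is a hypothetical helper, not part of Birder, and the data is synthetic.

```python
import numpy as np


def find_copies(queries, references, threshold=0.75):
    """Return (query_idx, reference_idx, similarity) for best matches above threshold.

    Both inputs are assumed to be L2-normalized, so the matrix product
    gives cosine similarities.
    """
    sims = queries @ references.T
    best = sims.argmax(axis=1)
    best_sim = sims[np.arange(len(queries)), best]
    return [
        (q, int(r), float(s))
        for q, (r, s) in enumerate(zip(best, best_sim))
        if s > threshold
    ]


# Synthetic stand-ins: 10 reference descriptors of dimension 64
rng = np.random.default_rng(0)
references = rng.standard_normal((10, 64))
references /= np.linalg.norm(references, axis=1, keepdims=True)

# Query 0 is a slightly perturbed copy of reference 3, query 1 is unrelated
noisy_copy = references[3] + 0.01 * rng.standard_normal(64)
unrelated = rng.standard_normal(64)
queries = np.stack([noisy_copy, unrelated])
queries /= np.linalg.norm(queries, axis=1, keepdims=True)

matches = find_copies(queries, references)
print(matches)
```

For large reference sets, an approximate nearest-neighbor index would usually replace the dense matrix product, but the thresholding logic stays the same.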
69
+ ### Image Embeddings
70
+
71
+ ```python
72
+ import birder
73
+ from birder.inference.classification import infer_image
74
+
75
+ (net, model_info) = birder.load_pretrained_model("sscd_resnext_101_c1", inference=True)
76
+
77
+ # Get the image size the model was trained on
78
+ size = birder.get_size_from_signature(model_info.signature)
79
+
80
+ # Create an inference transform
81
+ transform = birder.classification_transform(size, model_info.rgb_stats)
82
+
83
+ image = "path/to/image.jpeg" # or a PIL image
84
+ (out, embedding) = infer_image(net, image, transform, return_embedding=True)
85
+ # embedding is a NumPy array with shape (1, 2048)
86
+ ```
87
+
88
+ ### Detection Feature Map
89
+
90
+ ```python
91
+ from PIL import Image
92
+ import birder
93
+
94
+ (net, model_info) = birder.load_pretrained_model("sscd_resnext_101_c1", inference=True)
95
+
96
+ # Get the image size the model was trained on
97
+ size = birder.get_size_from_signature(model_info.signature)
98
+
99
+ # Create an inference transform
100
+ transform = birder.classification_transform(size, model_info.rgb_stats)
101
+
102
+ image = Image.open("path/to/image.jpeg")
103
+ features = net.detection_features(transform(image).unsqueeze(0))
104
+ # features is a dict (stage name -> torch.Tensor)
105
+ print([(k, v.size()) for k, v in features.items()])
106
+ # Output example:
107
+ # [('stage1', torch.Size([1, 256, 80, 80])),
108
+ # ('stage2', torch.Size([1, 512, 40, 40])),
109
+ # ('stage3', torch.Size([1, 1024, 20, 20])),
110
+ # ('stage4', torch.Size([1, 2048, 10, 10]))]
111
+ ```
112
+
113
+ ## Citation
114
+
115
+ ```bibtex
116
+ @misc{xie2017aggregatedresidualtransformationsdeep,
117
+ title={Aggregated Residual Transformations for Deep Neural Networks},
118
+ author={Saining Xie and Ross Girshick and Piotr Dollár and Zhuowen Tu and Kaiming He},
119
+ year={2017},
120
+ eprint={1611.05431},
121
+ archivePrefix={arXiv},
122
+ primaryClass={cs.CV},
123
+ url={https://arxiv.org/abs/1611.05431},
124
+ }
125
+
126
+ @misc{pizzi2022selfsuperviseddescriptorimagecopy,
127
+ title={A Self-Supervised Descriptor for Image Copy Detection},
128
+ author={Ed Pizzi and Sreya Dutta Roy and Sugosh Nagavara Ravindra and Priya Goyal and Matthijs Douze},
129
+ year={2022},
130
+ eprint={2202.10261},
131
+ archivePrefix={arXiv},
132
+ primaryClass={cs.CV},
133
+ url={https://arxiv.org/abs/2202.10261},
134
+ }
135
+ ```