Update README.md
        README.md
    CHANGED
    
    | @@ -1,3 +1,135 @@ | |
| 1 | 
            -
            ---
         | 
| 2 | 
            -
             | 
| 3 | 
            -
             | 
| 1 | 
            +
            ---
         | 
| 2 | 
            +
            tags:
         | 
| 3 | 
            +
            - image-feature-extraction
         | 
| 4 | 
            +
            - birder
         | 
| 5 | 
            +
            - pytorch
         | 
| 6 | 
            +
            library_name: birder
         | 
| 7 | 
            +
            license: mit
         | 
| 8 | 
            +
            ---
         | 
| 9 | 
            +
             | 
| 10 | 
            +
            # Model Card for sscd_resnext_101_c1
         | 
| 11 | 
            +
             | 
| 12 | 
            +
            An SSCD ResNeXt model for image copy detection, converted to the Birder format for image feature extraction. This version retains the original model weights. The model produces a 1024-dimensional L2 normalized descriptor for each input image.
         | 
| 13 | 
            +
             | 
| 14 | 
            +
            Because the descriptors are L2 normalized, the cosine similarity between two images with descriptors a and b reduces to the dot product `a.dot(b)`, where higher values indicate greater similarity.
         | 
| 15 | 
            +
            Alternatively, Euclidean distance `torch.linalg.vector_norm(a-b)` can be used, with lower values indicating greater similarity.
         | 
| 16 | 
            +
            For reference, descriptor cosine similarity greater than 0.75 indicates copies with 90% precision.
         | 
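
For unit-length descriptors the two measures carry the same information, since the squared Euclidean distance equals `2 - 2 * a.dot(b)`. The following is a minimal sketch of that relationship in plain Python; the helper names and the toy 2-D vectors are illustrative assumptions and are not part of Birder or SSCD.

```python
import math


def cosine_to_euclidean(cos_sim: float) -> float:
    """Euclidean distance between two unit vectors with the given cosine similarity."""
    return math.sqrt(max(0.0, 2.0 - 2.0 * cos_sim))


def l2_normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


# Toy 2-D "descriptors" (hypothetical values, real descriptors have many more dimensions)
a = l2_normalize([3.0, 4.0])
b = l2_normalize([4.0, 3.0])

cos_sim = sum(x * y for x, y in zip(a, b))
dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# The directly computed distance matches the one derived from the cosine similarity
assert math.isclose(dist, cosine_to_euclidean(cos_sim))

# Distance equivalent of the 0.75 cosine similarity reference value
threshold_distance = cosine_to_euclidean(0.75)
```

Under this identity, thresholding cosine similarity at 0.75 is the same decision as thresholding Euclidean distance at `sqrt(0.5)`.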
| 17 | 
            +
             | 
| 18 | 
            +
            For optimal performance, particularly when sample images from the target distribution are available, additional descriptor post-processing is recommended.
         | 
| 19 | 
            +
            This includes techniques such as centering (subtracting the mean) followed by L2 normalization, or whitening followed by L2 normalization, both of which can enhance accuracy.
         | 
| 20 | 
            +
            Furthermore, applying score normalization can lead to more consistent similarity measurements and improve global accuracy metrics, although it does not impact ranking metrics.
         | 
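
The two post-processing options could be sketched as follows. This is a rough illustration using NumPy on randomly generated stand-in descriptors, not the SSCD authors' implementation; the function names, the `eps` regularizer, and the 16-dimensional toy data are assumptions made for the example. In practice `reference` would hold descriptors extracted from sample images of the target distribution.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit L2 norm."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def fit_centering(reference: np.ndarray) -> np.ndarray:
    """Mean descriptor of the reference set."""
    return reference.mean(axis=0)


def fit_whitening(reference: np.ndarray, eps: float = 1e-6) -> tuple[np.ndarray, np.ndarray]:
    """Mean and PCA whitening matrix estimated from the reference set."""
    mean = reference.mean(axis=0)
    cov = np.cov(reference - mean, rowvar=False)
    (eigvals, eigvecs) = np.linalg.eigh(cov)
    # Divide each eigenvector (column) by the square root of its eigenvalue
    whiten = eigvecs / np.sqrt(eigvals + eps)
    return (mean, whiten)


rng = np.random.default_rng(0)
# Stand-ins for real descriptors (hypothetical data)
reference = l2_normalize(rng.normal(size=(500, 16)) * np.linspace(1.0, 3.0, 16))
queries = l2_normalize(rng.normal(size=(4, 16)))

# Option 1: centering followed by L2 normalization
center = fit_centering(reference)
centered = l2_normalize(queries - center)

# Option 2: whitening followed by L2 normalization
(mean, whiten) = fit_whitening(reference)
whitened = l2_normalize((queries - mean) @ whiten)
```

The same statistics must be applied to both query and reference descriptors before comparing them, otherwise the similarities are not meaningful.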
| 21 | 
            +
             | 
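
One common form of score normalization subtracts from each query's similarities a bias estimated from a background set of unrelated images. The sketch below assumes that form; the exact procedure used by the SSCD authors may differ, and the function name, `k`, `beta`, and the random stand-in descriptors are illustrative assumptions.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def score_normalize(sims: np.ndarray, background_sims: np.ndarray, k: int = 3, beta: float = 1.0) -> np.ndarray:
    """Subtract a per-query bias estimated from the background set.

    sims:            (num_queries, num_references) raw cosine similarities
    background_sims: (num_queries, num_background) similarities to background descriptors
    """
    # Mean of each query's k highest background similarities
    top_k = np.sort(background_sims, axis=1)[:, -k:]
    bias = top_k.mean(axis=1, keepdims=True)
    return sims - beta * bias


rng = np.random.default_rng(0)
queries = l2_normalize(rng.normal(size=(4, 32)))
references = l2_normalize(rng.normal(size=(10, 32)))
background = l2_normalize(rng.normal(size=(100, 32)))  # stand-in for unrelated images

raw = queries @ references.T
normalized = score_normalize(raw, queries @ background.T)

# Each query's scores shift by one constant, so its ranking of references is unchanged
assert np.array_equal(np.argsort(raw, axis=1), np.argsort(normalized, axis=1))
```

Because the offset differs between queries, a single global threshold becomes more comparable across queries, while the per-query ordering of references stays the same.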
| 22 | 
            +
            For further information see: <https://github.com/facebookresearch/sscd-copy-detection>
         | 
| 23 | 
            +
             | 
| 24 | 
            +
            ## Model Details
         | 
| 25 | 
            +
             | 
| 26 | 
            +
            - **Model Type:** Image copy detection
         | 
| 27 | 
            +
            - **Model Stats:**
         | 
| 28 | 
            +
                - Params (M): 24.6
         | 
| 29 | 
            +
                - Input image size: 320 x 320
         | 
| 30 | 
            +
            - **Dataset:** DISC21: Dataset for the Image Similarity Challenge 2021
         | 
| 31 | 
            +
             | 
| 32 | 
            +
            - **Papers:**
         | 
| 33 | 
            +
                - Aggregated Residual Transformations for Deep Neural Networks: <https://arxiv.org/abs/1611.05431>
         | 
| 34 | 
            +
                - A Self-Supervised Descriptor for Image Copy Detection: <https://arxiv.org/abs/2202.10261>
         | 
| 35 | 
            +
             | 
| 36 | 
            +
            ## Model Usage
         | 
| 37 | 
            +
             | 
| 38 | 
            +
            ### Image Copy Detection
         | 
| 39 | 
            +
             | 
| 40 | 
            +
            ```python
         | 
| 41 | 
            +
            import torch
         | 
| 42 | 
            +
            import torch.nn.functional as F
         | 
| 43 | 
            +
            from PIL import Image
         | 
| 44 | 
            +
             | 
| 45 | 
            +
            import birder
         | 
| 46 | 
            +
            from birder.inference.classification import infer_image
         | 
| 47 | 
            +
             | 
| 48 | 
            +
            (net, model_info) = birder.load_pretrained_model("sscd_resnext_101_c1", file_format="pts", inference=True)
         | 
| 49 | 
            +
             | 
| 50 | 
            +
            # Get the image size the model was trained on
         | 
| 51 | 
            +
            size = birder.get_size_from_signature(model_info.signature)
         | 
| 52 | 
            +
             | 
| 53 | 
            +
            # Create an inference transform
         | 
| 54 | 
            +
            transform = birder.classification_transform(size, model_info.rgb_stats)
         | 
| 55 | 
            +
             | 
| 56 | 
            +
            image1 = Image.open("path/to/image1.jpeg")
         | 
| 57 | 
            +
            image2 = Image.open("path/to/image2.jpeg")
         | 
| 58 | 
            +
            out1 = net(transform(image1).unsqueeze(dim=0))
         | 
| 59 | 
            +
            out2 = net(transform(image2).unsqueeze(dim=0))
         | 
| 60 | 
            +
            # Both out1 and out2 have torch.Size([1, 1024])
         | 
| 61 | 
            +
             | 
| 62 | 
            +
            # Calculate cosine similarity (higher = more similar, range: -1 to 1)
         | 
| 63 | 
            +
            F.cosine_similarity(out1, out2, dim=1)
         | 
| 64 | 
            +
             | 
| 65 | 
            +
            # Calculate Euclidean distance (lower = more similar)
         | 
| 66 | 
            +
            torch.linalg.vector_norm(out1 - out2, dim=1)
         | 
| 67 | 
            +
            ```
         | 
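
When comparing many images at once, the pairwise check generalizes to a similarity matrix. The following is a small sketch using NumPy on randomly generated stand-in descriptors, assuming real descriptors have been converted to arrays (for example with `.numpy()`); `find_copies` and `COPY_THRESHOLD` are hypothetical names, not part of Birder.

```python
import numpy as np

COPY_THRESHOLD = 0.75  # reference value quoted above (about 90% precision)


def find_copies(queries: np.ndarray, references: np.ndarray, threshold: float = COPY_THRESHOLD):
    """Return (query_index, reference_index, similarity) for pairs above the threshold."""
    sims = queries @ references.T  # cosine similarity for L2 normalized rows
    (q_idx, r_idx) = np.nonzero(sims > threshold)
    return [(int(q), int(r), float(sims[q, r])) for (q, r) in zip(q_idx, r_idx)]


# Stand-in descriptors: the single query is a slightly perturbed copy of reference 1
rng = np.random.default_rng(0)
references = rng.normal(size=(5, 64))
references /= np.linalg.norm(references, axis=1, keepdims=True)
query = references[1] + 0.01 * rng.normal(size=64)
queries = (query / np.linalg.norm(query))[None, :]

matches = find_copies(queries, references)
```

For large reference sets, a dedicated nearest-neighbor index is usually a better fit than a dense similarity matrix.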
| 68 | 
            +
             | 
| 69 | 
            +
            ### Image Embeddings
         | 
| 70 | 
            +
             | 
| 71 | 
            +
            ```python
         | 
| 72 | 
            +
            import birder
         | 
| 73 | 
            +
            from birder.inference.classification import infer_image
         | 
| 74 | 
            +
             | 
| 75 | 
            +
            (net, model_info) = birder.load_pretrained_model("sscd_resnext_101_c1", inference=True)
         | 
| 76 | 
            +
             | 
| 77 | 
            +
            # Get the image size the model was trained on
         | 
| 78 | 
            +
            size = birder.get_size_from_signature(model_info.signature)
         | 
| 79 | 
            +
             | 
| 80 | 
            +
            # Create an inference transform
         | 
| 81 | 
            +
            transform = birder.classification_transform(size, model_info.rgb_stats)
         | 
| 82 | 
            +
             | 
| 83 | 
            +
            image = "path/to/image.jpeg"  # or a PIL image
         | 
| 84 | 
            +
            (out, embedding) = infer_image(net, image, transform, return_embedding=True)
         | 
| 85 | 
            +
            # embedding is a NumPy array with shape (1, 2048)
         | 
| 86 | 
            +
            ```
         | 
| 87 | 
            +
             | 
| 88 | 
            +
            ### Detection Feature Map
         | 
| 89 | 
            +
             | 
| 90 | 
            +
            ```python
         | 
| 91 | 
            +
            from PIL import Image
         | 
| 92 | 
            +
            import birder
         | 
| 93 | 
            +
             | 
| 94 | 
            +
            (net, model_info) = birder.load_pretrained_model("sscd_resnext_101_c1", inference=True)
         | 
| 95 | 
            +
             | 
| 96 | 
            +
            # Get the image size the model was trained on
         | 
| 97 | 
            +
            size = birder.get_size_from_signature(model_info.signature)
         | 
| 98 | 
            +
             | 
| 99 | 
            +
            # Create an inference transform
         | 
| 100 | 
            +
            transform = birder.classification_transform(size, model_info.rgb_stats)
         | 
| 101 | 
            +
             | 
| 102 | 
            +
            image = Image.open("path/to/image.jpeg")
         | 
| 103 | 
            +
            features = net.detection_features(transform(image).unsqueeze(0))
         | 
| 104 | 
            +
            # features is a dict (stage name -> torch.Tensor)
         | 
| 105 | 
            +
            print([(k, v.size()) for k, v in features.items()])
         | 
| 106 | 
            +
            # Output example:
         | 
| 107 | 
            +
            # [('stage1', torch.Size([1, 256, 80, 80])),
         | 
| 108 | 
            +
            #  ('stage2', torch.Size([1, 512, 40, 40])),
         | 
| 109 | 
            +
            #  ('stage3', torch.Size([1, 1024, 20, 20])),
         | 
| 110 | 
            +
            #  ('stage4', torch.Size([1, 2048, 10, 10]))]
         | 
| 111 | 
            +
            ```
         | 
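
The output example corresponds to strides of 4, 8, 16, and 32 relative to the 320 x 320 input. The sketch below uses those inferred strides to estimate feature map sizes for other input resolutions; `STAGE_STRIDES` and `expected_feature_sizes` are hypothetical helpers, and the result is an assumption derived from the example above rather than a documented Birder guarantee.

```python
# Strides inferred from the output example above (e.g. 320 / 80 = 4 for stage1)
STAGE_STRIDES = {"stage1": 4, "stage2": 8, "stage3": 16, "stage4": 32}


def expected_feature_sizes(height: int, width: int) -> dict[str, tuple[int, int]]:
    """Spatial size (H, W) of each stage's feature map for a given input size."""
    if height % 32 != 0 or width % 32 != 0:
        raise ValueError("this sketch only covers input sizes divisible by 32")
    return {name: (height // stride, width // stride) for (name, stride) in STAGE_STRIDES.items()}


sizes = expected_feature_sizes(320, 320)
```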
| 112 | 
            +
             | 
| 113 | 
            +
            ## Citation
         | 
| 114 | 
            +
             | 
| 115 | 
            +
            ```bibtex
         | 
| 116 | 
            +
            @misc{xie2017aggregatedresidualtransformationsdeep,
         | 
| 117 | 
            +
                  title={Aggregated Residual Transformations for Deep Neural Networks},
         | 
| 118 | 
            +
                  author={Saining Xie and Ross Girshick and Piotr Dollár and Zhuowen Tu and Kaiming He},
         | 
| 119 | 
            +
                  year={2017},
         | 
| 120 | 
            +
                  eprint={1611.05431},
         | 
| 121 | 
            +
                  archivePrefix={arXiv},
         | 
| 122 | 
            +
                  primaryClass={cs.CV},
         | 
| 123 | 
            +
                  url={https://arxiv.org/abs/1611.05431},
         | 
| 124 | 
            +
            }
         | 
| 125 | 
            +
             | 
| 126 | 
            +
            @misc{pizzi2022selfsuperviseddescriptorimagecopy,
         | 
| 127 | 
            +
                  title={A Self-Supervised Descriptor for Image Copy Detection},
         | 
| 128 | 
            +
                  author={Ed Pizzi and Sreya Dutta Roy and Sugosh Nagavara Ravindra and Priya Goyal and Matthijs Douze},
         | 
| 129 | 
            +
                  year={2022},
         | 
| 130 | 
            +
                  eprint={2202.10261},
         | 
| 131 | 
            +
                  archivePrefix={arXiv},
         | 
| 132 | 
            +
                  primaryClass={cs.CV},
         | 
| 133 | 
            +
                  url={https://arxiv.org/abs/2202.10261},
         | 
| 134 | 
            +
            }
         | 
| 135 | 
            +
            ```
         | 
