JH-C-k committed on Aug 4

Commit 6ace3e1 · verified · 1 Parent(s): e7ee7bf

Add files using upload-large-folder tool

Files changed (26)

clip/facebook/metaclip-b16-400m/.gitattributes +35 -0
clip/facebook/metaclip-b16-400m/README.md +46 -0
clip/facebook/metaclip-b16-400m/added_tokens.json +4 -0
clip/facebook/metaclip-b16-400m/config.json +20 -0
clip/facebook/metaclip-b16-400m/merges.txt +0 -0
clip/facebook/metaclip-b16-400m/preprocessor_config.json +28 -0
clip/facebook/metaclip-b16-400m/special_tokens_map.json +6 -0
clip/facebook/metaclip-b16-400m/tokenizer_config.json +33 -0
clip/facebook/metaclip-b16-400m/vocab.json +0 -0
clip/facebook/metaclip-b16-fullcc2.5b/.gitattributes +35 -0
clip/facebook/metaclip-b16-fullcc2.5b/README.md +46 -0
clip/facebook/metaclip-b16-fullcc2.5b/added_tokens.json +4 -0
clip/facebook/metaclip-b16-fullcc2.5b/config.json +20 -0
clip/facebook/metaclip-b16-fullcc2.5b/merges.txt +0 -0
clip/facebook/metaclip-b16-fullcc2.5b/preprocessor_config.json +28 -0
clip/facebook/metaclip-b16-fullcc2.5b/special_tokens_map.json +6 -0
clip/facebook/metaclip-b16-fullcc2.5b/tokenizer_config.json +33 -0
clip/facebook/metaclip-b16-fullcc2.5b/vocab.json +0 -0
clip/facebook/metaclip-b32-400m/preprocessor_config.json +28 -0
clip/facebook/metaclip-b32-400m/special_tokens_map.json +6 -0
clip/facebook/metaclip-b32-400m/tokenizer_config.json +33 -0
clip/facebook/metaclip-b32-400m/vocab.json +0 -0
clip/facebook/metaclip-l14-400m/config.json +25 -0
clip/laion/CLIP-ViT-L-14-laion2B-s32B-b82K/open_clip_pytorch_model.bin +3 -0
clip/laion/CLIP-ViT-L-14-laion2B-s32B-b82K/pytorch_model.bin +3 -0
clip/openai/clip-vit-large-patch14/flax_model.msgpack +3 -0

clip/facebook/metaclip-b16-400m/.gitattributes ADDED

	@@ -0,0 +1,35 @@

+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
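
The patterns above route matching files through Git LFS, so the repository stores a small pointer instead of the binary itself. As a rough sketch, Python's `fnmatch` can approximate which files in this directory would be affected. This is not a full implementation of gitattributes matching (for example, it ignores `saved_model/**/*`), and both the helper name `is_lfs_tracked` and the example file names are hypothetical.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# A subset of the patterns from the .gitattributes above.
LFS_PATTERNS = ["*.bin", "*.msgpack", "*.safetensors", "*.pt", "*.h5"]

def is_lfs_tracked(path: str) -> bool:
    """Approximate check: match the file's basename against each pattern."""
    name = PurePosixPath(path).name
    return any(fnmatch(name, pattern) for pattern in LFS_PATTERNS)

# Hypothetical file names, used only to exercise the patterns.
for path in ["pytorch_model.bin", "model.safetensors", "config.json", "vocab.json"]:
    print(f"{path}: {'LFS' if is_lfs_tracked(path) else 'plain git'}")
```

Under this approximation the weight files match and the JSON files do not, which is consistent with the JSON configs in this commit appearing as ordinary text diffs.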

clip/facebook/metaclip-b16-400m/README.md ADDED

	@@ -0,0 +1,46 @@

+---
+license: cc-by-nc-4.0
+tags:
+- vision
+- metaclip
+widget:
+- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/cat-dog-music.png
+  candidate_labels: playing music, playing sports
+  example_title: Cat & Dog
+---
+# MetaCLIP model, base-sized version, patch resolution 16
+MetaCLIP model applied to 400 million data points of CommonCrawl (CC). It was introduced in the paper [Demystifying CLIP Data](https://arxiv.org/abs/2309.16671) by Xu et al. and first released in [this repository](https://github.com/facebookresearch/MetaCLIP).
+Disclaimer: The team releasing MetaCLIP did not write a model card for this model so this model card has been written by the Hugging Face team.
+## Model description
+The [Demystifying CLIP Data](https://arxiv.org/abs/2309.16671) paper aims to reveal CLIP’s method around training data curation. OpenAI never open-sourced code regarding their data preparation pipeline.
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/clip_overview.jpg"
+alt="drawing" width="600"/>
+<small> CLIP high-level overview. Taken from the <a href="https://arxiv.org/abs/2103.00020">CLIP paper</a>. </small>
+## Intended uses & limitations
+You can use the raw model for linking images with text in a shared embedding space. This enables things like zero-shot image classification, text-based image retrieval, image-based text retrieval, etc.
+### How to use
+We refer to the [docs](https://huggingface.co/docs/transformers/main/en/model_doc/clip#usage). Just replace the names of the models on the hub.
+### BibTeX entry and citation info
+```bibtex
+@misc{xu2023demystifying,
+      title={Demystifying CLIP Data},
+      author={Hu Xu and Saining Xie and Xiaoqing Ellen Tan and Po-Yao Huang and Russell Howes and Vasu Sharma and Shang-Wen Li and Gargi Ghosh and Luke Zettlemoyer and Christoph Feichtenhofer},
+      year={2023},
+      eprint={2309.16671},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV}
+}
+```
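
The README describes zero-shot classification as comparing an image and candidate labels in a shared embedding space. The sketch below illustrates only that comparison step, in plain Python: the 3-dimensional vectors are made up and stand in for embeddings a real model would produce, and the function names are hypothetical. The candidate labels are the ones from the widget metadata above.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def zero_shot_probs(image_emb, text_embs, logit_scale=1.0):
    """Softmax over scaled image-to-text similarities."""
    logits = [logit_scale * cosine(image_emb, t) for t in text_embs]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["playing music", "playing sports"]
image_emb = [0.9, 0.1, 0.2]                      # made-up image embedding
text_embs = [[0.8, 0.2, 0.1], [0.1, 0.9, 0.3]]   # made-up label embeddings

probs = zero_shot_probs(image_emb, text_embs, logit_scale=14.0)
best = labels[probs.index(max(probs))]
print(best, probs)
```

With a real checkpoint, the embeddings would come from the model's image and text encoders rather than from hand-written lists; the linked docs cover that path.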

clip/facebook/metaclip-b16-400m/added_tokens.json ADDED

	@@ -0,0 +1,4 @@

+{
+  "<|endoftext|>": 49407,
+  "<|startoftext|>": 49406
+}

clip/facebook/metaclip-b16-400m/config.json ADDED

	@@ -0,0 +1,20 @@

+{
+  "architectures": [
+    "CLIPModel"
+  ],
+  "initializer_factor": 1.0,
+  "logit_scale_init_value": 2.6592,
+  "model_type": "clip",
+  "projection_dim": 512,
+  "text_config": {
+    "heads": 8,
+    "layers": 12,
+    "model_type": "clip_text_model"
+  },
+  "torch_dtype": "float32",
+  "transformers_version": "4.34.0",
+  "vision_config": {
+    "model_type": "clip_vision_model",
+    "patch_size": 16
+  }
+}
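
The `logit_scale_init_value` of 2.6592 is stored in log space: CLIP-style models exponentiate it before multiplying the image-text similarities. A quick check of the arithmetic, as a sketch, shows that it corresponds to the softmax temperature of 0.07 used as the initial value in the CLIP paper.

```python
import math

logit_scale_init_value = 2.6592  # value from config.json above

scale = math.exp(logit_scale_init_value)  # multiplier applied to similarities
temperature = 1.0 / scale                 # equivalent softmax temperature

print(f"scale = {scale:.2f}, temperature = {temperature:.3f}")
# scale is about 14.28, temperature about 0.070
```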

clip/facebook/metaclip-b16-400m/merges.txt ADDED

The diff for this file is too large to render. See raw diff

clip/facebook/metaclip-b16-400m/preprocessor_config.json ADDED

	@@ -0,0 +1,28 @@

+{
+  "crop_size": {
+    "height": 224,
+    "width": 224
+  },
+  "do_center_crop": true,
+  "do_convert_rgb": true,
+  "do_normalize": true,
+  "do_rescale": true,
+  "do_resize": true,
+  "image_mean": [
+    0.48145466,
+    0.4578275,
+    0.40821073
+  ],
+  "image_processor_type": "CLIPImageProcessor",
+  "image_std": [
+    0.26862954,
+    0.26130258,
+    0.27577711
+  ],
+  "processor_class": "CLIPProcessor",
+  "resample": 3,
+  "rescale_factor": 0.00392156862745098,
+  "size": {
+    "shortest_edge": 224
+  }
+}
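
Read together, these settings describe a pipeline: resize the shortest edge to 224 (resample value 3 is bicubic in Pillow), center-crop to 224×224, rescale 8-bit values by `rescale_factor`, then normalize per channel. The sketch below covers only the last two arithmetic steps for a single pixel; the helper name `normalize_pixel` is hypothetical and this is not the library's implementation.

```python
# Values copied from preprocessor_config.json above.
IMAGE_MEAN = [0.48145466, 0.4578275, 0.40821073]
IMAGE_STD = [0.26862954, 0.26130258, 0.27577711]
RESCALE_FACTOR = 0.00392156862745098  # equals 1 / 255

def normalize_pixel(rgb):
    """Map one 8-bit RGB pixel to the normalized values fed to the model."""
    return [
        (channel * RESCALE_FACTOR - mean) / std
        for channel, mean, std in zip(rgb, IMAGE_MEAN, IMAGE_STD)
    ]

black = normalize_pixel((0, 0, 0))
white = normalize_pixel((255, 255, 255))
print("black:", [round(v, 4) for v in black])
print("white:", [round(v, 4) for v in white])
```

A black pixel therefore maps to negative values (minus mean over std) and a white pixel to positive ones, so the model never sees raw 0 to 255 intensities.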

clip/facebook/metaclip-b16-400m/special_tokens_map.json ADDED

	@@ -0,0 +1,6 @@

+{
+  "bos_token": "<|startoftext|>",
+  "eos_token": "<|endoftext|>",
+  "pad_token": "<|endoftext|>",
+  "unk_token": "<|endoftext|>"
+}

clip/facebook/metaclip-b16-400m/tokenizer_config.json ADDED

	@@ -0,0 +1,33 @@

+{
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "49406": {
+      "content": "<|startoftext|>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "49407": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "additional_special_tokens": [],
+  "bos_token": "<|startoftext|>",
+  "clean_up_tokenization_spaces": true,
+  "do_lower_case": true,
+  "eos_token": "<|endoftext|>",
+  "errors": "replace",
+  "model_max_length": 77,
+  "pad_token": "<|endoftext|>",
+  "processor_class": "CLIPProcessor",
+  "tokenizer_class": "CLIPTokenizer",
+  "tokenizer_file": "/Users/georgebredis/.cache/huggingface/hub/models--openai--clip-vit-base-patch32/snapshots/e6a30b603a447e251fdaca1c3056b2a16cdfebeb/tokenizer.json",
+  "unk_token": "<|endoftext|>"
+}
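
The tokenizer settings imply a fixed text context: at most 77 positions, a start token (id 49406), an end token (id 49407), and the end token reused for padding. The sketch below shows how a sequence could be laid out under those settings. The helper `pad_to_context` and the input ids are hypothetical, and the real `CLIPTokenizer` may differ in detail (for instance, it pads only when asked to).

```python
BOS_ID = 49406   # <|startoftext|>
EOS_ID = 49407   # <|endoftext|>, also configured as pad and unk token
MAX_LENGTH = 77  # model_max_length

def pad_to_context(token_ids):
    """Wrap ids in BOS/EOS, truncate to fit, and pad to MAX_LENGTH."""
    body = list(token_ids)[: MAX_LENGTH - 2]  # reserve room for BOS and EOS
    ids = [BOS_ID] + body + [EOS_ID]
    attention_mask = [1] * len(ids)
    padding = MAX_LENGTH - len(ids)
    return ids + [EOS_ID] * padding, attention_mask + [0] * padding

ids, mask = pad_to_context([320, 1125, 539])  # hypothetical token ids
print(len(ids), ids[:6], sum(mask))
```

Because padding and end-of-text share an id, the attention mask is what distinguishes real tokens from padding in this layout.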

clip/facebook/metaclip-b16-400m/vocab.json ADDED

The diff for this file is too large to render. See raw diff

clip/facebook/metaclip-b16-fullcc2.5b/.gitattributes ADDED

	@@ -0,0 +1,35 @@

+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text

clip/facebook/metaclip-b16-fullcc2.5b/README.md ADDED

	@@ -0,0 +1,46 @@

+---
+license: cc-by-nc-4.0
+tags:
+- vision
+- metaclip
+widget:
+- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/cat-dog-music.png
+  candidate_labels: playing music, playing sports
+  example_title: Cat & Dog
+---
+# MetaCLIP model, base-sized version, patch resolution 16
+MetaCLIP model applied to 2.5 billion data points of CommonCrawl (CC). It was introduced in the paper [Demystifying CLIP Data](https://arxiv.org/abs/2309.16671) by Xu et al. and first released in [this repository](https://github.com/facebookresearch/MetaCLIP).
+Disclaimer: The team releasing MetaCLIP did not write a model card for this model so this model card has been written by the Hugging Face team.
+## Model description
+The [Demystifying CLIP Data](https://arxiv.org/abs/2309.16671) paper aims to reveal CLIP’s method around training data curation. OpenAI never open-sourced code regarding their data preparation pipeline.
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/clip_overview.jpg"
+alt="drawing" width="600"/>
+<small> CLIP high-level overview. Taken from the <a href="https://arxiv.org/abs/2103.00020">CLIP paper</a>. </small>
+## Intended uses & limitations
+You can use the raw model for linking images with text in a shared embedding space. This enables things like zero-shot image classification, text-based image retrieval, image-based text retrieval, etc.
+### How to use
+We refer to the [docs](https://huggingface.co/docs/transformers/main/en/model_doc/clip#usage). Just replace the names of the models on the hub.
+### BibTeX entry and citation info
+```bibtex
+@misc{xu2023demystifying,
+      title={Demystifying CLIP Data},
+      author={Hu Xu and Saining Xie and Xiaoqing Ellen Tan and Po-Yao Huang and Russell Howes and Vasu Sharma and Shang-Wen Li and Gargi Ghosh and Luke Zettlemoyer and Christoph Feichtenhofer},
+      year={2023},
+      eprint={2309.16671},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV}
+}
+```

clip/facebook/metaclip-b16-fullcc2.5b/added_tokens.json ADDED

	@@ -0,0 +1,4 @@

+{
+  "<|endoftext|>": 49407,
+  "<|startoftext|>": 49406
+}

clip/facebook/metaclip-b16-fullcc2.5b/config.json ADDED

	@@ -0,0 +1,20 @@

+{
+  "architectures": [
+    "CLIPModel"
+  ],
+  "initializer_factor": 1.0,
+  "logit_scale_init_value": 2.6592,
+  "model_type": "clip",
+  "projection_dim": 512,
+  "text_config": {
+    "heads": 8,
+    "layers": 12,
+    "model_type": "clip_text_model"
+  },
+  "torch_dtype": "float32",
+  "transformers_version": "4.34.0",
+  "vision_config": {
+    "model_type": "clip_vision_model",
+    "patch_size": 16
+  }
+}

clip/facebook/metaclip-b16-fullcc2.5b/merges.txt ADDED

The diff for this file is too large to render. See raw diff

clip/facebook/metaclip-b16-fullcc2.5b/preprocessor_config.json ADDED

	@@ -0,0 +1,28 @@

+{
+  "crop_size": {
+    "height": 224,
+    "width": 224
+  },
+  "do_center_crop": true,
+  "do_convert_rgb": true,
+  "do_normalize": true,
+  "do_rescale": true,
+  "do_resize": true,
+  "image_mean": [
+    0.48145466,
+    0.4578275,
+    0.40821073
+  ],
+  "image_processor_type": "CLIPImageProcessor",
+  "image_std": [
+    0.26862954,
+    0.26130258,
+    0.27577711
+  ],
+  "processor_class": "CLIPProcessor",
+  "resample": 3,
+  "rescale_factor": 0.00392156862745098,
+  "size": {
+    "shortest_edge": 224
+  }
+}

clip/facebook/metaclip-b16-fullcc2.5b/special_tokens_map.json ADDED

	@@ -0,0 +1,6 @@

+{
+  "bos_token": "<|startoftext|>",
+  "eos_token": "<|endoftext|>",
+  "pad_token": "<|endoftext|>",
+  "unk_token": "<|endoftext|>"
+}

clip/facebook/metaclip-b16-fullcc2.5b/tokenizer_config.json ADDED

	@@ -0,0 +1,33 @@

+{
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "49406": {
+      "content": "<|startoftext|>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "49407": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "additional_special_tokens": [],
+  "bos_token": "<|startoftext|>",
+  "clean_up_tokenization_spaces": true,
+  "do_lower_case": true,
+  "eos_token": "<|endoftext|>",
+  "errors": "replace",
+  "model_max_length": 77,
+  "pad_token": "<|endoftext|>",
+  "processor_class": "CLIPProcessor",
+  "tokenizer_class": "CLIPTokenizer",
+  "tokenizer_file": "/Users/georgebredis/.cache/huggingface/hub/models--openai--clip-vit-base-patch32/snapshots/e6a30b603a447e251fdaca1c3056b2a16cdfebeb/tokenizer.json",
+  "unk_token": "<|endoftext|>"
+}

clip/facebook/metaclip-b16-fullcc2.5b/vocab.json ADDED

The diff for this file is too large to render. See raw diff

clip/facebook/metaclip-b32-400m/preprocessor_config.json ADDED

	@@ -0,0 +1,28 @@

+{
+  "crop_size": {
+    "height": 224,
+    "width": 224
+  },
+  "do_center_crop": true,
+  "do_convert_rgb": true,
+  "do_normalize": true,
+  "do_rescale": true,
+  "do_resize": true,
+  "image_mean": [
+    0.48145466,
+    0.4578275,
+    0.40821073
+  ],
+  "image_processor_type": "CLIPImageProcessor",
+  "image_std": [
+    0.26862954,
+    0.26130258,
+    0.27577711
+  ],
+  "processor_class": "CLIPProcessor",
+  "resample": 3,
+  "rescale_factor": 0.00392156862745098,
+  "size": {
+    "shortest_edge": 224
+  }
+}

clip/facebook/metaclip-b32-400m/special_tokens_map.json ADDED

	@@ -0,0 +1,6 @@

+{
+  "bos_token": "<|startoftext|>",
+  "eos_token": "<|endoftext|>",
+  "pad_token": "<|endoftext|>",
+  "unk_token": "<|endoftext|>"
+}

clip/facebook/metaclip-b32-400m/tokenizer_config.json ADDED

	@@ -0,0 +1,33 @@

+{
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "49406": {
+      "content": "<|startoftext|>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "49407": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "additional_special_tokens": [],
+  "bos_token": "<|startoftext|>",
+  "clean_up_tokenization_spaces": true,
+  "do_lower_case": true,
+  "eos_token": "<|endoftext|>",
+  "errors": "replace",
+  "model_max_length": 77,
+  "pad_token": "<|endoftext|>",
+  "processor_class": "CLIPProcessor",
+  "tokenizer_class": "CLIPTokenizer",
+  "tokenizer_file": "/Users/georgebredis/.cache/huggingface/hub/models--openai--clip-vit-base-patch32/snapshots/e6a30b603a447e251fdaca1c3056b2a16cdfebeb/tokenizer.json",
+  "unk_token": "<|endoftext|>"
+}

clip/facebook/metaclip-b32-400m/vocab.json ADDED

The diff for this file is too large to render. See raw diff

clip/facebook/metaclip-l14-400m/config.json ADDED

	@@ -0,0 +1,25 @@

+{
+  "architectures": [
+    "CLIPModel"
+  ],
+  "initializer_factor": 1.0,
+  "logit_scale_init_value": 2.6592,
+  "model_type": "clip",
+  "projection_dim": 768,
+  "text_config": {
+    "hidden_size": 768,
+    "intermediate_size": 3072,
+    "model_type": "clip_text_model",
+    "num_attention_heads": 12
+  },
+  "torch_dtype": "float32",
+  "transformers_version": "4.34.0",
+  "vision_config": {
+    "hidden_size": 1024,
+    "intermediate_size": 4096,
+    "model_type": "clip_vision_model",
+    "num_attention_heads": 16,
+    "num_hidden_layers": 24,
+    "patch_size": 14
+  }
+}
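
The `patch_size` values in the vision configs determine how many tokens the vision transformer processes per image. As a sketch, assuming a 224×224 input (the crop size in the preprocessor configs shown in this commit; the l14 preprocessor config is not displayed here, so that part is an assumption) and one extra class-embedding position, the sequence lengths work out as follows. The function name `num_vision_tokens` is hypothetical.

```python
def num_vision_tokens(image_size: int, patch_size: int) -> int:
    """Patches per side squared, plus one class-embedding position."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    grid = image_size // patch_size
    return grid * grid + 1

# Patch sizes taken from the model names and configs in this commit.
for name, patch_size in [("b32", 32), ("b16", 16), ("l14", 14)]:
    print(name, num_vision_tokens(224, patch_size))
```

Smaller patches give a finer grid and a longer sequence: 50 positions for patch size 32, 197 for 16, and 257 for 14 under these assumptions.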

clip/laion/CLIP-ViT-L-14-laion2B-s32B-b82K/open_clip_pytorch_model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5ddb47339f44e4fd9cace3d3960d38af1b51a25857440cfae90afc44706d7e2b
+size 1710631365
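
The three added lines are a Git LFS pointer, not the weights themselves: a spec version, the SHA-256 of the real file, and its size in bytes. A minimal sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper, not part of any Git LFS tooling.

```python
# Pointer contents copied from the diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:5ddb47339f44e4fd9cace3d3960d38af1b51a25857440cfae90afc44706d7e2b
size 1710631365
"""

def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line, then break the oid into algorithm and digest."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }

pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["hash_algorithm"], f"{pointer['size_bytes'] / 1e9:.2f} GB")
```

The size field shows that the actual checkpoint is roughly 1.71 GB, even though the diff adds only three lines.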

clip/laion/CLIP-ViT-L-14-laion2B-s32B-b82K/pytorch_model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:45a6d8e04e46cfc7f55ee74a62a4ef04c95b9ef005981f9ccee19af8906ab181
+size 1710660257

clip/openai/clip-vit-large-patch14/flax_model.msgpack ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:156f677ed4495acd1ec7197249c091b85c240267c82f2f7f2e4eae4177931fed
+size 1710486359