@@ -1,26 +1,75 @@
 ---
-tags:
-- model_hub_mixin
-- pytorch_model_hub_mixin
-license: apache-2.0
 datasets:
 - timbrooks/instructpix2pix-clip-filtered
 - SherryXTChen/InstructCLIP-InstructPix2Pix-Data
 language:
 - en
-pipeline_tag: image-to-text
-base_model:
-- SherryXTChen/LatentDiffusionDINOv2
 ---
-This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
 The model is based on the paper [Instruct-CLIP: Improving Instruction-Guided Image Editing with Automated Data Refinement Using Contrastive Learning](https://huggingface.co/papers/2503.18406).
-- Library:
-  ```
-  torch==2.4.0
-  torchvision==0.19.0
-  diffusers==0.30.3
-  transformers==4.45.2
-  ```
-- Docs: See our [repo](https://github.com/SherryXTChen/Instruct-CLIP.git) for more information.

 ---
+base_model:
+- SherryXTChen/LatentDiffusionDINOv2
 datasets:
 - timbrooks/instructpix2pix-clip-filtered
 - SherryXTChen/InstructCLIP-InstructPix2Pix-Data
 language:
 - en
+license: apache-2.0
+pipeline_tag: image-to-image
+library_name: diffusers
+tags:
+- model_hub_mixin
+- pytorch_model_hub_mixin
 ---
+# Instruct-CLIP: Improving Instruction-Guided Image Editing with Automated Data Refinement Using Contrastive Learning (CVPR 2025)
+This model has been pushed to the Hub using the [PyTorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration.
 The model is based on the paper [Instruct-CLIP: Improving Instruction-Guided Image Editing with Automated Data Refinement Using Contrastive Learning](https://huggingface.co/papers/2503.18406).
+[arXiv](https://arxiv.org/abs/2503.18406) | [Image Editing Model](https://huggingface.co/SherryXTChen/InstructCLIP-InstructPix2Pix) | [Data Refinement Model](https://huggingface.co/SherryXTChen/Instruct-CLIP) | [Data](https://huggingface.co/datasets/SherryXTChen/InstructCLIP-InstructPix2Pix-Data)
+## Capabilities
+<p align="center">
+  <img src="https://raw.githubusercontent.com/SherryXTChen/Instruct-CLIP/refs/heads/main/assets/teaser_1.png" alt="Figure 1" width="43%">
+  <img src="https://raw.githubusercontent.com/SherryXTChen/Instruct-CLIP/refs/heads/main/assets/teaser_2.png" alt="Figure 2" width="50%">
+</p>
+## Installation
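+Run the following from a clone of the [repo](https://github.com/SherryXTChen/Instruct-CLIP.git):
+```bash
+pip install -r requirements.txt
+```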
+## Inference
+```python
+import requests
+import torch
+from PIL import Image, ImageOps  # `import PIL` alone does not load these submodules
+from diffusers import StableDiffusionInstructPix2PixPipeline, EulerAncestralDiscreteScheduler
+
+# Load the base InstructPix2Pix pipeline, then apply the Instruct-CLIP LoRA weights.
+model_id = "timbrooks/instruct-pix2pix"
+pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
+pipe.load_lora_weights("SherryXTChen/InstructCLIP-InstructPix2Pix")
+pipe.to("cuda")
+pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
+
+url = "https://raw.githubusercontent.com/SherryXTChen/Instruct-CLIP/refs/heads/main/assets/1_input.jpg"
+
+def download_image(url):
+    """Fetch an image, apply its EXIF orientation, and convert it to RGB."""
+    response = requests.get(url, stream=True)
+    response.raise_for_status()  # fail early on a bad URL or network error
+    image = Image.open(response.raw)
+    image = ImageOps.exif_transpose(image)
+    return image.convert("RGB")
+
+image = download_image(url)
+prompt = "as a 3 d sculpture"
+# Run the edit and save the first (and only) generated image.
+images = pipe(prompt, image=image, num_inference_steps=20).images
+images[0].save("output.jpg")
+```
+## Citation
+```bibtex
+@misc{chen2025instructclipimprovinginstructionguidedimage,
+      title={Instruct-CLIP: Improving Instruction-Guided Image Editing with Automated Data Refinement Using Contrastive Learning},
+      author={Sherry X. Chen and Misha Sra and Pradeep Sen},
+      year={2025},
+      eprint={2503.18406},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV},
+      url={https://arxiv.org/abs/2503.18406},
+}
+```
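
A note on the Inference snippet in this README: it hands the downloaded image to the pipeline at whatever size it arrives. The sketch below is one hypothetical way to pick a working size first. The helper name `fit_resolution`, the 512-pixel cap, and the snap-to-8 step are assumptions for illustration, not part of the Instruct-CLIP repo or the model card. The multiple of 8 reflects the downsampling factor of the Stable Diffusion VAE that InstructPix2Pix builds on.

```python
def fit_resolution(width, height, max_side=512, multiple=8):
    """Return a (width, height) pair no larger than max_side on its longer
    side, with both dimensions rounded down to a multiple of `multiple`.

    Hypothetical helper: max_side=512 is an assumed working resolution,
    not a value taken from the model card.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Only shrink; images already within the cap keep their scale.
    scale = min(1.0, max_side / max(width, height))
    new_w = round(width * scale) // multiple * multiple
    new_h = round(height * scale) // multiple * multiple
    # Never return a zero-sized dimension for very small inputs.
    return max(multiple, new_w), max(multiple, new_h)
```

With a PIL image, `image = image.resize(fit_resolution(*image.size))` before calling `pipe(...)` would apply it. Whether a smaller input improves the edit for a given image is something to check empirically.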