Burf committed
Commit 4f75903 · 1 Parent(s): e600401

Update readme

Files changed (3):
  1. .gitattributes +1 -0
  2. README.md +92 -0
  3. teaser.png +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,95 @@
  ---
  license: mit
+ language:
+ - en
+ library_name: diffusers
+ tags:
+ - text-to-image
+ - personalization
+ - adapter
+ - stable-diffusion
+ - flux
+ - diffusers
+ base_model:
+ - runwayml/stable-diffusion-v1-5
+ - stabilityai/stable-diffusion-2-1
+ - stabilityai/stable-diffusion-xl-base-1.0
+ - stabilityai/stable-diffusion-3.5-large
+ - black-forest-labs/FLUX.1-dev
+ pipeline_tag: text-to-image
  ---
+
+ # DrUM (**Dr**aw Yo**u**r **M**ind)
+
+ **DrUM** enables **personalized text-to-image (T2I) generation by integrating reference prompts** into T2I diffusion models. It works with **foundation T2I models such as Stable Diffusion v1/v2/XL/v3 and FLUX** without requiring additional fine-tuning. DrUM leverages **condition-level modeling in the latent space using a transformer-based adapter**, and integrates seamlessly with **open-source text encoders such as OpenCLIP and Google T5**.
+
+ This repository provides the necessary components to run DrUM for **inference**. For the full source code, training scripts, and detailed documentation, please visit the official **[GitHub repository](https://github.com/Burf/DrUM)** and read the **[research paper](https://arxiv.org/abs/2508.03481)**.
+
+ <p align="center">
+ <img src="teaser.png" width="95%">
+ </p>
+
+
+ ## Quickstart
+
+ This model is designed for easy use with the `diffusers` library as a custom pipeline.
+
+ ### Installation
+
+ ```bash
+ pip install torch torchvision diffusers transformers accelerate safetensors huggingface-hub
+ ```
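+
+ Note that the usage example below does `from pipeline import DrUM`, which requires `pipeline.py` from this repository to be importable locally. As a minimal sketch (assuming the file sits at the repository root under that name), it can be fetched with `huggingface_hub`:
+
+ ```python
+ from huggingface_hub import hf_hub_download
+
+ # Download pipeline.py into the working directory so that
+ # `from pipeline import DrUM` resolves (assumed file layout).
+ hf_hub_download(repo_id = "Burf/DrUM", filename = "pipeline.py", local_dir = ".")
+ ```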
+
+ ### Usage
+
+ ```python
+ import torch
+
+ from diffusers import DiffusionPipeline
+ from pipeline import DrUM
+
+ # Load a base pipeline and attach DrUM.
+ # Alternatively, load DrUM directly as a custom pipeline:
+ # drum = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", custom_pipeline = "Burf/DrUM", pipeline = "runwayml/stable-diffusion-v1-5", torch_dtype = torch.bfloat16, device = "cuda")
+ pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype = torch.bfloat16).to("cuda")
+ drum = DrUM(pipeline)
+
+ # Generate personalized images from a prompt and weighted reference prompts
+ images = drum(
+     prompt = "a photograph of an astronaut riding a horse",
+     ref = ["A retro-futuristic space exploration movie poster with bold, vibrant colors"],
+     weight = [1.0],
+     alpha = 0.3
+ )
+
+ images[0].save("personalized_image.png")
+ ```
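+
+ Because `ref` and `weight` are lists, several reference prompts can be blended in one call. The following is a sketch under assumptions this README does not state explicitly: that `weight` holds the relative contribution of each reference and `alpha` scales the overall personalization strength.
+
+ ```python
+ # Blend two reference styles (hypothetical prompts and weights)
+ images = drum(
+     prompt = "a photograph of an astronaut riding a horse",
+     ref = [
+         "A retro-futuristic space exploration movie poster with bold, vibrant colors",
+         "A soft watercolor illustration with pastel tones"
+     ],
+     weight = [0.7, 0.3],   # assumed: relative weighting between references
+     alpha = 0.3            # assumed: overall strength of personalization
+ )
+ images[0].save("personalized_blend.png")
+ ```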
+
+
+ ## Supported foundation T2I models
+
+ DrUM works with a wide variety of foundation T2I models that use text encoders with the same weights:
+
+ | Architecture | Pipeline | Text encoder | DrUM weight |
+ |--------------|----------|--------------|-------------|
+ | Stable Diffusion v1 | `runwayml/stable-diffusion-v1-5`, `prompthero/openjourney-v4`,<br>`stablediffusionapi/realistic-vision-v51`, `stablediffusionapi/deliberate-v2`,<br>`stablediffusionapi/anything-v5`, `WarriorMama777/AbyssOrangeMix2`, ... | `openai/clip-vit-large-patch14` | `L.safetensors` |
+ | Stable Diffusion v2 | `stabilityai/stable-diffusion-2-1`, ... | `openai/clip-vit-huge-patch14` | `H.safetensors` |
+ | Stable Diffusion XL | `stabilityai/stable-diffusion-xl-base-1.0`, ... | `openai/clip-vit-large-patch14`,<br>`laion/CLIP-ViT-bigG-14-laion2B-39B-b160k` | `L.safetensors`,<br>`bigG.safetensors` |
+ | Stable Diffusion v3 | `stabilityai/stable-diffusion-3.5-large`,<br>`stabilityai/stable-diffusion-3.5-medium`, ... | `openai/clip-vit-large-patch14`,<br>`laion/CLIP-ViT-bigG-14-laion2B-39B-b160k`,<br>`google/t5-v1_1-xxl` | `L.safetensors`,<br>`bigG.safetensors`,<br>`T5.safetensors` |
+ | FLUX | `black-forest-labs/FLUX.1-dev`, ... | `openai/clip-vit-large-patch14`,<br>`google/t5-v1_1-xxl` | `L.safetensors`,<br>`T5.safetensors` |
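+
+ The attachment pattern from the quickstart should carry over to the other architectures listed above; here is a sketch for Stable Diffusion XL (assuming `DrUM(pipeline)` picks up the adapter weights matching the base model's text encoders, which this README does not state explicitly):
+
+ ```python
+ import torch
+
+ from diffusers import DiffusionPipeline
+ from pipeline import DrUM
+
+ # Same usage as the SD v1 quickstart, with an SDXL base model
+ pipeline = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype = torch.bfloat16).to("cuda")
+ drum = DrUM(pipeline)
+ ```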
+
+
+ ## Citation
+
+ ```bibtex
+ @inproceedings{kim2025drum,
+     title={Draw Your Mind: Personalized Generation via Condition-Level Modeling in Text-to-Image Diffusion Models},
+     author={Hyungjin Kim and Seokho Ahn and Young-Duk Seo},
+     booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
+     year={2025}
+ }
+ ```
+
+ ## License
+
+ This project is licensed under the MIT License.
teaser.png ADDED

Git LFS Details

  • SHA256: 3c8b8e866451821cef200a87ea37b7700218ffb8a803055e234fbe794e5fe6d5
  • Pointer size: 132 Bytes
  • Size of remote file: 1.31 MB