initial commit

Files changed (3)

.gitattributes +55 -0
LICENSE +0 -0
README.md +158 -0

.gitattributes ADDED

	@@ -0,0 +1,55 @@

+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.lz4 filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+# Audio files - uncompressed
+*.pcm filter=lfs diff=lfs merge=lfs -text
+*.sam filter=lfs diff=lfs merge=lfs -text
+*.raw filter=lfs diff=lfs merge=lfs -text
+# Audio files - compressed
+*.aac filter=lfs diff=lfs merge=lfs -text
+*.flac filter=lfs diff=lfs merge=lfs -text
+*.mp3 filter=lfs diff=lfs merge=lfs -text
+*.ogg filter=lfs diff=lfs merge=lfs -text
+*.wav filter=lfs diff=lfs merge=lfs -text
+# Image files - uncompressed
+*.bmp filter=lfs diff=lfs merge=lfs -text
+*.gif filter=lfs diff=lfs merge=lfs -text
+*.png filter=lfs diff=lfs merge=lfs -text
+*.tiff filter=lfs diff=lfs merge=lfs -text
+# Image files - compressed
+*.jpg filter=lfs diff=lfs merge=lfs -text
+*.jpeg filter=lfs diff=lfs merge=lfs -text
+*.webp filter=lfs diff=lfs merge=lfs -text
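The patterns above route model weights, archives, audio, and images through Git LFS. The authoritative way to check whether a given path is covered is `git check-attr filter -- <path>`, which reports `filter: lfs` for matching files. As a rough offline approximation, the Python sketch below matches a few of the slash-free patterns against a file's basename with `fnmatch`. The `LFS_PATTERNS` subset and the `is_lfs_tracked` helper are illustrative assumptions, and path patterns such as `saved_model/**/*` are deliberately not handled.

```python
import fnmatch
import posixpath

# Illustrative subset of the slash-free patterns in the .gitattributes above.
LFS_PATTERNS = ("*.safetensors", "*.bin", "*.pt", "*.jpg", "*.png", "*tfevents*")


def is_lfs_tracked(path: str, patterns: tuple[str, ...] = LFS_PATTERNS) -> bool:
    """Approximate gitattributes matching for patterns that contain no slash.

    Git matches such patterns against the file's basename, so fnmatch on the
    basename is a reasonable stand-in (Git's wildmatch differs in edge cases).
    """
    name = posixpath.basename(path)
    return any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns)


if __name__ == "__main__":
    for p in ("compass_lora.safetensors", "images/flux_compass_bird1.jpg", "README.md"):
        print(f"{p}: {'LFS' if is_lfs_tracked(p) else 'regular git'}")
```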

LICENSE ADDED

File without changes

README.md ADDED

	@@ -0,0 +1,158 @@

+---
+tags:
+- text-to-image
+- lora
+- diffusers
+- template:diffusion-lora
+widget:
+- text: A laptop above a dog
+  output:
+    url: images/laptop-above-dog_flux1_compass_004.jpg
+- text: A bird below a skateboard
+  output:
+    url: images/flux_compass_bird1.jpg
+- text: A horse to the left of a bottle
+  output:
+    url: images/horse-left-bottle_flux1_compass_003.jpg
+base_model: black-forest-labs/FLUX.1-dev
+instance_prompt: null
+license: other
+license_name: compass-lora-weights-nc-license
+license_link: LICENSE
+---
+# CoMPaSS-FLUX.1
+<Gallery />
+## Model description
+A LoRA adapter that enhances the spatial understanding of the FLUX.1 text-to-image diffusion model. It improves the model's ability to place objects in the spatial relationships described in the prompt (left, right, above, below), as measured on the VISOR, T2I-CompBench Spatial, and GenEval Position benchmarks.
+## Model Details
+- **Base Model**: FLUX.1-dev
+- **LoRA Rank**: 16
+- **Training Data**: SCOP dataset (curated from COCO)
+- **File Size**: ~50MiB
+- **Framework**: Diffusers
+- **License**: Non-Commercial (see LICENSE)
+## Intended Use
+- Generating images with accurate spatial relationships between objects
+- Creating compositions that require specific spatial arrangements
+- Enhancing the base model's spatial understanding while maintaining its other capabilities
+## Performance
+### Key Improvements
+- VISOR benchmark: +98% relative improvement
+- T2I-CompBench Spatial: +67% relative improvement
+- GenEval Position: +131% relative improvement
+- Maintains or improves the base model's image fidelity (FID and CMMD scores)
+## Using the Model
+### Loading the Model
+```python
+import torch
+from diffusers import DiffusionPipeline
+from safetensors.torch import load_file
+# Load the base model in bfloat16, the dtype used in the diffusers FLUX examples
+pipe = DiffusionPipeline.from_pretrained(
+    "black-forest-labs/FLUX.1-dev",
+    torch_dtype=torch.bfloat16,
+).to("cuda")
+# Load the CoMPaSS LoRA weights from a local .safetensors file and apply them
+lora_path = "path_to_compass_lora.safetensors"
+state_dict = load_file(lora_path)
+pipe.load_lora_weights(state_dict)
+```
+### Example Usage
+```python
+prompt = "A motorcycle to the right of a bear"
+image = pipe(prompt).images[0]
+```
+### Effective Prompting
+The model works well with:
+- Clear spatial relationship descriptors (left, right, above, below)
+- Pairs of distinct objects
+- Explicit spatial relationships (e.g., "A to the right of B")
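The prompting guidance above can be applied systematically with a small template helper. This is a hypothetical sketch: `spatial_prompt`, `with_article`, and the `RELATIONS` table are not part of the released code. They only reproduce the "A <relation> B" phrasing of the widget examples and reject same-object pairs, following the advice to use distinct objects.

```python
# Relation phrases taken from the prompting notes and widget examples above.
RELATIONS = {
    "left": "to the left of",
    "right": "to the right of",
    "above": "above",
    "below": "below",
}


def with_article(noun: str) -> str:
    """Prefix an indefinite article (naive vowel rule; 'hour' or 'unicorn' come out wrong)."""
    return ("an " if noun[0].lower() in "aeiou" else "a ") + noun


def spatial_prompt(obj_a: str, relation: str, obj_b: str) -> str:
    """Build an 'A <relation> B' prompt for two distinct objects."""
    if relation not in RELATIONS:
        raise ValueError(f"unsupported relation: {relation!r}")
    if obj_a.strip().lower() == obj_b.strip().lower():
        raise ValueError("use two distinct objects")
    text = f"{with_article(obj_a)} {RELATIONS[relation]} {with_article(obj_b)}"
    return text[0].upper() + text[1:]


if __name__ == "__main__":
    print(spatial_prompt("motorcycle", "right", "bear"))  # A motorcycle to the right of a bear
```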
+## Training Details
+### Training Data
+- Built using the SCOP (Spatial Constraints-Oriented Pairing) data engine
+- ~28,000 curated object pairs from COCO
+- Enforces criteria for:
+  - Visual significance
+  - Semantic distinction
+  - Spatial clarity
+  - Object relationships
+  - Visual balance
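The card describes the SCOP criteria only at a high level. Purely as an illustration of what a spatial-clarity check could look like, the sketch below labels the relation of object A to object B from two COCO-style boxes, `[x, y, width, height]` with the y axis pointing down. It returns `None` when the arrangement is too diagonal to describe unambiguously. This is not the SCOP implementation. The center-offset rule and the `min_ratio` threshold are assumptions, and the other criteria (visual significance, semantic distinction, balance) are not modeled.

```python
from typing import Optional, Sequence


def box_center(box: Sequence[float]) -> tuple[float, float]:
    """Center of a COCO-style box [x, y, width, height] (image y grows downward)."""
    x, y, w, h = box
    return x + w / 2, y + h / 2


def spatial_relation(
    box_a: Sequence[float], box_b: Sequence[float], min_ratio: float = 2.0
) -> Optional[str]:
    """Relation of A with respect to B, or None if the arrangement is ambiguous.

    One axis must dominate: the center offset along it has to be at least
    `min_ratio` times the offset along the other axis (an assumed threshold).
    """
    (ax, ay), (bx, by) = box_center(box_a), box_center(box_b)
    dx, dy = ax - bx, ay - by
    if dx != 0 and abs(dx) >= min_ratio * abs(dy):
        return "left" if dx < 0 else "right"
    if dy != 0 and abs(dy) >= min_ratio * abs(dx):
        # A smaller y means A sits higher in the image.
        return "above" if dy < 0 else "below"
    return None  # diagonal or coincident centers: no clear relation


if __name__ == "__main__":
    print(spatial_relation([0, 0, 10, 10], [100, 0, 10, 10]))  # left
```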
+### Training Process
+- Trained for 24,000 steps
+- Batch size of 4
+- Learning rate: 1e-4
+- Optimizer: AdamW with β₁=0.9, β₂=0.999
+- Weight decay: 1e-2
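The hyperparameters above can be collected into a small config object. The `TrainingConfig` class below is hypothetical. It makes one assumption explicit that the card does not specify: each step consumes `batch_size` samples drawn from the ~28,000 pairs, with no gradient accumulation or multi-GPU scaling. Under that assumption, 24,000 steps at batch size 4 is 96,000 samples, or roughly 3.4 passes over the data. The keys returned by `adamw_kwargs` match the keyword arguments of `torch.optim.AdamW`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    steps: int = 24_000
    batch_size: int = 4  # assumed to be the effective batch size
    learning_rate: float = 1e-4
    betas: tuple[float, float] = (0.9, 0.999)
    weight_decay: float = 1e-2
    num_pairs: int = 28_000  # "~28,000" in the card, so approximate

    def adamw_kwargs(self) -> dict:
        """Keyword arguments in the form torch.optim.AdamW accepts."""
        return {"lr": self.learning_rate, "betas": self.betas, "weight_decay": self.weight_decay}

    def samples_seen(self) -> int:
        return self.steps * self.batch_size

    def approx_epochs(self) -> float:
        return self.samples_seen() / self.num_pairs


if __name__ == "__main__":
    cfg = TrainingConfig()
    print(cfg.samples_seen())             # 96000
    print(round(cfg.approx_epochs(), 2))  # 3.43
```

With PyTorch installed, the optimizer would then be built as `torch.optim.AdamW(lora_params, **cfg.adamw_kwargs())`, where `lora_params` stands for the trainable LoRA parameters.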
+## Evaluation Results
+| Metric | Base FLUX.1 | +CoMPaSS | Relative Improvement |
+|--------|-------------|----------|----------------------|
+| VISOR uncond | 37.96% | 75.17% | +98% |
+| T2I-CompBench Spatial | 0.18 | 0.30 | +67% |
+| GenEval Position | 0.26 | 0.60 | +131% |
+| FID (lower is better) | 27.96 | 26.40 | +5.6% |
+| CMMD (lower is better) | 0.8737 | 0.6859 | +21.5% |
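The "Relative Improvement" column can be reproduced from the first two columns. For VISOR, T2I-CompBench, and GenEval, higher is better, so the improvement is (new − base) / base. For FID and CMMD, lower is better, so the sign flips to (base − new) / base. A minimal sketch:

```python
def relative_change(base: float, new: float, higher_is_better: bool = True) -> float:
    """Relative improvement in percent; positive values mean the new score is better."""
    delta = (new - base) if higher_is_better else (base - new)
    return 100.0 * delta / base


# (base FLUX.1, +CoMPaSS, higher_is_better), copied from the table above
RESULTS = {
    "VISOR uncond": (37.96, 75.17, True),
    "T2I-CompBench Spatial": (0.18, 0.30, True),
    "GenEval Position": (0.26, 0.60, True),
    "FID": (27.96, 26.40, False),
    "CMMD": (0.8737, 0.6859, False),
}

if __name__ == "__main__":
    for metric, (base, new, higher) in RESULTS.items():
        print(f"{metric}: {relative_change(base, new, higher):+.1f}%")
```

This prints +98.0%, +66.7%, +130.8%, +5.6%, and +21.5%. These values match the table after rounding.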
+## Technical Specifications
+- **Architecture**: MMDiT-based FLUX.1 with LoRA adaptation
+- **LoRA Target**: DoubleStreamBlocks
+- **Parameter Count**: Base model parameters plus the LoRA adapter (~50 MiB on disk)
+- **Input**: Text prompts (like base FLUX.1)
+- **Output**: 1024×1024 images
+- **Compute Requirements**: Similar to base FLUX.1
+## Citation
+If you use this model in your research, please cite:
+```bibtex
+@article{zhang2024compass,
+  title={CoMPaSS: Enhancing Spatial Understanding in Text-to-Image Diffusion Models},
+  author={Zhang, Gaoyang and Fu, Bingtao and Fan, Qingnan and Zhang, Qi and Liu, Runxing and Gu, Hong and Zhang, Huaqi and Liu, Xinguo},
+  journal={arXiv preprint arXiv:2412.13195},
+  year={2024}
+}
+```
+## Acknowledgments
+This work builds upon the FLUX.1 model by Black Forest Labs and utilizes the COCO dataset for training data curation.
+## Contact
+For questions about the model, please contact [email protected]
+## Download model
+Weights for this model are available in Safetensors format.
+[Download](/blurgy/CoMPaSS-FLUX.1/tree/main) them in the Files & versions tab.