---
license: apache-2.0
datasets:
- XiuAiMoon/VITON-HD
- JianhaoZeng/Dresscode
base_model:
- davidelobba/TEMU-VTOFF
- stabilityai/stable-diffusion-3.5-large
- Qwen/Qwen2.5-VL-7B-Instruct
pipeline_tag: image-to-image
---
Please give me a star (🌟)!

https://github.com/Phoenix-95107/Virtual_Try_Off


# TEMU-VTOFF: Virtual Try-Off & Fashion Understanding Toolkit
TEMU-VTOFF is a state-of-the-art toolkit for virtual try-off and fashion image understanding. It leverages advanced diffusion models, vision-language models, and semantic segmentation to enable garment transfer, attribute captioning, and mask generation for fashion images.
<img src="./assets/teaser.png" alt="example">
## Table of Contents

- [Features](#features)
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Core Components](#core-components)
  - [1. Inference Pipeline (`inference.py`)](#1-inference-pipeline-inferencepy)
  - [2. Visual Attribute Captioning (`precompute_utils/captioning_qwen.py`)](#2-visual-attribute-captioning-precompute_utilscaptioning_qwenpy)
  - [3. Clothing Segmentation (`SegCloth.py`)](#3-clothing-segmentation-segclothpy)
- [Examples](#examples)
- [Citation](#citation)
- [License](#license)

---

## Features

- **Virtual Try-Off**: Generate realistic garment try-off images using Stable Diffusion 3-based pipelines.
- **Visual Attribute Captioning**: Extract fine-grained garment attributes using Qwen-VL.
- **Clothing Segmentation**: Obtain binary and fine masks for garments using SegFormer.
- **Dataset Support**: Works with DressCode and VITON-HD datasets.

---

## Installation

1. **Clone the repository:**

   ```bash
   git clone https://github.com/Phoenix-95107/Virtual_Try_Off.git
   cd Virtual_Try_Off
   ```

2. **Install dependencies:**

   ```bash
   pip install -r requirements.txt
   ```

3. **(Optional) Setup virtual environment:**
   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

---

## Quick Start

### 1. Virtual Try-Off Inference

```bash
python inference.py \
  --pretrained_model_name_or_path <path/to/model> \
  --pretrained_model_name_or_path_sd3_tryoff <path/to/tryoff/model> \
  --example_image examples/example1.jpg \
  --output_dir outputs \
  --width 768 --height 1024 \
  --guidance_scale 2.0 \
  --num_inference_steps 28 \
  --category upper_body
```

### 2. Visual Attribute Captioning

```bash
python precompute_utils/captioning_qwen.py \
  --pretrained_model_name_or_path Qwen/Qwen2.5-VL-3B-Instruct \
  --image_path examples/example1.jpg \
  --output_path outputs/example1_caption.txt \
  --image_category upper_body
```

### 3. Clothing Segmentation

```python
from PIL import Image
from SegCloth import segment_clothing

img = Image.open("examples/example1.jpg")
binary_mask, fine_mask = segment_clothing(img, category="upper_body")
binary_mask.save("outputs/example1_binary_mask.jpg")
fine_mask.save("outputs/example1_fine_mask.jpg")
```

---

## Core Components

### 1. Inference Pipeline (`inference.py`)

- **Purpose**: Generates virtual try-off images using a Stable Diffusion 3-based pipeline.
- **How it works**:
  - Loads pretrained models (VAE, transformers, schedulers, encoders).
  - Segments the clothing region using `SegCloth.py`.
  - Generates a descriptive caption for the garment using Qwen-VL (`captioning_qwen.py`).
  - Runs the diffusion pipeline to synthesize the try-off image.
- **Key Arguments**:
  - `--pretrained_model_name_or_path`: Path or HuggingFace model ID for the main model.
  - `--pretrained_model_name_or_path_sd3_tryoff`: Path or ID for the try-off transformer.
  - `--example_image`: Input image path.
  - `--output_dir`: Output directory.
  - `--category`: Clothing category (`upper_body`, `lower_body`, `dresses`).
  - `--width`, `--height`: Output image size.
  - `--guidance_scale`, `--num_inference_steps`: Generation parameters.
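
SD3-family VAEs downsample spatially by a factor of 8, so the garment mask presumably has to be brought to latent resolution before it can condition the transformer. A minimal sketch of that step under this assumption (the function name `mask_to_latent_size` is illustrative, not an actual `inference.py` API):

```python
from PIL import Image

def mask_to_latent_size(mask: Image.Image, width: int = 768, height: int = 1024) -> Image.Image:
    """Resize a binary garment mask to the latent resolution implied by the
    SD3 VAE's 8x spatial downsampling. NEAREST keeps the mask binary."""
    latent_w, latent_h = width // 8, height // 8
    return mask.resize((latent_w, latent_h), Image.NEAREST)

# Stand-in for a real binary mask at the default 768x1024 output size.
mask = Image.new("L", (768, 1024), 255)
latent_mask = mask_to_latent_size(mask)
print(latent_mask.size)  # (96, 128)
```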

### 2. Visual Attribute Captioning (`precompute_utils/captioning_qwen.py`)

- **Purpose**: Generates fine-grained, structured captions for fashion images using Qwen2.5-VL.
- **How it works**:
  - Loads the Qwen2.5-VL model and processor.
  - For a given image, predicts garment attributes (e.g., type, fit, hem, neckline) in a controlled, structured format.
  - Can process single images or entire datasets (DressCode, VITON-HD).
- **Key Arguments**:
  - `--pretrained_model_name_or_path`: Path or HuggingFace model ID for Qwen2.5-VL.
  - `--image_path`: Path to a single image (for single-image captioning).
  - `--output_path`: Where to save the generated caption.
  - `--image_category`: Garment category (`upper_body`, `lower_body`, `dresses`).
  - For batch/dataset mode: `--dataset_name`, `--dataset_root`, `--filename`.
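
For illustration, a category-conditioned prompt of the kind the script might send to Qwen2.5-VL could be assembled as below. The attribute lists and wording are hypothetical, not the script's actual template:

```python
# Hypothetical per-category attribute lists; the real script may differ.
ATTRIBUTES = {
    "upper_body": ["garment type", "neckline", "sleeve length", "fit", "hem"],
    "lower_body": ["garment type", "waist rise", "leg cut", "length", "fit"],
    "dresses": ["garment type", "neckline", "sleeve length", "silhouette", "length"],
}

def build_caption_prompt(category: str) -> str:
    """Build a structured captioning instruction for the given garment category."""
    attrs = ATTRIBUTES[category]
    return (
        "Describe only the garment in the image. "
        "Report these attributes, comma-separated: " + ", ".join(attrs) + "."
    )

print(build_caption_prompt("upper_body"))
```

Keeping the output format this constrained is what makes the captions usable as structured conditioning text downstream.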

### 3. Clothing Segmentation (`SegCloth.py`)

- **Purpose**: Segments clothing regions in images, producing:
  - A binary mask (black & white) of the garment.
  - A fine mask image where the garment is grayed out.
- **How it works**:
  - Uses a SegFormer model (`mattmdjaga/segformer_b2_clothes`) via HuggingFace `transformers` pipeline.
  - Supports categories: `upper_body`, `dresses`, `lower_body`.
  - Provides both single-image and batch processing functions.
- **Usage**:
  - `segment_clothing(img, category)`: Returns `(binary_mask, fine_mask)` for a PIL image.
  - `batch_segment_clothing(img_dir, out_dir)`: Processes all images in a directory.
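
A self-contained sketch of what `batch_segment_clothing` likely does, with the segmentation model abstracted behind a `segment_fn` callable so the loop itself can be shown (the function name, file-matching pattern, and output-naming scheme are illustrative):

```python
from pathlib import Path
from PIL import Image

def batch_segment(img_dir: str, out_dir: str, segment_fn, category: str = "upper_body") -> int:
    """Run segment_fn on every .jpg in img_dir and save both masks to out_dir.

    segment_fn is any callable with SegCloth's (img, category) -> (binary, fine)
    shape, e.g. segment_clothing. Returns the number of images processed.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(Path(img_dir).glob("*.jpg")):
        img = Image.open(path).convert("RGB")
        binary_mask, fine_mask = segment_fn(img, category=category)
        binary_mask.save(out / f"{path.stem}_binary_mask.jpg")
        fine_mask.save(out / f"{path.stem}_fine_mask.jpg")
        count += 1
    return count
```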

---

## Examples

See the `examples/` directory for sample images, masks and captions. Example usage scripts are provided for each core component.
The figures below show the model's workflow and a comparison of its results with other models.

**Workflow**

<img src="./assets/workflow.png" alt="Workflow" />

**Comparison**

<img src="./assets/compair.png" alt="Comparison" />

---

## Citation

If you use TEMU-VTOFF in your research or product, please cite this repository and the relevant models (e.g., Stable Diffusion 3, Qwen2.5-VL, SegFormer).

```
@misc{temu-vtoff,
  author = {Your Name or Organization},
  title = {TEMU-VTOFF: Virtual Try-Off & Fashion Understanding Toolkit},
  year = {2024},
  howpublished = {\url{https://github.com/Phoenix-95107/Virtual_Try_Off}}
}
```

---

## License

This project is licensed under the [LICENSE](LICENSE) provided in the repository. Please check individual model and dataset licenses for additional terms.