# TEMU-VTOFF: Virtual Try-Off & Fashion Understanding Toolkit
TEMU-VTOFF is a state-of-the-art toolkit for virtual try-off and fashion image understanding. It leverages advanced diffusion models, vision-language models, and semantic segmentation to enable garment transfer, attribute captioning, and mask generation for fashion images.
<img src="./assets/teaser.png" alt="example">
## Table of Contents
- [Features](#features)
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Core Components](#core-components)
- [1. Inference Pipeline (`inference.py`)](#1-inference-pipeline-inferencepy)
- [2. Visual Attribute Captioning (`precompute_utils/captioning_qwen.py`)](#2-visual-attribute-captioning-precompute_utilscaptioning_qwenpy)
- [3. Clothing Segmentation (`SegCloth.py`)](#3-clothing-segmentation-segclothpy)
- [Examples](#examples)
- [Citation](#citation)
- [License](#license)
---
## Features
- **Virtual Try-Off**: Generate realistic standalone garment images from photos of dressed people using Stable Diffusion 3-based pipelines.
- **Visual Attribute Captioning**: Extract fine-grained garment attributes using Qwen-VL.
- **Clothing Segmentation**: Obtain binary and fine masks for garments using SegFormer.
- **Dataset Support**: Works with DressCode and VITON-HD datasets.
---
## Installation
1. **Clone the repository:**
```bash
git clone https://github.com/yourusername/TEMU-VTOFF.git
cd TEMU-VTOFF
```
2. **Install dependencies:**
```bash
pip install -r requirements.txt
```
3. **(Optional) Setup virtual environment:**
```bash
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```
---
## Quick Start
### 1. Virtual Try-Off Inference
```bash
python inference.py \
--pretrained_model_name_or_path <path/to/model> \
--pretrained_model_name_or_path_sd3_tryoff <path/to/tryoff/model> \
--example_image examples/example1.jpg \
--output_dir outputs \
--width 768 --height 1024 \
--guidance_scale 2.0 \
--num_inference_steps 28 \
--category upper_body
```
### 2. Visual Attribute Captioning
```bash
python precompute_utils/captioning_qwen.py \
--pretrained_model_name_or_path Qwen/Qwen2.5-VL-3B-Instruct \
--image_path examples/example1.jpg \
--output_path outputs/example1_caption.txt \
--image_category upper_body
```
### 3. Clothing Segmentation
```python
from PIL import Image
from SegCloth import segment_clothing
img = Image.open("examples/example1.jpg")
binary_mask, fine_mask = segment_clothing(img, category="upper_body")
binary_mask.save("outputs/example1_binary_mask.jpg")
fine_mask.save("outputs/example1_fine_mask.jpg")
```
---
## Core Components
### 1. Inference Pipeline (`inference.py`)
- **Purpose**: Generates virtual try-off images (standalone garment images extracted from photos of dressed people) using a Stable Diffusion 3-based pipeline.
- **How it works**:
- Loads pretrained models (VAE, transformers, schedulers, encoders).
- Segments the clothing region using `SegCloth.py`.
- Generates a descriptive caption for the garment using Qwen-VL (`captioning_qwen.py`).
- Runs the diffusion pipeline to synthesize a new try-on image.
- **Key Arguments**:
- `--pretrained_model_name_or_path`: Path or HuggingFace model ID for the main model.
- `--pretrained_model_name_or_path_sd3_tryoff`: Path or ID for the try-off transformer.
- `--example_image`: Input image path.
- `--output_dir`: Output directory.
- `--category`: Clothing category (`upper_body`, `lower_body`, `dresses`).
- `--width`, `--height`: Output image size.
- `--guidance_scale`, `--num_inference_steps`: Generation parameters.
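As a worked example of the size arguments: in SD3-family pipelines the VAE typically downsamples by a factor of 8 and the transformer uses a patch size of 2, so `--width` and `--height` are usually expected to be multiples of 16. This is a minimal sketch under that assumption about the standard SD3 architecture, not code taken from this repository:

```python
# Sketch: validate requested output dimensions for an SD3-style pipeline.
# VAE_SCALE and PATCH_SIZE are assumptions about the standard SD3
# architecture, not values read from this repository.
VAE_SCALE = 8
PATCH_SIZE = 2

def latent_grid(width: int, height: int) -> tuple[int, int]:
    """Return the (latent_w, latent_h) grid for a given output size."""
    if width % (VAE_SCALE * PATCH_SIZE) or height % (VAE_SCALE * PATCH_SIZE):
        raise ValueError("width and height should be multiples of 16")
    return width // VAE_SCALE, height // VAE_SCALE

print(latent_grid(768, 1024))  # (96, 128)
```

With the defaults above, a 768x1024 output corresponds to a 96x128 latent grid, which is why odd sizes such as 770x1024 are rejected before any model is loaded.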
### 2. Visual Attribute Captioning (`precompute_utils/captioning_qwen.py`)
- **Purpose**: Generates fine-grained, structured captions for fashion images using Qwen2.5-VL.
- **How it works**:
- Loads the Qwen2.5-VL model and processor.
- For a given image, predicts garment attributes (e.g., type, fit, hem, neckline) in a controlled, structured format.
- Can process single images or entire datasets (DressCode, VITON-HD).
- **Key Arguments**:
- `--pretrained_model_name_or_path`: Path or HuggingFace model ID for Qwen2.5-VL.
- `--image_path`: Path to a single image (for single-image captioning).
- `--output_path`: Where to save the generated caption.
- `--image_category`: Garment category (`upper_body`, `lower_body`, `dresses`).
- For batch/dataset mode: `--dataset_name`, `--dataset_root`, `--filename`.
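The controlled, structured prompting described above can be sketched as follows. The attribute lists and the `build_prompt` helper are illustrative assumptions, not the script's actual prompt template:

```python
# Sketch of a structured attribute prompt in the spirit of
# captioning_qwen.py. The attribute lists below are illustrative
# assumptions, not the script's actual template.
ATTRIBUTES = {
    "upper_body": ["garment type", "fit", "sleeve length", "neckline", "hem"],
    "lower_body": ["garment type", "fit", "rise", "length", "closure"],
    "dresses": ["garment type", "fit", "sleeve length", "neckline", "length"],
}

def build_prompt(category: str) -> str:
    """Build a prompt asking the VLM for one value per attribute."""
    attrs = ATTRIBUTES[category]
    lines = "\n".join(f"- {a}:" for a in attrs)
    return (
        "Describe only the garment in the image. "
        f"Fill in exactly these {len(attrs)} attributes, one per line:\n{lines}"
    )

print(build_prompt("upper_body"))
```

Constraining the model to a fixed attribute schema keeps captions comparable across a dataset, which matters when they are precomputed and reused as conditioning text.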
### 3. Clothing Segmentation (`SegCloth.py`)
- **Purpose**: Segments clothing regions in images, producing:
- A binary mask (black & white) of the garment.
- A fine mask image where the garment is grayed out.
- **How it works**:
- Uses a SegFormer model (`mattmdjaga/segformer_b2_clothes`) via HuggingFace `transformers` pipeline.
- Supports categories: `upper_body`, `dresses`, `lower_body`.
- Provides both single-image and batch processing functions.
- **Usage**:
- `segment_clothing(img, category)`: Returns `(binary_mask, fine_mask)` for a PIL image.
- `batch_segment_clothing(img_dir, out_dir)`: Processes all images in a directory.
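A minimal sketch of how a binary garment mask can be derived from a SegFormer label map. The label ids follow the commonly published map for `mattmdjaga/segformer_b2_clothes` (4 = upper clothes, 5 = skirt, 6 = pants, 7 = dress); treat both the ids and the `binary_mask_from_labels` helper as assumptions to verify against the model card, not as the actual `SegCloth.py` implementation:

```python
import numpy as np
from PIL import Image

# Assumed label ids for mattmdjaga/segformer_b2_clothes; verify against
# the model card before relying on them.
CATEGORY_LABELS = {
    "upper_body": {4},
    "lower_body": {5, 6},
    "dresses": {7},
}

def binary_mask_from_labels(labels: np.ndarray, category: str) -> Image.Image:
    """Turn an HxW integer label map into a black & white PIL mask."""
    keep = np.isin(labels, list(CATEGORY_LABELS[category]))
    return Image.fromarray((keep * 255).astype(np.uint8), mode="L")

# Tiny synthetic label map: one "upper clothes" pixel, one "pants" pixel.
demo = np.array([[0, 4], [6, 0]])
mask = binary_mask_from_labels(demo, "upper_body")
print(mask.size, mask.getpixel((1, 0)))  # (2, 2) 255
```

The fine mask described above can then be produced from the same label map, e.g. by graying out the kept pixels in the original image instead of thresholding to black and white.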
---
## Examples
See the `examples/` directory for sample images, masks, and captions. Example usage scripts are provided for each core component.
The figures below show the model's workflow and a comparison of its results against other models.
**Workflow**
<img src="./assets/workflow.png" alt="Workflow" />
**Comparison**
<img src="./assets/compair.png" alt="Comparison" />
---
## Citation
If you use TEMU-VTOFF in your research or product, please cite this repository and the relevant models (e.g., Stable Diffusion 3, Qwen2.5-VL, SegFormer).
```
@misc{temu-vtoff,
author = {Your Name or Organization},
title = {TEMU-VTOFF: Virtual Try-Off \& Fashion Understanding Toolkit},
year = {2024},
howpublished = {\url{https://github.com/yourusername/TEMU-VTOFF}}
}
```
---
## License
This project is licensed under the [LICENSE](LICENSE) provided in the repository. Please check individual model and dataset licenses for additional terms.