**VisionFashion** is a dual-encoder model that learns a joint embedding space for fashion **images** and **text descriptions**.
It combines a **Vision Transformer (ViT-B/32)** encoder with a **BERT-base** text encoder and is trained in two stages:

**Source code is available at https://github.com/tugcantopaloglu/vision-fashion-paper-deeplearning/**

1. **Contrastive pre-training** à la CLIP on the *DeepFashion-MultiModal* dataset
2. **Task-specific fine-tuning** for (i) *Category* classification and (ii) *Attribute* prediction.
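The first stage, CLIP-style contrastive pre-training, aligns matching image and text embeddings while pushing apart mismatched pairs via a symmetric cross-entropy over cosine similarities. A minimal NumPy sketch of that loss is below; the function name and the temperature value are illustrative, not taken from the repository:

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss used in CLIP-style contrastive pre-training.

    img_emb, txt_emb: (batch, dim) arrays; matching image/text pairs
    share the same row index.
    """
    # L2-normalise so the dot product becomes cosine similarity.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)

    # (batch, batch) similarity matrix, scaled by the temperature.
    logits = img @ txt.T / temperature

    # Cross-entropy with the diagonal (the true pairs) as targets.
    def xent(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the image->text and text->image directions.
    return 0.5 * (xent(logits) + xent(logits.T))

# Toy check: identical embeddings (perfectly aligned pairs) should
# score a much lower loss than random, unrelated embeddings.
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))
loss_aligned = clip_contrastive_loss(emb, emb, temperature=0.01)
loss_random = clip_contrastive_loss(emb, rng.normal(size=(4, 8)))
```

In the actual model, `img_emb` would come from the ViT-B/32 encoder and `txt_emb` from the BERT-base encoder, with the temperature typically learned during training.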