Camais03 committed on
Commit a8e34d7 · verified · 1 Parent(s): 41eb534

Update README.md

Files changed (1)
  1. README.md +65 -16
README.md CHANGED
@@ -48,32 +48,82 @@ An advanced deep learning model for automatically tagging anime/manga illustrati
 
  ## 📊 Performance Analysis
 
  ### Complete v1 vs v2 Performance Comparison
 
  | CATEGORY | v1 Micro F1 | v2 Micro F1 | Micro Δ | v1 Macro F1 | v2 Macro F1 | Macro Δ |
  |----------|-------------|-------------|---------|-------------|-------------|---------|
- | **Overall** | 58.1% | **67.3%** | **+9.2pp** | 31.5% | **50.6%** | **+19.1pp** |
- | **Artist** | 47.4% | **70.0%** | **+22.6pp** | 29.8% | **64.4%** | **+34.6pp** |
- | **Character** | 74.6% | **83.4%** | **+8.8pp** | 47.8% | **64.5%** | **+16.7pp** |
- | **Copyright** | 76.3% | **86.6%** | **+10.3pp** | 37.7% | **53.1%** | **+15.4pp** |
- | **General** | 57.6% | **66.4%** | **+8.8pp** | 20.4% | **27.4%** | **+7.0pp** |
- | **Meta** | 55.7% | **61.2%** | **+5.5pp** | 14.4% | **19.2%** | **+4.8pp** |
- | **Rating** | 77.9% | **83.1%** | **+5.2pp** | 76.8% | **81.8%** | **+5.0pp** |
- | **Year** | 33.1% | **30.8%** | **-2.3pp** | 28.6% | **21.3%** | **-7.3pp** |
 
- *Both using the balanced preset.
 
  ### Key Performance Insights
 
  The v2 model shows remarkable improvements across nearly all categories:
 
- - **Artist Recognition**: Massive +22.6pp micro F1 improvement, indicating much better artist identification
- - **Character Detection**: Strong +8.8pp micro F1 and +16.7pp macro F1 gains
- - **Copyright Recognition**: Excellent +10.3pp micro F1 improvement for series identification
- - **General Tags**: Consistent +8.8pp micro F1 improvement for visual attributes
- - **Overall Macro F1**: Exceptional +19.1pp improvement shows much better rare tag recognition
 
- Only the year category shows slight regression, likely due to the reduced model complexity making temporal classification more challenging.
 
  ### Detailed v2 Performance
 
@@ -161,7 +211,6 @@ The model was trained using an innovative multi-resolution approach:
  - Streamlit
  - PIL/Pillow
  - NumPy
- - Flash Attention (note: doesn't work properly on Windows; only needed for the refined model, which I'm not supporting that much anyway)
 
  ## 🔧 Usage
 
 
 
  ## 📊 Performance Analysis
 
+ ---
+ license: gpl-3.0
+ datasets:
+ - p1atdev/danbooru-2024
+ language:
+ - en
+ pipeline_tag: image-classification
+ ---
+
+ # Camie Tagger v2
+
+ An advanced deep learning model for automatically tagging anime/manga illustrations with relevant tags across multiple categories, achieving **67.3% micro F1 score** (50.6% macro F1 score using the macro optimized threshold preset) across 70,527 possible tags on a test set of 20,116 samples. Now with Vision Transformer backbone and significantly improved performance.
+
+ ## 🚀 What's New in v2
+
+ ### Major Performance Improvements
+ - **Micro F1**: 58.1% → **67.3%** (+9.2 percentage points)
+ - **Macro F1**: 31.5% → **50.6%** (+19.1 percentage points)
+ - **Model Size**: 424M → **143M parameters** (66% reduction)
+ - **Architecture**: Switched from EfficientNetV2-L to Vision Transformer (ViT) backbone
+ - **Simplified Design**: Streamlined from dual-stage to single refined prediction model
+
+ ### Training Innovations
+ - **Multi-Resolution Training**: Progressive scaling from 384px → 512px resolution
+ - **IRFS (Instance-Aware Repeat Factor Sampling)**: Significant macro F1 improvements for rare tags
+ - **Adaptive Training**: Models quickly adapt to resolution/distribution changes after initial pretraining
+
+ *v2 demonstrates that Vision Transformers can achieve superior anime image tagging performance with fewer parameters and cleaner architecture.*
+
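The IRFS bullet under Training Innovations refers to a family of repeat-factor resampling schemes that oversample training images containing rare tags. The commit does not show how the sampler is implemented, so the sketch below illustrates only the classic Repeat Factor Sampling rule (repeat factor `max(1, sqrt(t / f))` per tag, maximum over an image's tags), not the instance-aware variant the README names. The function name `repeat_factors`, the `threshold` value, and the toy dataset are all assumptions made for illustration.

```python
import math
from collections import Counter


def repeat_factors(image_tags, threshold=0.001):
    """Per-image repeat factors in the style of classic Repeat Factor Sampling.

    image_tags: list of tag lists, one per training image.
    threshold: hypothetical frequency cutoff; tags rarer than this are oversampled.
    """
    num_images = len(image_tags)
    # Fraction of images that contain each tag (each tag counted once per image).
    counts = Counter(tag for tags in image_tags for tag in set(tags))
    tag_freq = {tag: n / num_images for tag, n in counts.items()}
    # Tag-level factor: 1 for common tags, sqrt(threshold / freq) for rare ones.
    tag_factor = {
        tag: max(1.0, math.sqrt(threshold / freq)) for tag, freq in tag_freq.items()
    }
    # An image is repeated according to its rarest tag.
    return [max((tag_factor[t] for t in tags), default=1.0) for tags in image_tags]


# Toy dataset (hypothetical): only the second image carries a rare tag.
dataset = [["1girl", "solo"], ["1girl", "rare_artist"], ["1girl"], ["solo"]]
factors = repeat_factors(dataset, threshold=0.5)
print(factors)
```

Images with a factor above 1 would be drawn proportionally more often per epoch, which is the mechanism that tends to help macro F1 on rare tags.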
+ ## 🔑 Key Highlights
+
+ - **Efficient Training**: Completed on just a single RTX 3060 GPU (12GB VRAM)
+ - **Fast Adaptation**: Models adapt to new resolutions/distributions within partial epochs after pretraining
+ - **Comprehensive Coverage**: 70,527 tags across 7 categories (general, character, copyright, artist, meta, rating, year)
+ - **Modern Architecture**: Vision Transformer backbone with cross-attention refinement
+ - **User-Friendly Interface**: Easy-to-use application with customizable thresholds and tag collection game
+
+ ## ✨ Features
+
+ - **Multi-category tagging system**: Handles general tags, characters, copyright (series), artists, meta information, and content ratings
+ - **High performance**: 67.3% micro F1 score (50.6% macro F1) across 70,527 possible tags
+ - **Windows compatibility**: Works on Windows without Flash Attention requirements
+ - **Streamlit web interface**: User-friendly UI for uploading and analyzing images and a tag collection game
+ - **Adjustable threshold profiles**: Micro, Macro, Balanced, Category-specific, High Precision, and High Recall profiles
+ - **Fine-grained control**: Per-category threshold adjustments for precision-recall tradeoffs
+ - **Safetensors and ONNX**: Original pickle files available in /models
+ - **Vision Transformer Backbone**: Modern architecture with superior performance-to-parameter ratio
+
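The threshold profiles and per-category adjustments listed under Features amount to comparing each tag's predicted probability against a cutoff chosen for that tag's category. The following is a minimal sketch of that idea only; the function `select_tags`, the dictionary layout, and every threshold value are hypothetical and are not taken from the project's code.

```python
def select_tags(probabilities, tag_categories, thresholds, default=0.5):
    """Keep tags whose probability meets the threshold of their category.

    probabilities: {tag: probability} from the model (hypothetical layout).
    tag_categories: {tag: category name}.
    thresholds: {category name: cutoff}; categories not listed use `default`.
    """
    selected = {}
    for tag, prob in probabilities.items():
        category = tag_categories[tag]
        if prob >= thresholds.get(category, default):
            selected.setdefault(category, []).append((tag, prob))
    # Within each category, list the most confident tags first.
    for tags in selected.values():
        tags.sort(key=lambda item: item[1], reverse=True)
    return selected


# Hypothetical model output and thresholds, for illustration only.
probs = {"1girl": 0.92, "long_hair": 0.41, "hatsune_miku": 0.77, "vocaloid": 0.30}
cats = {
    "1girl": "general",
    "long_hair": "general",
    "hatsune_miku": "character",
    "vocaloid": "copyright",
}
profile = {"general": 0.5, "character": 0.6, "copyright": 0.25}
result = select_tags(probs, cats, profile)
print(result)
```

Lowering a category's cutoff trades precision for recall in that category alone, which is the tradeoff the "Fine-grained control" bullet describes.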
+ ## 📊 Performance Analysis
+
  ### Complete v1 vs v2 Performance Comparison
 
  | CATEGORY | v1 Micro F1 | v2 Micro F1 | Micro Δ | v1 Macro F1 | v2 Macro F1 | Macro Δ |
  |----------|-------------|-------------|---------|-------------|-------------|---------|
+ | **Overall** | 61.3% | **67.3%** | **+6.0pp** | 33.8% | **50.6%** | **+16.8pp** |
+ | **Artist** | 48.0% | **70.0%** | **+22.0pp** | 29.9% | **66.1%** | **+36.2pp** |
+ | **Character** | 75.7% | **83.4%** | **+7.7pp** | 52.4% | **66.2%** | **+13.8pp** |
+ | **Copyright** | 79.2% | **86.6%** | **+7.4pp** | 41.9% | **56.2%** | **+14.3pp** |
+ | **General** | 60.8% | **66.4%** | **+5.6pp** | 21.5% | **34.6%** | **+13.1pp** |
+ | **Meta** | 60.2% | **61.2%** | **+1.0pp** | 14.5% | **23.7%** | **+9.2pp** |
+ | **Rating** | 80.8% | **83.1%** | **+2.3pp** | 79.5% | **77.5%** | **-2.0pp** |
+ | **Year** | 33.2% | **30.8%** | **-2.4pp** | 29.3% | **32.6%** | **+3.3pp** |
 
+ *Micro F1 comparison using micro-optimized thresholds, Macro F1 comparison using macro-optimized thresholds for fair evaluation.*
 
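The table reports both micro and macro F1, and the gap between them reflects how rare tags are weighted. As a small worked illustration (not the project's evaluation code), micro F1 pools true positives, false positives, and false negatives over all tags before computing F1, while macro F1 averages the per-tag F1 scores so that every tag counts equally. The helper names and the counts below are made up for the example.

```python
def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(per_tag_counts):
    """per_tag_counts: {tag: (tp, fp, fn)} accumulated over a test set."""
    total_tp = sum(c[0] for c in per_tag_counts.values())
    total_fp = sum(c[1] for c in per_tag_counts.values())
    total_fn = sum(c[2] for c in per_tag_counts.values())
    # Micro: pool counts first, so frequent tags dominate the score.
    micro = f1(total_tp, total_fp, total_fn)
    # Macro: score each tag separately, then average with equal weight.
    macro = sum(f1(*c) for c in per_tag_counts.values()) / len(per_tag_counts)
    return micro, macro


# Made-up counts: one frequent, well-predicted tag and one rare, poorly-predicted tag.
counts = {"common_tag": (90, 10, 10), "rare_tag": (1, 0, 3)}
micro, macro = micro_macro_f1(counts)
print(f"micro F1 = {micro:.3f}, macro F1 = {macro:.3f}")
```

With these counts the frequent tag pulls micro F1 well above macro F1, which is why a gain on rare tags shows up mainly in the macro columns.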
  ### Key Performance Insights
 
  The v2 model shows remarkable improvements across nearly all categories:
 
+ - **Artist Recognition**: Massive +22.0pp micro F1 improvement and +36.2pp macro improvement, indicating much better artist identification
+ - **Character Detection**: Large +7.7pp micro F1 and +13.8pp macro F1 gains
+ - **Copyright Recognition**: Excellent +7.4pp micro F1 improvement and +14.3pp macro improvement for series identification
+ - **General Tags**: Improved +5.6pp micro F1 and +13.1pp macro F1 for visual attributes
+ - **Overall Macro F1**: Exceptional +16.8pp improvement shows much better rare tag recognition
 
+ Regressions are slight and limited to Year micro F1 (-2.4pp) and Rating macro F1 (-2.0pp).
 
  ### Detailed v2 Performance
 
 
  - Streamlit
  - PIL/Pillow
  - NumPy
 
  ## 🔧 Usage
 