Update README.md
README.md
@@ -48,32 +48,82 @@ An advanced deep learning model for automatically tagging anime/manga illustrati

 ## Performance Analysis

 ### Complete v1 vs v2 Performance Comparison

 | CATEGORY | v1 Micro F1 | v2 Micro F1 | Micro Δ | v1 Macro F1 | v2 Macro F1 | Macro Δ |
 |----------|-------------|-------------|---------|-------------|-------------|---------|
-| **Overall** |
-| **Artist** |
-| **Character** |
-| **Copyright** |
-| **General** |
-| **Meta** |
-| **Rating** |
-| **Year** | 33.

-*

 ### Key Performance Insights

 The v2 model shows remarkable improvements across nearly all categories:

-- **Artist Recognition**: Massive +22.
-- **Character Detection**:
-- **Copyright Recognition**: Excellent +
-- **General Tags**:
-- **Overall Macro F1**: Exceptional +

-Only the year category shows slight regression

 ### Detailed v2 Performance
@@ -161,7 +211,6 @@ The model was trained using an innovative multi-resolution approach:
 - Streamlit
 - PIL/Pillow
 - NumPy
-- Flash Attention (note: doesn't work properly on Windows; only needed for the refined model, which I'm not supporting that much anyway)

 ## 🔧 Usage
@@ -48,32 +48,82 @@ An advanced deep learning model for automatically tagging anime/manga illustrati

 ## Performance Analysis

+---
+license: gpl-3.0
+datasets:
+- p1atdev/danbooru-2024
+language:
+- en
+pipeline_tag: image-classification
+---
+
+# Camie Tagger v2
+
+An advanced deep learning model for automatically tagging anime/manga illustrations with relevant tags across multiple categories, achieving **67.3% micro F1 score** (50.6% macro F1 score using the macro-optimized threshold preset) across 70,527 possible tags on a test set of 20,116 samples. Now with a Vision Transformer backbone and significantly improved performance.
+
+## What's New in v2
+
+### Major Performance Improvements
+- **Micro F1**: 58.1% → **67.3%** (+9.2 percentage points)
+- **Macro F1**: 31.5% → **50.6%** (+19.1 percentage points)
+- **Model Size**: 424M → **143M parameters** (66% reduction)
+- **Architecture**: Switched from EfficientNetV2-L to Vision Transformer (ViT) backbone
+- **Simplified Design**: Streamlined from dual-stage to single refined prediction model
+
+### Training Innovations
+- **Multi-Resolution Training**: Progressive scaling from 384px → 512px resolution
+- **IRFS (Instance-Aware Repeat Factor Sampling)**: Significant macro F1 improvements for rare tags
+- **Adaptive Training**: Models quickly adapt to resolution/distribution changes after initial pretraining
+
+*v2 demonstrates that Vision Transformers can achieve superior anime image tagging performance with fewer parameters and cleaner architecture.*
+
+## Key Highlights
+
+- **Efficient Training**: Completed on just a single RTX 3060 GPU (12GB VRAM)
+- **Fast Adaptation**: Models adapt to new resolutions/distributions within partial epochs after pretraining
+- **Comprehensive Coverage**: 70,527 tags across 7 categories (general, character, copyright, artist, meta, rating, year)
+- **Modern Architecture**: Vision Transformer backbone with cross-attention refinement
+- **User-Friendly Interface**: Easy-to-use application with customizable thresholds and tag collection game
+
+## ✨ Features
+
+- **Multi-category tagging system**: Handles general tags, characters, copyright (series), artists, meta information, and content ratings
+- **High performance**: 67.3% micro F1 score (50.6% macro F1) across 70,527 possible tags
+- **Windows compatibility**: Works on Windows without Flash Attention requirements
+- **Streamlit web interface**: User-friendly UI for uploading and analyzing images, plus a tag collection game
+- **Adjustable threshold profiles**: Micro, Macro, Balanced, Category-specific, High Precision, and High Recall profiles
+- **Fine-grained control**: Per-category threshold adjustments for precision-recall tradeoffs
+- **Safetensors and ONNX**: Original pickle files available in /models
+- **Vision Transformer Backbone**: Modern architecture with superior performance-to-parameter ratio
+
+## Performance Analysis
+
 ### Complete v1 vs v2 Performance Comparison

 | CATEGORY | v1 Micro F1 | v2 Micro F1 | Micro Δ | v1 Macro F1 | v2 Macro F1 | Macro Δ |
 |----------|-------------|-------------|---------|-------------|-------------|---------|
+| **Overall** | 61.3% | **67.3%** | **+6.0pp** | 33.8% | **50.6%** | **+16.8pp** |
+| **Artist** | 48.0% | **70.0%** | **+22.0pp** | 29.9% | **66.1%** | **+36.2pp** |
+| **Character** | 75.7% | **83.4%** | **+7.7pp** | 52.4% | **66.2%** | **+13.8pp** |
+| **Copyright** | 79.2% | **86.6%** | **+7.4pp** | 41.9% | **56.2%** | **+14.3pp** |
+| **General** | 60.8% | **66.4%** | **+5.6pp** | 21.5% | **34.6%** | **+13.1pp** |
+| **Meta** | 60.2% | **61.2%** | **+1.0pp** | 14.5% | **23.7%** | **+9.2pp** |
+| **Rating** | 80.8% | **83.1%** | **+2.3pp** | 79.5% | **77.5%** | **-2.0pp** |
+| **Year** | 33.2% | **30.8%** | **-2.4pp** | 29.3% | **32.6%** | **+3.3pp** |

+*Micro F1 is compared at micro-optimized thresholds and macro F1 at macro-optimized thresholds, for a fair evaluation.*

 ### Key Performance Insights

 The v2 model shows remarkable improvements across nearly all categories:

+- **Artist Recognition**: Massive +22.0pp micro F1 improvement and +36.2pp macro F1 improvement, indicating much better artist identification
+- **Character Detection**: Large +7.7pp micro F1 and +13.8pp macro F1 gains
+- **Copyright Recognition**: Excellent +7.4pp micro F1 improvement and +14.3pp macro F1 improvement for series identification
+- **General Tags**: Improved +5.6pp micro F1 and +13.1pp macro F1 for visual attributes
+- **Overall Macro F1**: Exceptional +16.8pp improvement shows much better rare tag recognition

+Two metrics regress slightly: Year micro F1 (-2.4pp) and Rating macro F1 (-2.0pp).

 ### Detailed v2 Performance
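The micro and macro columns in the comparison table can diverge sharply because they average differently: micro F1 pools true/false positives over every tag, so frequent tags dominate, while macro F1 averages per-tag F1, so rare tags count as much as common ones. The sketch below illustrates that difference on toy data. It is not the project's evaluation code; the function names, the toy tags, and the choice to average macro F1 only over tags that appear in the ground truth or the predictions are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined here as 0.0 when tp, fp and fn are all zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(truth: List[Set[str]], pred: List[Set[str]]) -> Dict[str, float]:
    """Micro and macro F1 for multi-label tagging (illustrative helper)."""
    tags = set().union(*truth, *pred)
    counts = {tag: [0, 0, 0] for tag in tags}  # per-tag [tp, fp, fn]
    for y, p in zip(truth, pred):
        for tag in y & p:
            counts[tag][0] += 1  # predicted and correct
        for tag in p - y:
            counts[tag][1] += 1  # predicted but wrong
        for tag in y - p:
            counts[tag][2] += 1  # missed
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = f1(tp, fp, fn)  # pooled counts: frequent tags dominate
    macro = sum(f1(*c) for c in counts.values()) / len(counts) if counts else 0.0
    return {"micro": micro, "macro": macro}


# Toy data: one common tag predicted perfectly, one rare tag always missed.
truth = [{"1girl"}, {"1girl"}, {"1girl", "rare_tag"}, {"1girl"}]
pred = [{"1girl"}, {"1girl"}, {"1girl"}, {"1girl"}]
scores = micro_macro_f1(truth, pred)
print(f"micro={scores['micro']:.3f} macro={scores['macro']:.3f}")
# micro=0.889 macro=0.500
```

Missing the single rare tag costs about 11 points of micro F1 but 50 points of macro F1, which is consistent with the README reading the larger macro gains as better rare-tag recognition.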
@@ -161,7 +211,6 @@ The model was trained using an innovative multi-resolution approach:
 - Streamlit
 - PIL/Pillow
 - NumPy

 ## 🔧 Usage
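The adjustable threshold profiles and per-category adjustments listed under Features amount to comparing each tag's predicted probability against a cutoff chosen for its category. A minimal sketch of that idea follows. The threshold values, the `select_tags` helper, and the example tags are hypothetical and are not taken from the project's code or its tuned presets.

```python
from typing import Dict, List, Tuple

# Hypothetical cutoffs for illustration only; not the project's tuned thresholds.
CATEGORY_THRESHOLDS: Dict[str, float] = {"general": 0.35, "character": 0.50, "artist": 0.60}
DEFAULT_THRESHOLD = 0.50  # used for any category without its own entry


def select_tags(
    scores: Dict[str, Tuple[str, float]],
    thresholds: Dict[str, float],
    default: float = DEFAULT_THRESHOLD,
) -> Dict[str, List[str]]:
    """Keep tags whose probability meets the cutoff for their category.

    `scores` maps tag name -> (category, probability).
    """
    selected: Dict[str, List[str]] = {}
    for tag, (category, prob) in scores.items():
        if prob >= thresholds.get(category, default):
            selected.setdefault(category, []).append(tag)
    return selected


# Made-up model output for a single image.
scores = {
    "1girl": ("general", 0.92),
    "smile": ("general", 0.40),
    "example_character": ("character", 0.45),
    "example_artist": ("artist", 0.61),
    "highres": ("meta", 0.55),
}

balanced = select_tags(scores, CATEGORY_THRESHOLDS)
# Raising one category's cutoff trades recall for precision in that category.
stricter = select_tags(scores, {**CATEGORY_THRESHOLDS, "general": 0.50})
print(balanced)
print(stricter)
```

With these made-up numbers, raising the general cutoff from 0.35 to 0.50 drops `smile` while leaving the other categories unchanged, which is the precision-recall tradeoff the per-category controls expose.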