---
license: gpl-3.0
datasets:
- p1atdev/danbooru-2024
language:
- en
pipeline_tag: image-classification
---

# Camie Tagger v2

An advanced deep learning model for automatically tagging anime/manga illustrations with relevant tags across multiple categories. It achieves a **67.3% micro F1 score** (50.6% macro F1 using the macro-optimized threshold preset) across 70,527 possible tags on a test set of 20,116 samples. v2 adds a Vision Transformer backbone and delivers significantly improved performance.

## What's New in v2

### Major Performance Improvements
- **Micro F1**: 58.1% → **67.3%** (+9.2 percentage points)
- **Macro F1**: 31.5% → **50.6%** (+19.1 percentage points)
- **Model Size**: 424M → **143M parameters** (66% reduction)
- **Architecture**: Switched from an EfficientNetV2-L to a Vision Transformer (ViT) backbone
- **Simplified Design**: Streamlined from a dual-stage design to a single refined prediction model

### Training Innovations
- **Multi-Resolution Training**: Progressive scaling from 384px → 512px resolution
- **IRFS (Instance-Aware Repeat Factor Sampling)**: Significant macro F1 improvements for rare tags
- **Adaptive Training**: Models quickly adapt to resolution/distribution changes after initial pretraining

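The multi-resolution bullet implies that the backbone sees a larger patch grid partway through training. The README does not say how v2 handles this, so the following is only a sketch under stated assumptions: a ViT with learned absolute position embeddings, a 16px patch size (384px gives a 24×24 grid, 512px a 32×32 grid), and no class token. `patch_grid_size` and `resize_pos_embed` are hypothetical helpers that bilinearly resample the embedding grid, a common way to move a pretrained ViT to a new input resolution. It is written in plain Python for readability; a real implementation would operate on tensors.

```python
from typing import List

Vector = List[float]
Grid = List[List[Vector]]  # grid[row][col] -> one position-embedding vector


def patch_grid_size(image_px: int, patch_px: int = 16) -> int:
    """Patches per side of a square image (the 16px patch size is an assumption)."""
    if image_px % patch_px:
        raise ValueError("image size must be a multiple of the patch size")
    return image_px // patch_px


def resize_pos_embed(grid: Grid, new_size: int) -> Grid:
    """Bilinearly resample an n x n grid of embedding vectors to new_size x new_size.

    Sampling is align-corners style, so the four corner vectors are kept exactly.
    """
    old_size = len(grid)
    dim = len(grid[0][0])
    scale = (old_size - 1) / (new_size - 1) if new_size > 1 else 0.0
    resized: Grid = []
    for r in range(new_size):
        y = r * scale
        y0 = int(y)
        y1 = min(y0 + 1, old_size - 1)
        wy = y - y0  # vertical interpolation weight
        row: List[Vector] = []
        for c in range(new_size):
            x = c * scale
            x0 = int(x)
            x1 = min(x0 + 1, old_size - 1)
            wx = x - x0  # horizontal interpolation weight
            row.append([
                (1 - wy) * ((1 - wx) * grid[y0][x0][d] + wx * grid[y0][x1][d])
                + wy * ((1 - wx) * grid[y1][x0][d] + wx * grid[y1][x1][d])
                for d in range(dim)
            ])
        resized.append(row)
    return resized


if __name__ == "__main__":
    old_n, new_n = patch_grid_size(384), patch_grid_size(512)  # 24 and 32
    # Toy 2-d "embeddings" that just store (row, col), so the output is easy to inspect.
    toy = [[[float(r), float(c)] for c in range(old_n)] for r in range(old_n)]
    resized = resize_pos_embed(toy, new_n)
    print(len(resized), len(resized[0]))  # 32 32
```

Because bilinear interpolation reproduces linear functions exactly, the toy grid makes it easy to check that every resized vector lands at the expected (row, col) position on the old grid.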
*v2 demonstrates that Vision Transformers can achieve superior anime image tagging performance with fewer parameters and a cleaner architecture.*

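The IRFS bullet under Training Innovations refers to a long-tail sampling scheme built on repeat factor sampling (RFS, introduced with the LVIS dataset). In RFS each class c gets a repeat factor r_c = max(1, √(t / f_c)), where f_c is the fraction of training images containing c, and each image is repeated according to its rarest label. IRFS, as proposed for long-tailed object detection, replaces f_c with the geometric mean of the image-level and instance-level frequencies. The README does not describe how instance frequencies are defined for tags, which usually occur at most once per image, so this sketch takes them as an optional input and otherwise falls back to plain RFS. The threshold `t`, the toy data, and all function names are assumptions.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Optional, Sequence


def tag_repeat_factors(
    image_tags: Sequence[Sequence[str]],
    t: float,
    instance_freq: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Per-tag repeat factor r_c = max(1, sqrt(t / f_c)).

    f_c is the geometric mean of the image-level frequency (fraction of images
    carrying the tag) and an instance-level frequency (assumed positive). With
    no instance frequency supplied, f_c equals the image frequency: plain RFS.
    """
    n_images = len(image_tags)
    counts = Counter(tag for tags in image_tags for tag in set(tags))
    instance_freq = instance_freq or {}
    factors: Dict[str, float] = {}
    for tag, count in counts.items():
        f_img = count / n_images
        f_inst = instance_freq.get(tag, f_img)
        f_c = math.sqrt(f_img * f_inst)  # IRFS: geometric mean of the two frequencies
        factors[tag] = max(1.0, math.sqrt(t / f_c))
    return factors


def image_repeat_factors(
    image_tags: Sequence[Sequence[str]], tag_factors: Dict[str, float]
) -> List[float]:
    """Each image is repeated according to its rarest tag: r_i = max over its tags of r_c."""
    return [max((tag_factors[tag] for tag in tags), default=1.0) for tags in image_tags]


def epoch_indices(repeat: Sequence[float], seed: int = 0) -> List[int]:
    """Build one epoch's sample order, stochastically rounding each r_i so the
    expected number of copies of image i equals r_i."""
    rng = random.Random(seed)
    indices: List[int] = []
    for i, r in enumerate(repeat):
        whole = int(r)
        copies = whole + (1 if rng.random() < r - whole else 0)
        indices.extend([i] * copies)
    rng.shuffle(indices)
    return indices


if __name__ == "__main__":
    data = [["1girl", "solo"], ["1girl", "solo"], ["1girl", "solo"], ["1girl", "rare_artist"]]
    factors = tag_repeat_factors(data, t=0.5)
    print(factors)  # rare_artist gets sqrt(2) ≈ 1.414; the common tags stay at 1.0
    print(epoch_indices(image_repeat_factors(data, factors), seed=0))
```

Images that carry only common tags are seen once per epoch, while the image with the rare tag is seen about 1.4 times on average, which is the mechanism by which this family of samplers targets macro F1.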
## Key Highlights

- **Efficient Training**: Trained on a single RTX 3060 GPU (12GB VRAM)
- **Fast Adaptation**: After pretraining, models adapt to new resolutions/distributions within a fraction of an epoch
- **Comprehensive Coverage**: 70,527 tags across 7 categories (general, character, copyright, artist, meta, rating, year)
- **Modern Architecture**: Vision Transformer backbone with cross-attention refinement
- **User-Friendly Interface**: Easy-to-use application with customizable thresholds and a tag collection game

## ✨ Features

- **Multi-category tagging system**: Handles general tags, characters, copyright (series), artists, meta information, content ratings, and year
- **High performance**: 67.3% micro F1 score (50.6% macro F1) across 70,527 possible tags
- **Windows compatibility**: Works on Windows without requiring Flash Attention
- **Streamlit web interface**: User-friendly UI for uploading and analyzing images, plus a tag collection game
- **Adjustable threshold profiles**: Micro, Macro, Balanced, Category-specific, High Precision, and High Recall profiles
- **Fine-grained control**: Per-category threshold adjustments for precision-recall tradeoffs
- **Safetensors and ONNX**: Weights are provided in Safetensors and ONNX formats; the original pickle files are available in `/models`
- **Vision Transformer Backbone**: Modern architecture with a superior performance-to-parameter ratio

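The threshold profiles and per-category controls listed above amount to comparing each tag's predicted probability against a cutoff that may depend on the tag's category. The sketch below illustrates only that idea: the `ThresholdProfile` class, `select_tags`, the example tags and scores, and every threshold value are hypothetical placeholders, not the application's calibrated presets or its actual code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ThresholdProfile:
    """Hypothetical profile: one default cutoff plus optional per-category overrides."""

    name: str
    default: float
    per_category: Dict[str, float] = field(default_factory=dict)

    def threshold_for(self, category: str) -> float:
        return self.per_category.get(category, self.default)


def select_tags(
    probs: Dict[str, float],
    tag_category: Dict[str, str],
    profile: ThresholdProfile,
) -> Dict[str, List[Tuple[str, float]]]:
    """Keep tags whose probability meets their category's cutoff, grouped by category."""
    selected: Dict[str, List[Tuple[str, float]]] = {}
    for tag, p in probs.items():
        category = tag_category[tag]
        if p >= profile.threshold_for(category):
            selected.setdefault(category, []).append((tag, p))
    for items in selected.values():
        items.sort(key=lambda item: item[1], reverse=True)  # most confident first
    return selected


if __name__ == "__main__":
    # Placeholder scores and categories, not real model output.
    probs = {"1girl": 0.97, "smile": 0.41, "hatsune_miku": 0.88, "vocaloid": 0.55, "sensitive": 0.30}
    categories = {"1girl": "general", "smile": "general", "hatsune_miku": "character",
                  "vocaloid": "copyright", "sensitive": "rating"}
    # Placeholder cutoffs: a strict and a loose global profile, plus a per-category override.
    profiles = [
        ThresholdProfile("high_precision", 0.6),
        ThresholdProfile("high_recall", 0.2),
        ThresholdProfile("category_specific", 0.5, {"general": 0.35}),
    ]
    for profile in profiles:
        print(profile.name, select_tags(probs, categories, profile))
```

Raising the cutoff trades recall for precision; a per-category override lets, for example, general tags use a looser cutoff than copyright or rating tags.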
## Performance Analysis

### Complete v1 vs v2 Performance Comparison

| CATEGORY | v1 Micro F1 | v2 Micro F1 | Micro Δ | v1 Macro F1 | v2 Macro F1 | Macro Δ |
|----------|-------------|-------------|---------|-------------|-------------|---------|
| **Overall** | 61.3% | **67.3%** | **+6.0pp** | 33.8% | **50.6%** | **+16.8pp** |
| **Artist** | 48.0% | **70.0%** | **+22.0pp** | 29.9% | **66.1%** | **+36.2pp** |
| **Character** | 75.7% | **83.4%** | **+7.7pp** | 52.4% | **66.2%** | **+13.8pp** |
| **Copyright** | 79.2% | **86.6%** | **+7.4pp** | 41.9% | **56.2%** | **+14.3pp** |
| **General** | 60.8% | **66.4%** | **+5.6pp** | 21.5% | **34.6%** | **+13.1pp** |
| **Meta** | 60.2% | **61.2%** | **+1.0pp** | 14.5% | **23.7%** | **+9.2pp** |
| **Rating** | 80.8% | **83.1%** | **+2.3pp** | 79.5% | **77.5%** | **-2.0pp** |
| **Year** | 33.2% | **30.8%** | **-2.4pp** | 29.3% | **32.6%** | **+3.3pp** |

*Micro F1 is compared using micro-optimized thresholds and Macro F1 using macro-optimized thresholds, for a fair evaluation.*

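Micro and macro F1 weight tags very differently, which is why the table evaluates each with its own threshold preset. Micro F1 pools true positives, false positives, and false negatives across all tags, so frequent tags dominate; macro F1 averages the per-tag F1 scores, so a rare tag counts as much as a common one. A minimal sketch, assuming each image's ground truth and predictions are sets of tag names and that macro F1 averages over a fixed tag vocabulary (the README does not say how tags with no positives are handled; here they score 0). The function names are illustrative.

```python
from typing import Dict, List, Sequence, Set, Tuple


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(
    y_true: Sequence[Set[str]],
    y_pred: Sequence[Set[str]],
    tags: Sequence[str],
) -> Tuple[float, float]:
    """Return (micro F1, macro F1) over a fixed tag vocabulary."""
    vocab = set(tags)
    counts: Dict[str, List[int]] = {tag: [0, 0, 0] for tag in vocab}  # [tp, fp, fn]
    for truth, pred in zip(y_true, y_pred):
        truth, pred = truth & vocab, pred & vocab  # ignore out-of-vocabulary tags
        for tag in truth & pred:
            counts[tag][0] += 1
        for tag in pred - truth:
            counts[tag][1] += 1
        for tag in truth - pred:
            counts[tag][2] += 1

    # Micro: pool the counts over every tag, then compute a single F1.
    total_tp = sum(c[0] for c in counts.values())
    total_fp = sum(c[1] for c in counts.values())
    total_fn = sum(c[2] for c in counts.values())
    micro = f1_score(total_tp, total_fp, total_fn)

    # Macro: compute F1 per tag, then take the unweighted mean.
    macro = sum(f1_score(*c) for c in counts.values()) / len(counts)
    return micro, macro


if __name__ == "__main__":
    truth = [{"1girl"}, {"1girl"}, {"1girl", "rare_artist"}]
    preds = [{"1girl"}, {"1girl"}, {"1girl"}]  # the rare tag is missed once
    micro, macro = micro_macro_f1(truth, preds, ["1girl", "rare_artist"])
    print(f"micro={micro:.3f} macro={macro:.3f}")  # micro=0.857 macro=0.500
```

In this toy case, missing the single rare tag lowers micro F1 by only about 14 points (6/7 ≈ 0.857) but halves macro F1, which is why gains on rare tags show up mainly in the macro columns.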
### Key Performance Insights

The v2 model shows remarkable improvements across nearly all categories:

- **Artist Recognition**: Massive +22.0pp micro F1 and +36.2pp macro F1 improvements, indicating much better artist identification
- **Character Detection**: Large +7.7pp micro F1 and +13.8pp macro F1 gains
- **Copyright Recognition**: Excellent +7.4pp micro F1 and +14.3pp macro F1 improvements for series identification
- **General Tags**: Improved +5.6pp micro F1 and +13.1pp macro F1 for visual attributes
- **Overall Macro F1**: Exceptional +16.8pp improvement, reflecting much better rare-tag recognition

The only regressions are small: Year micro F1 drops 2.4pp and Rating macro F1 drops 2.0pp.

### Detailed v2 Performance

- Streamlit
- PIL/Pillow
- NumPy

## 🔧 Usage