Camais03/camie-tagger-v2

@@ -48,32 +48,82 @@ An advanced deep learning model for automatically tagging anime/manga illustrati
 ## 📊 Performance Analysis
 ### Complete v1 vs v2 Performance Comparison
 | CATEGORY | v1 Micro F1 | v2 Micro F1 | Micro Δ | v1 Macro F1 | v2 Macro F1 | Macro Δ |
 |----------|-------------|-------------|---------|-------------|-------------|---------|
-| **Overall** | 58.1% | **67.3%** | **+9.2pp** | 31.5% | **50.6%** | **+19.1pp** |
-| **Artist** | 47.4% | **70.0%** | **+22.6pp** | 29.8% | **64.4%** | **+34.6pp** |
-| **Character** | 74.6% | **83.4%** | **+8.8pp** | 47.8% | **64.5%** | **+16.7pp** |
-| **Copyright** | 76.3% | **86.6%** | **+10.3pp** | 37.7% | **53.1%** | **+15.4pp** |
-| **General** | 57.6% | **66.4%** | **+8.8pp** | 20.4% | **27.4%** | **+7.0pp** |
-| **Meta** | 55.7% | **61.2%** | **+5.5pp** | 14.4% | **19.2%** | **+4.8pp** |
-| **Rating** | 77.9% | **83.1%** | **+5.2pp** | 76.8% | **81.8%** | **+5.0pp** |
-| **Year** | 33.1% | **30.8%** | **-2.3pp** | 28.6% | **21.3%** | **-7.3pp** |
-*Both using the balanced preset.
 ### Key Performance Insights
 The v2 model shows remarkable improvements across nearly all categories:
-- **Artist Recognition**: Massive +22.6pp micro F1 improvement, indicating much better artist identification
-- **Character Detection**: Strong +8.8pp micro F1 and +16.7pp macro F1 gains
-- **Copyright Recognition**: Excellent +10.3pp micro F1 improvement for series identification
-- **General Tags**: Consistent +8.8pp micro F1 improvement for visual attributes
-- **Overall Macro F1**: Exceptional +19.1pp improvement shows much better rare tag recognition
-Only the year category shows slight regression, likely due to the reduced model complexity making temporal classification more challenging.
 ### Detailed v2 Performance
@@ -161,7 +211,6 @@ The model was trained using an innovative multi-resolution approach:
 - Streamlit
 - PIL/Pillow
 - NumPy
-- Flash Attention (note: doesn't work properly on Windows only needed for refined model which I'm not supporting that much anyway)
 ## 🔧 Usage

 ## 📊 Performance Analysis
+---
+license: gpl-3.0
+datasets:
+- p1atdev/danbooru-2024
+language:
+- en
+pipeline_tag: image-classification
+---
+# Camie Tagger v2
+An advanced deep learning model for automatically tagging anime/manga illustrations with relevant tags across multiple categories, achieving a **67.3% micro F1 score** (50.6% macro F1 score with the macro-optimized threshold preset) across 70,527 possible tags on a test set of 20,116 samples. It now uses a Vision Transformer backbone and delivers significantly improved performance.
+## 🚀 What's New in v2
+### Major Performance Improvements
+- **Micro F1**: 58.1% → **67.3%** (+9.2 percentage points)
+- **Macro F1**: 31.5% → **50.6%** (+19.1 percentage points)
+- **Model Size**: 424M → **143M parameters** (66% reduction)
+- **Architecture**: Switched from EfficientNetV2-L to Vision Transformer (ViT) backbone
+- **Simplified Design**: Streamlined from dual-stage to single refined prediction model
+### Training Innovations
+- **Multi-Resolution Training**: Progressive scaling from 384px → 512px resolution
+- **IRFS (Instance-Aware Repeat Factor Sampling)**: Significant macro F1 improvements for rare tags
+- **Adaptive Training**: Models quickly adapt to resolution/distribution changes after initial pretraining
+*v2 demonstrates that Vision Transformers can achieve superior anime image tagging performance with fewer parameters and cleaner architecture.*
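The README names IRFS but does not give its formula, so the following is only a rough sketch of the plain repeat-factor rule that samplers of this family build on, not the author's implementation. Each tag gets a factor `max(1, sqrt(t / f))`, where `f` is the fraction of images carrying the tag, and each image is repeated according to its rarest tag. The function name, the threshold `t`, and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter


def repeat_factors(image_tags, t=0.5):
    """Per-image repeat factors from the plain repeat-factor rule.

    This is NOT the author's IRFS code; `t` is an assumed frequency threshold.
    """
    n = len(image_tags)
    # Count how many images contain each tag (each image counted once per tag).
    counts = Counter(tag for tags in image_tags for tag in set(tags))
    # Tags seen in fewer than a fraction t of images get a factor above 1.
    tag_rf = {c: max(1.0, math.sqrt(t / (k / n))) for c, k in counts.items()}
    # An image is repeated as often as its rarest tag requires.
    return [max((tag_rf[c] for c in set(tags)), default=1.0) for tags in image_tags]


# Toy dataset (hypothetical): one common tag and one rare tag.
images = [["1girl"], ["1girl"], ["1girl"], ["1girl", "rare_artist"]]
factors = repeat_factors(images, t=0.5)
print(factors)
```

Images that contain only common tags keep a factor of 1, while the image with the rare tag is oversampled, which is the mechanism that would be expected to help macro F1 on rare tags.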
+## 🔑 Key Highlights
+- **Efficient Training**: Completed on just a single RTX 3060 GPU (12GB VRAM)
+- **Fast Adaptation**: Models adapt to new resolutions/distributions within partial epochs after pretraining
+- **Comprehensive Coverage**: 70,527 tags across 7 categories (general, character, copyright, artist, meta, rating, year)
+- **Modern Architecture**: Vision Transformer backbone with cross-attention refinement
+- **User-Friendly Interface**: Easy-to-use application with customizable thresholds and tag collection game
+## ✨ Features
+- **Multi-category tagging system**: Handles general tags, characters, copyright (series), artists, meta information, content ratings, and year
+- **High performance**: 67.3% micro F1 score (50.6% macro F1) across 70,527 possible tags
+- **Windows compatibility**: Runs on Windows; Flash Attention is not required
+- **Streamlit web interface**: User-friendly UI for uploading and analyzing images and a tag collection game
+- **Adjustable threshold profiles**: Micro, Macro, Balanced, Category-specific, High Precision, and High Recall profiles
+- **Fine-grained control**: Per-category threshold adjustments for precision-recall tradeoffs
+- **Safetensors and ONNX**: Weights are provided in both formats; the original pickle files are available in /models
+- **Vision Transformer Backbone**: Modern architecture with superior performance-to-parameter ratio
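As an illustration of the per-category threshold idea in the feature list, here is a minimal sketch that filters predicted probabilities with a different cutoff per category. The tag list, threshold values, and the `select_tags` helper are hypothetical and are not taken from the project; the shipped profiles define their own values.

```python
import numpy as np

# Hypothetical tag metadata: (tag name, category) for each output index.
TAGS = [
    ("1girl", "general"),
    ("hatsune_miku", "character"),
    ("vocaloid", "copyright"),
    ("highres", "meta"),
]

# Hypothetical per-category thresholds (assumed values, for illustration only).
THRESHOLDS = {"general": 0.35, "character": 0.50, "copyright": 0.50, "meta": 0.30}
DEFAULT_THRESHOLD = 0.40


def select_tags(probs, tags=TAGS, thresholds=THRESHOLDS, default=DEFAULT_THRESHOLD):
    """Return {category: [(tag, prob), ...]} for tags at or above their category threshold."""
    probs = np.asarray(probs, dtype=float)
    selected = {}
    for (name, category), p in zip(tags, probs):
        if p >= thresholds.get(category, default):
            selected.setdefault(category, []).append((name, float(p)))
    return selected


result = select_tags([0.91, 0.42, 0.77, 0.10])
print(result)
```

Raising a category's threshold trades recall for precision in that category only, which is why a single global cutoff is usually too coarse for a tag space with such different category sizes.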
+## 📊 Performance Analysis
 ### Complete v1 vs v2 Performance Comparison
 | CATEGORY | v1 Micro F1 | v2 Micro F1 | Micro Δ | v1 Macro F1 | v2 Macro F1 | Macro Δ |
 |----------|-------------|-------------|---------|-------------|-------------|---------|
+| **Overall** | 61.3% | **67.3%** | **+6.0pp** | 33.8% | **50.6%** | **+16.8pp** |
+| **Artist** | 48.0% | **70.0%** | **+22.0pp** | 29.9% | **66.1%** | **+36.2pp** |
+| **Character** | 75.7% | **83.4%** | **+7.7pp** | 52.4% | **66.2%** | **+13.8pp** |
+| **Copyright** | 79.2% | **86.6%** | **+7.4pp** | 41.9% | **56.2%** | **+14.3pp** |
+| **General** | 60.8% | **66.4%** | **+5.6pp** | 21.5% | **34.6%** | **+13.1pp** |
+| **Meta** | 60.2% | **61.2%** | **+1.0pp** | 14.5% | **23.7%** | **+9.2pp** |
+| **Rating** | 80.8% | **83.1%** | **+2.3pp** | 79.5% | **77.5%** | **-2.0pp** |
+| **Year** | 33.2% | **30.8%** | **-2.4pp** | 29.3% | **32.6%** | **+3.3pp** |
+*Micro F1 comparison using micro-optimized thresholds, Macro F1 comparison using macro-optimized thresholds for fair evaluation.*
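To make the micro/macro distinction in the table concrete, the sketch below computes both scores for binary multi-label predictions. Micro F1 pools true positives, false positives, and false negatives over all tags, so frequent tags dominate; macro F1 averages per-tag F1 with equal weight, so rare tags count as much as common ones. The function name and toy data are illustrative assumptions, and scoring a tag with no positives and no predictions as 0 is one convention among several.

```python
import numpy as np


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for binary arrays of shape (samples, tags)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = (y_true & y_pred).sum(axis=0).astype(float)
    fp = (~y_true & y_pred).sum(axis=0).astype(float)
    fn = (y_true & ~y_pred).sum(axis=0).astype(float)

    # Micro: pool counts over all tags, then compute a single F1.
    micro_denom = 2 * tp.sum() + fp.sum() + fn.sum()
    micro = 2 * tp.sum() / micro_denom if micro_denom else 0.0

    # Macro: F1 per tag, then an unweighted mean over tags.
    denom = 2 * tp + fp + fn
    per_tag = np.divide(2 * tp, denom, out=np.zeros_like(tp), where=denom > 0)
    return float(micro), float(per_tag.mean())


# Toy data: tag 0 is common and always predicted correctly; tag 1 is rare and missed.
y_true = [[1, 0], [1, 0], [1, 0], [1, 1]]
y_pred = [[1, 0], [1, 0], [1, 0], [1, 0]]
micro, macro = f1_scores(y_true, y_pred)
print(f"micro={micro:.3f} macro={macro:.3f}")
```

A single missed rare tag barely moves the micro score but halves the macro score here, which is why thresholds tuned for one metric are not a fair basis for comparing the other.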
 ### Key Performance Insights
 The v2 model shows remarkable improvements across nearly all categories:
+- **Artist Recognition**: Massive +22.0pp micro F1 and +36.2pp macro F1 improvements, indicating much better artist identification
+- **Character Detection**: Large +7.7pp micro F1 and +13.8pp macro F1 gains
+- **Copyright Recognition**: Excellent +7.4pp micro F1 and +14.3pp macro F1 improvements for series identification
+- **General Tags**: Improved +5.6pp micro F1 and +13.1pp macro F1 for visual attributes
+- **Overall Macro F1**: Exceptional +16.8pp improvement shows much better rare tag recognition
+Two metrics regress slightly: Year micro F1 (-2.4pp) and Rating macro F1 (-2.0pp).
 ### Detailed v2 Performance
 - Streamlit
 - PIL/Pillow
 - NumPy
 ## 🔧 Usage