A DistilBERT-based 7-class text classifier fine-tuned to predict the music genre associated with a YouTube comment.
Inputs are raw comment strings; outputs are one of seven genre labels.
Base model: distilbert-base-uncased
Results (evaluation set)
- Loss: 0.0675
- Accuracy: 1.0
- F1: 1.0
- Precision: 1.0
- Recall: 1.0
Training curves (from Trainer logs)
| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|---|---|---|
| 1.2677 | 1.0 | 84 | 1.0653 | 0.9107 | 0.9097 | 0.9147 | 0.9107 |
| 0.4341 | 2.0 | 168 | 0.3179 | 0.9821 | 0.9820 | 0.9829 | 0.9821 |
| 0.0963 | 3.0 | 252 | 0.0865 | 1.0 | 1.0 | 1.0 | 1.0 |
| 0.0568 | 4.0 | 336 | 0.0427 | 1.0 | 1.0 | 1.0 | 1.0 |
| 0.0414 | 5.0 | 420 | 0.0356 | 1.0 | 1.0 | 1.0 | 1.0 |
Note: Perfect scores may indicate an easy task, strong regularization, or possible data leakage. Validate on a held-out set and/or external data.
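The logs above are consistent with a standard Hugging Face Trainer run (5 epochs, evaluation once per epoch, weighted-average F1/precision/recall). Below is a minimal sketch of such a setup; the dataset files, column names, and hyperparameters are assumptions, not the exact training configuration.

```python
import numpy as np
from datasets import load_dataset
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Hypothetical CSV files with a "text" column and an integer "label" column (0-6);
# not the actual training data.
dataset = load_dataset("csv", data_files={"train": "train.csv", "validation": "val.csv"})

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=7
)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True)

tokenized = dataset.map(tokenize, batched=True)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="weighted"
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1,
        "precision": precision,
        "recall": recall,
    }

args = TrainingArguments(
    output_dir="text-classifier",
    num_train_epochs=5,               # matches the 5 logged epochs
    evaluation_strategy="epoch",      # evaluate once per epoch, as in the table above
    logging_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    tokenizer=tokenizer,              # enables dynamic padding via the default collator
    compute_metrics=compute_metrics,
)
trainer.train()
```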
Model description
- Architecture: DistilBERT encoder with a linear classification head
- Task: Multi-class text classification (7 genres)
- Input: a single YouTube comment (str)
- Output: the predicted genre label and its score
Labels
classical, rock, metal, electronic, R&B, pop, jazz
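The index-to-label mapping is stored in the model config, so it can be read at runtime instead of hard-coded; the ordering shown in the comment below is illustrative only.

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("scottymcgee/text-classifier")  # update if different
print(config.id2label)
# e.g. {0: 'classical', 1: 'electronic', 2: 'jazz', 3: 'metal', 4: 'pop', 5: 'R&B', 6: 'rock'}
# (the ordering above is an assumption; trust the config, not this comment)
```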
Intended uses & limitations
Intended uses
- Exploratory analysis of audience/genre engagement on music videos
- Routing comments to genre-specific moderation or analytics queues (see the sketch after this list)
- Downstream features (e.g., per-genre dashboards)
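As a rough illustration of the routing use case, the sketch below buckets comments by their predicted genre. The example comments are made up, and the repo id is the one assumed in "How to use" below.

```python
from collections import defaultdict
from transformers import pipeline

pipe = pipeline("text-classification", model="scottymcgee/text-classifier")  # update if different

# Hypothetical comments; replace with real ones from your collection pipeline.
comments = [
    "this chorus is so catchy, reminds me of late 90s production",
    "that double kick drum section is absolutely brutal",
]

queues = defaultdict(list)
for comment, prediction in zip(comments, pipe(comments)):
    queues[prediction["label"]].append(comment)  # one queue per predicted genre

print({genre: len(items) for genre, items in queues.items()})
```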
Limitations
- Trained on YouTube comments; may not generalize to other platforms/domains
- Genre labels reflect the training taxonomy; ambiguous or mixed-genre comments can be misclassified
- Not designed for toxicity, sentiment, or demographic inference
Ethical considerations
- Comments can contain personal data; ensure collection complies with platform ToS and privacy laws
- Avoid using predictions to profile individuals
How to use
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TextClassificationPipeline

repo_id = "scottymcgee/text-classifier"  # update if different

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)

# return_all_scores=False -> the pipeline returns only the top genre label and its score
pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer, return_all_scores=False)
pipe("this chorus is so catchy, reminds me of late 90s production")
# [{'label': '...', 'score': ...}]
```