A DistilBERT-based 7-class text classifier fine-tuned to predict the music genre associated with a YouTube comment.
Inputs are raw comment strings; outputs are one of seven genre labels.
Base model: distilbert-base-uncased
Results (evaluation set)
- Loss: 0.0675
- Accuracy: 1.0
- F1: 1.0
- Precision: 1.0
- Recall: 1.0
Training curves (from Trainer logs)
| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|---|---|---|
| 1.2677 | 1.0 | 84 | 1.0653 | 0.9107 | 0.9097 | 0.9147 | 0.9107 |
| 0.4341 | 2.0 | 168 | 0.3179 | 0.9821 | 0.9820 | 0.9829 | 0.9821 |
| 0.0963 | 3.0 | 252 | 0.0865 | 1.0 | 1.0 | 1.0 | 1.0 |
| 0.0568 | 4.0 | 336 | 0.0427 | 1.0 | 1.0 | 1.0 | 1.0 |
| 0.0414 | 5.0 | 420 | 0.0356 | 1.0 | 1.0 | 1.0 | 1.0 |
Note: Perfect scores may indicate an easy task, strong regularization, or possible data leakage. Validate on a held-out set and/or external data.
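The logs above are consistent with a standard Hugging Face Trainer run (5 epochs, evaluation once per epoch, weighted-average F1/precision/recall). Below is a minimal sketch of such a setup; the dataset files, column names, and hyperparameters are assumptions, not the exact training configuration.

```python
import numpy as np
from datasets import load_dataset
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Hypothetical CSV files with a "text" column and an integer "label" column (0-6);
# not the actual training data.
dataset = load_dataset("csv", data_files={"train": "train.csv", "validation": "val.csv"})

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=7
)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True)

tokenized = dataset.map(tokenize, batched=True)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="weighted"
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1,
        "precision": precision,
        "recall": recall,
    }

args = TrainingArguments(
    output_dir="text-classifier",
    num_train_epochs=5,               # matches the 5 logged epochs
    evaluation_strategy="epoch",      # evaluate once per epoch, as in the table above
    logging_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    tokenizer=tokenizer,              # enables dynamic padding via the default collator
    compute_metrics=compute_metrics,
)
trainer.train()
```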
Model description
- Architecture: DistilBERT encoder with a linear classification head
- Task: Multi-class text classification (7 genres)
- Input: a single YouTube comment (str)
- Output: the predicted genre label and its score
Labels
classical, rock, metal, electronic, R&B, pop, jazz
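The index-to-label mapping is stored in the model config, so it can be read at runtime instead of hard-coded; the ordering shown in the comment below is illustrative only.

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("scottymcgee/text-classifier")  # update if different
print(config.id2label)
# e.g. {0: 'classical', 1: 'electronic', 2: 'jazz', 3: 'metal', 4: 'pop', 5: 'R&B', 6: 'rock'}
# (the ordering above is an assumption; trust the config, not this comment)
```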
Intended uses & limitations
Intended uses
- Exploratory analysis of audience/genre engagement on music videos
- Routing comments to genre-specific moderation or analytics queues (see the sketch after this list)
- Downstream features (e.g., per-genre dashboards)
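As a rough illustration of the routing use case, the sketch below buckets comments by their predicted genre. The example comments are made up, and the repo id is the one assumed in "How to use" below.

```python
from collections import defaultdict
from transformers import pipeline

pipe = pipeline("text-classification", model="scottymcgee/text-classifier")  # update if different

# Hypothetical comments; replace with real ones from your collection pipeline.
comments = [
    "this chorus is so catchy, reminds me of late 90s production",
    "that double kick drum section is absolutely brutal",
]

queues = defaultdict(list)
for comment, prediction in zip(comments, pipe(comments)):
    queues[prediction["label"]].append(comment)  # one queue per predicted genre

print({genre: len(items) for genre, items in queues.items()})
```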
Limitations
- Trained on YouTube comments; may not generalize to other platforms/domains
- Genre labels reflect the training taxonomy; ambiguous or mixed-genre comments can be misclassified
- Not designed for toxicity, sentiment, or demographic inference
Ethical considerations
- Comments can contain personal data; ensure collection complies with platform ToS and privacy laws
- Avoid using predictions to profile individuals
How to use
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TextClassificationPipeline

repo_id = "scottymcgee/text-classifier"  # update if different

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)

# return_all_scores=False -> the pipeline returns only the top genre label and its score
pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer, return_all_scores=False)
pipe("this chorus is so catchy, reminds me of late 90s production")
# [{'label': '...', 'score': ...}]
```