text

A DistilBERT-based 7-class text classifier fine-tuned to predict the music genre associated with a YouTube comment.
Inputs are raw comment strings; outputs are one of seven genre labels.

Base model: distilbert-base-uncased

Results (evaluation set)

  • Loss: 0.0675
  • Accuracy: 1.0
  • F1: 1.0
  • Precision: 1.0
  • Recall: 1.0

Training curves (from Trainer logs)

Training Loss   Epoch   Step   Validation Loss   Accuracy   F1       Precision   Recall
1.2677          1.0     84     1.0653            0.9107     0.9097   0.9147      0.9107
0.4341          2.0     168    0.3179            0.9821     0.9820   0.9829      0.9821
0.0963          3.0     252    0.0865            1.0        1.0      1.0         1.0
0.0568          4.0     336    0.0427            1.0        1.0      1.0         1.0
0.0414          5.0     420    0.0356            1.0        1.0      1.0         1.0
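
In every row of the training log, Recall equals Accuracy, which is what support-weighted averaging produces. The card does not state the averaging mode, so treat that as an assumption. A minimal pure-Python sketch of how such weighted metrics could be computed (the helper name `weighted_metrics` and the toy labels are hypothetical, not taken from the training code):

```python
from collections import Counter

def weighted_metrics(y_true, y_pred):
    """Return accuracy plus support-weighted precision, recall, and F1."""
    support = Counter(y_true)          # number of true examples per label
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for label in sorted(support):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        p_l = tp / predicted if predicted else 0.0
        r_l = tp / support[label]
        f_l = 2 * p_l * r_l / (p_l + r_l) if (p_l + r_l) else 0.0
        weight = support[label] / n    # weight each label by its share of examples
        precision += weight * p_l
        recall += weight * r_l
        f1 += weight * f_l
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy data for illustration only.
example = weighted_metrics(
    ["pop", "pop", "jazz", "rock"],
    ["pop", "jazz", "jazz", "rock"],
)
```

With weighted averaging, recall always matches accuracy, while precision and F1 can differ from it, as they do in the first two epochs of the log.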

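Note: Perfect scores may indicate an easy task, a small or unrepresentative evaluation set, or data leakage between the training and evaluation splits. Validate on a separate held-out set and, where possible, on external data before relying on these numbers.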
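
One quick way to probe for leakage is to check whether evaluation comments also appear in the training data. The sketch below is illustrative only: the helper names `normalize` and `find_overlap`, the normalization rule, and the sample comments are all assumptions, since the card does not describe the dataset or its splits.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())

def find_overlap(train_texts, eval_texts):
    """Return evaluation comments whose normalized form also occurs in training."""
    seen = {normalize(t) for t in train_texts}
    return [t for t in eval_texts if normalize(t) in seen]

# Hypothetical comments, not taken from the real dataset.
train = ["This riff is insane!", "so smooth, love the sax"]
evaluation = ["this  riff is INSANE!", "that drop at 1:30 though"]
leaked = find_overlap(train, evaluation)
```

A non-empty result suggests the evaluation scores are inflated. An empty result does not rule out leakage through near-duplicates or comments from the same video.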

Model description

  • Architecture: DistilBERT encoder with a linear classification head
  • Task: Multi-class text classification (7 genres)
  • Input: A single YouTube comment (str)
  • Output: Predicted genre label + scores
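
To illustrate how a linear head's raw logits become a predicted label plus per-class scores, here is a small sketch using a softmax over seven values. The label order and the logits are assumptions made for illustration. The real index-to-label mapping is stored in the model's configuration (`id2label`).

```python
import math

# Assumption: illustrative order only; check the model config for the real mapping.
LABELS = ["Classical", "rock", "metal", "electronic", "R&B", "pop", "jazz"]

def softmax(logits):
    """Convert logits to probabilities, subtracting the max for numerical stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode(logits, labels=LABELS):
    """Return the top label and a label -> score mapping."""
    scores = softmax(logits)
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best], dict(zip(labels, scores))

# Made-up logits for a single comment.
label, scores = decode([0.1, 0.3, -1.2, 0.0, 0.2, 2.5, 0.4])
```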

Labels

Classical, rock, metal, electronic, R&B, pop, jazz

Intended uses & limitations

Intended uses

  • Exploratory analysis of audience/genre engagement on music videos
  • Routing comments to genre-specific moderation or analytics queues
  • Downstream features (e.g., per-genre dashboards)
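
As a rough sketch of the routing use case, the function below groups comments into per-genre queues and sends low-confidence predictions to a review queue. The names `route_comments` and `fake_classify`, the `needs_review` queue, and the 0.6 threshold are hypothetical. `classify` stands for any callable that returns a list with one `{"label": ..., "score": ...}` dict per comment, as a text-classification pipeline does by default.

```python
from collections import defaultdict

def route_comments(comments, classify, min_score=0.6):
    """Group comments by predicted genre; low-confidence ones go to review."""
    queues = defaultdict(list)
    for comment in comments:
        top = classify(comment)[0]  # top-scoring prediction for this comment
        queue = top["label"] if top["score"] >= min_score else "needs_review"
        queues[queue].append(comment)
    return dict(queues)

def fake_classify(text):
    """Stand-in for the real pipeline so the sketch runs offline."""
    if "sax" in text:
        return [{"label": "jazz", "score": 0.93}]
    return [{"label": "pop", "score": 0.41}]

queues = route_comments(["love the sax solo", "nice"], fake_classify)
```

A review queue of this kind is one way to handle the ambiguous or mixed-genre comments that the model is likely to misclassify.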

Limitations

  • Trained on YouTube comments; may not generalize to other platforms/domains
  • Genre labels reflect the training taxonomy; ambiguous or mixed-genre comments can be misclassified
  • Not designed for toxicity, sentiment, or demographic inference

Ethical considerations

  • Comments can contain personal data; ensure collection complies with platform ToS and privacy laws
  • Avoid using predictions to profile individuals

How to use

from transformers import AutoTokenizer, AutoModelForSequenceClassification, TextClassificationPipeline

repo_id = "scottymcgee/text-classifier"  # update if different
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)

# By default the pipeline returns only the top-scoring label for each input,
# so the deprecated return_all_scores=False argument is not needed.
pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer)
result = pipe("this chorus is so catchy, reminds me of late 90s production")
print(result)  # a list with one {"label": ..., "score": ...} dict

Model size: 67M params (Safetensors, tensor type F32)