---
license: cc-by-nc-4.0
pipeline_tag: text-classification
library_name: transformers
language:
  - en
tags:
  - media-bias
  - lexical-bias
  - arxiv:2411.11081
  - naacl-2025
  - coreset
datasets:
  - mediabiasgroup/anno-lexical-coreset
base_model: roberta-base
model-index:
  - name: RoBERTa-SA-FT (Anno-lexical coreset)
    results:
      - task:
          type: text-classification
          name: Lexical bias detection
        dataset:
          name: BABE (test)
          type: mediabiasgroup/BABE
        metrics:
          - name: precision
            type: precision
            value: 0.829
          - name: recall
            type: recall
            value: 0.859
          - name: f1
            type: f1
            value: 0.844
          - name: mcc
            type: matthews_correlation
            value: 0.638
      - task:
          type: text-classification
          name: Lexical bias detection
        dataset:
          name: BASIL (all sentences)
          type: BASIL
        metrics:
          - name: precision
            type: precision
            value: 0.136
          - name: recall
            type: recall
            value: 0.696
          - name: f1
            type: f1
            value: 0.228
          - name: mcc
            type: matthews_correlation
            value: 0.201
---

# RoBERTa-SA-FT (Anno-lexical coreset)

This model is a sentence-level classifier for lexical media bias, trained on the coreset (BABE-scale) subset of the Anno-lexical dataset from “The Promises and Pitfalls of LLM Annotations in Dataset Labeling: a Case Study on Media Bias Detection” (NAACL Findings 2025; arXiv:2411.11081). It is a `roberta-base` encoder with a 2-layer classification head. Labels: `0` = neutral (no lexical bias), `1` = lexical bias.

- **Paper:** [The Promises and Pitfalls of LLM Annotations in Dataset Labeling: a Case Study on Media Bias Detection](https://arxiv.org/abs/2411.11081)
- **Dataset:** [mediabiasgroup/anno-lexical-coreset](https://huggingface.co/datasets/mediabiasgroup/anno-lexical-coreset)

## Intended use & limitations

- **Intended use:** research on lexical / loaded-language bias; comparison with fine-tuning on human labels at equal data size.
- **Out of scope:** detecting non-lexical forms of media bias (e.g., informational/selection bias), political leaning, stance, or factuality.
- **Known caveats:** compared with fine-tuning on human labels, the coreset-trained SA-FT model tends to gain recall at the cost of precision.

## How to use

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

m = "mediabiasgroup/roberta-anno-lexical-coreset-ft"
tok = AutoTokenizer.from_pretrained(m)
model = AutoModelForSequenceClassification.from_pretrained(m)

inputs = tok("The senator's reckless scheme wrecked the economy.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # 0 = neutral, 1 = lexical bias
```
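
For quick scoring, the same checkpoint can also be loaded through the `pipeline` API; the `LABEL_0`/`LABEL_1` names shown depend on the checkpoint's `id2label` config:

```python
from transformers import pipeline

clf = pipeline("text-classification", model="mediabiasgroup/roberta-anno-lexical-coreset-ft")
print(clf("The senator's reckless scheme wrecked the economy."))
# output shape: [{'label': 'LABEL_1', 'score': ...}]
```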

## Training data & setup

- **Training data:** Anno-lexical coreset (BABE-scale), with binary labels aggregated from LLM annotations.
- **Base encoder:** `roberta-base`; **head:** 2-layer classifier.
- **Hardware:** single A100 GPU; single training run (see the sketch below).
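
For reference, a minimal fine-tuning sketch, not the paper's exact recipe: the hyperparameters and the `text`/`label` column names are assumptions. Note that `AutoModelForSequenceClassification` on `roberta-base` already instantiates the standard 2-layer RoBERTa classification head (a dense layer plus an output projection).

```python
# Illustrative fine-tuning sketch -- NOT the paper's exact recipe.
# Hyperparameters and the "text"/"label" column names are assumptions.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tok = AutoTokenizer.from_pretrained("roberta-base")
# num_labels=2 instantiates the standard 2-layer RoBERTa head (dense + out_proj).
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

ds = load_dataset("mediabiasgroup/anno-lexical-coreset")
ds = ds.map(lambda b: tok(b["text"], truncation=True), batched=True)

args = TrainingArguments(output_dir="out", learning_rate=2e-5,
                         per_device_train_batch_size=32, num_train_epochs=3)
# With a tokenizer supplied, Trainer pads batches via DataCollatorWithPadding.
Trainer(model=model, args=args, train_dataset=ds["train"], tokenizer=tok).train()
```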

## Evaluation

| Dataset | Precision | Recall | F1 | MCC |
|---|---:|---:|---:|---:|
| BABE (test) | 0.829 | 0.859 | 0.844 | 0.638 |
| BASIL (all sentences) | 0.136 | 0.696 | 0.228 | 0.201 |

Positive class = lexical bias; BASIL's informational-bias sentences are treated as neutral.
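
These numbers can be recomputed from binary predictions with scikit-learn; a minimal sketch with placeholder arrays:

```python
# Recompute the reported metrics from binary predictions (placeholder arrays).
from sklearn.metrics import matthews_corrcoef, precision_recall_fscore_support

y_true = [0, 1, 1, 0, 1]  # gold labels -- replace with BABE/BASIL test labels
y_pred = [0, 1, 0, 0, 1]  # model predictions on the same sentences

p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="binary", pos_label=1)
print(f"P {p:.3f} / R {r:.3f} / F1 {f1:.3f} / MCC {matthews_corrcoef(y_true, y_pred):.3f}")
```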

## Safety, bias & ethics

Media bias perception is subjective and culturally dependent. This model may over-flag biased wording and should not be used to penalize individuals or outlets. Use it with human-in-the-loop review and domain-specific calibration, e.g., threshold tuning as sketched below.
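
One simple form of such calibration is to raise the decision threshold on the positive-class probability above the 0.5 that argmax implies for a binary softmax; the 0.7 below is an assumed value:

```python
import torch

# Reuses `model` and `tok` from the "How to use" snippet; the 0.7 threshold
# is an illustrative value and should be tuned on in-domain validation data.
sents = ["Officials announced the policy on Tuesday.",
         "The senator's reckless scheme wrecked the economy."]
inputs = tok(sents, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1)[:, 1]  # P(lexical bias)
flagged = probs > 0.7  # stricter than the default argmax cutoff of 0.5
```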

## Citation

If you use this model, please cite:

```bibtex
@inproceedings{horych-etal-2025-promises,
  title = "The Promises and Pitfalls of {LLM} Annotations in Dataset Labeling: a Case Study on Media Bias Detection",
  author = "Horych, Tom{\'a}{\v{s}}  and
    Mandl, Christoph  and
    Ruas, Terry  and
    Greiner-Petter, Andre  and
    Gipp, Bela  and
    Aizawa, Akiko  and
    Spinde, Timo",
  editor = "Chiruzzo, Luis  and
    Ritter, Alan  and
    Wang, Lu",
  booktitle = "Findings of the Association for Computational Linguistics: NAACL 2025",
  month = apr,
  year = "2025",
  address = "Albuquerque, New Mexico",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2025.findings-naacl.75/",
  doi = "10.18653/v1/2025.findings-naacl.75",
  pages = "1370--1386",
  ISBN = "979-8-89176-195-7"
}
```