IceBERT Bias-Aware NER (Icelandic)

Trigger warning: This model detects biased, offensive, or harmful language. Examples in this card may contain such language, included solely for research purposes.

Model Description

This model is IceBERT fine-tuned for token classification, framed as Named Entity Recognition (NER), to identify biased and potentially harmful expressions in Icelandic text.
It was trained on automatically annotated sentences covering 14 social bias categories. Each category has a B- (beginning) and an I- (inside) tag:

  • B-ADDICTION, I-ADDICTION
  • B-DISABILITY, I-DISABILITY
  • B-ORIGIN, I-ORIGIN
  • B-GENERAL, I-GENERAL
  • B-LGBTQIA, I-LGBTQIA
  • B-LOOKS, I-LOOKS
  • B-PERSONAL, I-PERSONAL
  • B-PROFANITY, I-PROFANITY
  • B-RELIGION, I-RELIGION
  • B-SEXUAL, I-SEXUAL
  • B-SOCIAL_STATUS, I-SOCIAL_STATUS
  • B-STUPIDITY, I-STUPIDITY
  • B-VULGAR, I-VULGAR
  • B-WOMEN, I-WOMEN

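The model flags words or phrases belonging to these categories by assigning each token a BIO tag: B-WOMEN marks the first token of a flagged phrase, I-WOMEN marks a continuation of that phrase, and O marks a token outside any category.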
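As a minimal sketch of how BIO tags could be grouped into flagged phrases, assuming the tokens and their per-token tags have already been obtained from the model. The function name `bio_to_spans` and the placeholder tokens are illustrative and not part of the model's API.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (category, phrase) pairs.

    Assumption: `tokens` and `tags` are aligned word-level lists.
    """
    spans: List[Tuple[str, str]] = []
    current_label = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the span in progress, if any.
        nonlocal current_label, current_tokens
        if current_label is not None:
            spans.append((current_label, " ".join(current_tokens)))
        current_label, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag == "O":
            flush()
            continue
        prefix, _, label = tag.partition("-")
        # A B- tag always opens a new span. A stray I- tag whose label
        # differs from the open span is also treated as a new span.
        if prefix == "B" or label != current_label:
            flush()
            current_label = label
        current_tokens.append(token)

    flush()
    return spans


# Placeholder tokens stand in for real text.
tokens = ["t0", "t1", "t2", "t3", "t4"]
tags = ["O", "O", "B-WOMEN", "I-WOMEN", "O"]
print(bio_to_spans(tokens, tags))  # [('WOMEN', 't2 t3')]
```

Treating a stray I- tag as the start of a new span is one possible design choice; a stricter decoder could discard such tags instead.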

Intended Uses & Limitations

Intended Use

  • Research on bias detection in low-resource languages
  • Educational tools for raising awareness of bias in language
  • Civic engagement platforms encouraging inclusive language

Limitations

  • Training labels come from vocabulary-based weak supervision, so some forms of bias may be missed
  • Predictions are made per token, with no sentence-level or discourse-level interpretation
  • Mislabeling is possible when flagged terms appear in critical, reclaimed, or journalistic contexts

Not intended for punitive monitoring or censorship. Outputs are prompts for reflection, not judgments.

Performance

Evaluation datasets:

  • Test set: 15,383 automatically annotated sentences (silver data)
  • Gold set: 190 manually reviewed sentences

Macro F1 scores (CI = confidence interval):

  • Test set: 0.970 (CI: 0.970-0.971)
  • Gold set: 0.868 (CI: 0.867-0.869)
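For readers unfamiliar with the metric, the sketch below shows how a macro F1 score is typically computed: the unweighted mean of per-class F1 scores. This card does not state the exact evaluation procedure (for example token-level versus span-level scoring, or how the confidence intervals were obtained), so the counts, class names, and the zero-support convention here are illustrative assumptions only and are not results from this model.

```python
from typing import Dict, Tuple


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from true positives, false positives, and false negatives."""
    denom = 2 * tp + fp + fn
    # Assumption: a class with no predictions and no gold items scores 0.0.
    return 2 * tp / denom if denom else 0.0


def macro_f1(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Unweighted mean of per-class F1; every class counts equally."""
    scores = [f1_score(tp, fp, fn) for tp, fp, fn in counts.values()]
    return sum(scores) / len(scores)


# Made-up counts for two hypothetical classes: (tp, fp, fn).
example_counts = {
    "CLASS_A": (8, 2, 2),  # F1 = 16 / 20 = 0.8
    "CLASS_B": (1, 0, 3),  # F1 = 2 / 5 = 0.4
}
print(round(macro_f1(example_counts), 3))  # 0.6
```

Because every class carries equal weight, a rare category with a low score lowers macro F1 as much as a frequent one would.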

Relevant Information

Ethical Considerations

This model is released under the BigScience OpenRAIL-M License, which allows free use with responsible-use restrictions.
Prohibited uses include:

  • Harassment or discrimination
  • Generating disinformation or hateful content
  • Surveillance targeting individuals or groups

Citation

Will be updated.


Model size: 124M parameters (Safetensors, tensor type F32)