IceBERT Bias-Aware NER (Icelandic)

Trigger warning: This model detects biased, offensive, or harmful language. Examples in this card may contain such language, included solely for research purposes.

Model Description

This model is a fine-tuned version of IceBERT that uses a Named Entity Recognition (NER) setup to identify biased and potentially harmful expressions in Icelandic text.
It was trained on automatically annotated sentences covering multiple social bias categories. It covers the following classes:

B-ADDICTION, I-ADDICTION
B-DISABILITY, I-DISABILITY
B-ORIGIN, I-ORIGIN
B-GENERAL, I-GENERAL
B-LGBTQIA, I-LGBTQIA
B-LOOKS, I-LOOKS
B-PERSONAL, I-PERSONAL
B-PROFANITY, I-PROFANITY
B-RELIGION, I-RELIGION
B-SEXUAL, I-SEXUAL
B-SOCIAL_STATUS, I-SOCIAL_STATUS
B-STUPIDITY, I-STUPIDITY
B-VULGAR, I-VULGAR
B-WOMEN, I-WOMEN

The model flags words or phrases belonging to these categories, producing BIO tags (e.g., B-WOMEN, I-WOMEN, O).
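
As an illustration of how BIO output can be consumed, the sketch below groups word-level tags into (category, phrase) spans. It is a minimal example, not the model's own post-processing: the helper name `decode_bio`, the order of `LABELS`, and the placeholder tokens are assumptions made for this example, and the real label-to-id mapping is the one stored in the model configuration.

```python
from typing import List, Optional, Tuple

# Categories as listed in this card.
CATEGORIES = [
    "ADDICTION", "DISABILITY", "ORIGIN", "GENERAL", "LGBTQIA", "LOOKS",
    "PERSONAL", "PROFANITY", "RELIGION", "SEXUAL", "SOCIAL_STATUS",
    "STUPIDITY", "VULGAR", "WOMEN",
]

# Full tag set: "O" plus a B- and an I- tag per category.
# Assumption: this ordering is illustrative only; the actual id mapping
# is defined by the model configuration, not by this list.
LABELS = ["O"] + [f"{prefix}-{cat}" for cat in CATEGORIES for prefix in ("B", "I")]


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group word-level BIO tags into (category, phrase) spans (hypothetical helper)."""
    spans: List[Tuple[str, str]] = []
    current_cat: Optional[str] = None
    current_words: List[str] = []

    def flush() -> None:
        # Close the span that is currently open, if any.
        if current_words:
            spans.append((current_cat, " ".join(current_words)))

    for token, tag in zip(tokens, tags):
        prefix, _, cat = tag.partition("-")
        if prefix == "B" or (prefix == "I" and cat != current_cat):
            # A B- tag starts a new span; a stray I- tag is treated the same way.
            flush()
            current_cat, current_words = cat, [token]
        elif prefix == "I":
            current_words.append(token)
        else:
            # "O" ends any open span.
            flush()
            current_cat, current_words = None, []
    flush()
    return spans


# Placeholder tokens stand in for real Icelandic words.
example_tokens = ["w1", "w2", "w3", "w4", "w5"]
example_tags = ["O", "B-WOMEN", "I-WOMEN", "O", "B-LOOKS"]
example_spans = decode_bio(example_tokens, example_tags)
print(example_spans)
```

Treating a stray I- tag as the start of a new span is one of several reasonable conventions; a stricter decoder could discard such tags instead.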

Intended Uses & Limitations

Intended Use

Research on bias detection in low-resource languages
Educational tools for raising awareness of bias in language
Civic engagement platforms encouraging inclusive language

Limitations

Training labels come from vocabulary-based weak supervision, so some forms of bias may be missed
The model tags words and phrases only; it does not interpret text at the sentence or discourse level
Mislabeling is possible in critical, reclaimed, or journalistic contexts

⚠ Not intended for punitive monitoring or censorship. Outputs are prompts for reflection, not judgments.

Performance

Evaluation datasets:

Test set: 15,383 automatically annotated sentences (silver data)
Gold set: 190 manually reviewed sentences

Macro F1 scores:

Test set: 0.970 (CI: 0.970-0.971)
Gold set: 0.868 (CI: 0.867-0.869)
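
For readers unfamiliar with the metric, macro F1 is the unweighted mean of the per-class F1 scores. The sketch below computes it over token-level labels. This card does not state whether the reported scores are token-level or entity-level, whether the O tag is included, or how the confidence intervals were obtained, so the token-level setup, the exclusion of O, and the function name `macro_f1` are all assumptions of this example rather than a description of the actual evaluation.

```python
from typing import List


def macro_f1(gold: List[str], pred: List[str], ignore: str = "O") -> float:
    """Unweighted mean of per-class F1 over token labels, skipping `ignore`.

    Assumption: token-level scoring with the O tag excluded; the evaluation
    behind the reported numbers may differ.
    """
    classes = sorted((set(gold) | set(pred)) - {ignore})
    scores = []
    for c in classes:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Toy labels: one I-WOMEN token is missed by the prediction.
gold_tags = ["O", "B-WOMEN", "I-WOMEN", "O", "B-LOOKS"]
pred_tags = ["O", "B-WOMEN", "O", "O", "B-LOOKS"]
score = macro_f1(gold_tags, pred_tags)
print(round(score, 3))
```

Because every class counts equally, a rare category that the model handles poorly lowers macro F1 as much as a frequent one would.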

Relevant Information

Base model: IceBERT
Data source: IceBiasNER

Ethical Considerations

This model is released under the BigScience OpenRAIL-M License, which allows free use with responsible-use restrictions.
Prohibited uses include:

Harassment or discrimination
Generating disinformation or hateful content
Surveillance targeting individuals or groups

Citation

Will be updated.


Model size: 124M params
Tensor type: F32
Weights format: Safetensors
