---
library_name: transformers
datasets:
- s-nlp/synthdetoxm
- textdetox/multilingual_paradetox
base_model:
- bigscience/mt0-large
pipeline_tag: text2text-generation
license: mit
language:
- am
- ar
- de
- en
- es
- fr
- he
- hi
- it
- ja
- ru
- tt
- uk
- zh
---

# Model Card for SageDetox_detox_classification_contrastive

This model (Voronin et al., 2025, TBA) is one of four model variants developed during the CLEF-2025 Multilingual Text Detoxification contest. The idea was to apply a Sage-T5-like approach to text detoxification. The main model utilizes three loss functions:

- seq2seq loss for paraphrase generation,
- classification loss for token-level toxicity detection,
- contrastive loss for improved semantic representation learning.

To evaluate the correctness of the approach, the mT0-large backbone was taken and four models were trained: one with only the seq2seq loss, one with the seq2seq and classification losses, one with the seq2seq and contrastive losses, and one with all three losses. This model is the final variant and employs all three losses.
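
The sketch below illustrates, in PyTorch, how the three losses could be combined on top of the mT0-large backbone. The loss weights, the token-level toxicity head, and the in-batch InfoNCE contrastive formulation are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/mt0-large")
model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/mt0-large")

# Hypothetical token-level toxicity head on top of the encoder states.
tox_head = torch.nn.Linear(model.config.d_model, 2)


def training_step(toxic_texts, detox_texts, token_toxicity_labels):
    """One combined-loss step; equal loss weights are an assumption."""
    src = tokenizer(toxic_texts, return_tensors="pt", padding=True)
    tgt = tokenizer(detox_texts, return_tensors="pt", padding=True)

    # Mask padding in the labels so the seq2seq loss ignores it.
    labels = tgt.input_ids.masked_fill(
        tgt.input_ids == tokenizer.pad_token_id, -100
    )

    # 1) seq2seq loss: cross-entropy on the detoxified paraphrase.
    out = model(**src, labels=labels)
    seq2seq_loss = out.loss

    # 2) classification loss: token-level toxic / non-toxic tags
    #    predicted from the encoder's last hidden states.
    tok_logits = tox_head(out.encoder_last_hidden_state)  # (B, L, 2)
    cls_loss = F.cross_entropy(tok_logits.transpose(1, 2), token_toxicity_labels)

    # 3) contrastive loss: pull pooled source and target sentence
    #    representations together, with in-batch negatives (InfoNCE).
    src_vec = F.normalize(out.encoder_last_hidden_state.mean(dim=1), dim=-1)
    tgt_vec = F.normalize(
        model.get_encoder()(**tgt).last_hidden_state.mean(dim=1), dim=-1
    )
    sim = src_vec @ tgt_vec.T / 0.05  # temperature is assumed
    contrastive_loss = F.cross_entropy(sim, torch.arange(sim.size(0)))

    return seq2seq_loss + cls_loss + contrastive_loss
```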

## Model Details

### Model Description

- **Developed by:** Alexandr Voronin, Nikita Sushko, Daniil Moskovsky
- **Model type:** sequence-to-sequence (encoder-decoder) transformer based on mT0-large
- **Language(s) (NLP):** am, ar, de, en, es, fr, he, hi, hin, it, ja, ru, tt, uk, zh
- **License:** MIT
- **Finetuned from model:** bigscience/mt0-large

## Uses

This model is intended for text detoxification in 15 languages: Amharic, Arabic, German, English, Spanish, French, Hebrew, Hindi, Hinglish, Italian, Japanese, Russian, Tatar, Ukrainian, Chinese.

### Direct Use

The model may be used directly for text detoxification, without further fine-tuning.

## How to Get Started with the Model

```python
from transformers import pipeline

# Load the detoxification model as a text2text-generation pipeline.
pipe = pipeline(
    'text2text-generation',
    model='alexandro767/SageDetox_detox_classification_contrastive',
)

# The prompt names the target language; the input here is a toxic
# Russian sentence ("I f***ing hate C-GAN").
pipe('Rewrite in non-toxic way in Russian: Ненавижу блять C-GAN')
```
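
For more control over decoding, the underlying seq2seq model can also be loaded directly. The generation settings below (beam search, token budget) are illustrative assumptions, not recommended values from the authors.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = 'alexandro767/SageDetox_detox_classification_contrastive'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Same prompt format as the pipeline example above.
prompt = 'Rewrite in non-toxic way in Russian: Ненавижу блять C-GAN'
inputs = tokenizer(prompt, return_tensors='pt')
output_ids = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```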