---
library_name: transformers
datasets:
- s-nlp/synthdetoxm
- textdetox/multilingual_paradetox
base_model:
- bigscience/mt0-large
pipeline_tag: text2text-generation
license: mit
language:
- am
- ar
- de
- en
- es
- fr
- he
- hi
- it
- ja
- ru
- tt
- uk
- zh
---
# Model Card for SageDetox_detox_classification_contrastive
This model (Voronin et al., 2025, TBA) is one of four model variants developed for the CLEF-2025 Multilingual Text Detoxification shared task. The idea was to apply a Sage-T5-like approach to the text detoxification task. The main model combines three loss functions:
- a seq2seq loss for paraphrase generation,
- a classification loss for token-level toxicity detection,
- a contrastive loss for improved semantic representation learning.

To validate the approach, an mT0-large backbone was fine-tuned into four variants: seq2seq loss only, seq2seq & classification losses, seq2seq & contrastive losses, and all three losses combined (a sketch of the combined objective follows below). This model is the final variant, trained with all three losses.
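As an illustration of the training objective, here is a minimal sketch of how the three losses could be combined. The loss weights, the triplet-style contrastive formulation, and all tensor names are assumptions for illustration, not the authors' exact implementation.

```python
import torch.nn.functional as F

def combined_loss(lm_logits, target_ids,      # seq2seq head outputs / labels
                  tox_logits, tox_labels,     # token-level toxicity head
                  anchor, positive, negative, # sentence embeddings (assumed)
                  w_cls=1.0, w_con=1.0):      # weights are illustrative
    # 1) seq2seq cross-entropy loss for paraphrase generation
    seq_loss = F.cross_entropy(lm_logits.view(-1, lm_logits.size(-1)),
                               target_ids.view(-1), ignore_index=-100)
    # 2) classification loss for token-level toxicity detection
    cls_loss = F.cross_entropy(tox_logits.view(-1, tox_logits.size(-1)),
                               tox_labels.view(-1), ignore_index=-100)
    # 3) contrastive (triplet) loss for semantic representation learning
    con_loss = F.triplet_margin_loss(anchor, positive, negative, margin=1.0)
    return seq_loss + w_cls * cls_loss + w_con * con_loss
```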
## Model Details
### Model Description
- **Developed by:** Alexandr Voronin, Nikita Sushko, Daniil Moskovsky
- **Model type:** sequence-to-sequence (encoder-decoder) Transformer
- **Language(s) (NLP):** am, ar, de, en, es, fr, he, hi, hin, it, ja, ru, tt, uk, zh
- **License:** MIT
- **Finetuned from model:** bigscience/mt0-large
## Uses
This model is intended for text detoxification in 15 languages: Amharic, Arabic, German, English, Spanish, French, Hebrew, Hindi, Hinglish, Italian, Japanese, Russian, Tatar, Ukrainian, and Chinese.
### Direct Use
The model can be used directly for text detoxification, without further fine-tuning.
## How to Get Started with the Model
```python
import transformers

# Load the detoxification pipeline from the Hub
pipe = transformers.pipeline('text2text-generation',
                             'alexandro767/SageDetox_detox_classification_contrastive')

# Prompt format: "Rewrite in non-toxic way in <language>: <toxic text>"
result = pipe('Rewrite in non-toxic way in Russian: Ненавижу блять C-GAN')
print(result[0]['generated_text'])
```
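Alternatively, the model can be loaded with the Auto classes. A minimal sketch, where generation settings such as `max_new_tokens` are illustrative defaults rather than the authors' recommended values:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = 'alexandro767/SageDetox_detox_classification_contrastive'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Same prompt format as in the pipeline example above
inputs = tokenizer('Rewrite in non-toxic way in Russian: Ненавижу блять C-GAN',
                   return_tensors='pt')
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```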