---
library_name: transformers
datasets:
  - s-nlp/synthdetoxm
  - textdetox/multilingual_paradetox
base_model:
  - bigscience/mt0-large
pipeline_tag: text2text-generation
license: mit
language:
  - am
  - ar
  - de
  - en
  - es
  - fr
  - he
  - hi
  - it
  - ja
  - ru
  - tt
  - uk
  - zh
---

# Model Card for SageDetox_detox_classification_contrastive

This model (Voronin et al., 2025, TBA) is one of four model variants developed for the CLEF-2025 Multilingual Text Detoxification shared task. The idea is to apply a Sage-T5-like multi-task approach to text detoxification. The full model combines three loss functions:

- seq2seq loss for paraphrase generation,
- classification loss for token-level toxicity detection,
- contrastive loss for improved semantic representation learning.

To evaluate the contribution of each component, the mT0-large backbone was fine-tuned into four variants: seq2seq loss only, seq2seq & classification losses, seq2seq & contrastive losses, and all three losses. This model is the last of these variants and employs all three losses; a sketch of the combined objective is given below.
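For intuition, here is a minimal sketch of how the three terms can be combined into a single training objective. The weights `alpha` and `beta`, the temperature, and the InfoNCE-style formulation of the contrastive term are illustrative assumptions, not the authors' exact implementation:

```python
import torch
import torch.nn.functional as F

def combined_loss(seq2seq_loss, token_logits, token_labels,
                  src_emb, tgt_emb, alpha=1.0, beta=1.0, temperature=0.07):
    """Sketch: seq2seq + token-level classification + contrastive (assumed weights)."""
    # Classification term: one binary toxicity label per source token.
    cls_loss = F.binary_cross_entropy_with_logits(token_logits, token_labels)
    # Contrastive term (assumed InfoNCE-style): align each toxic source's
    # sentence embedding with its own detoxified target, against the other
    # targets in the batch.
    sim = F.normalize(src_emb, dim=-1) @ F.normalize(tgt_emb, dim=-1).T
    targets = torch.arange(sim.size(0), device=sim.device)
    contrastive_loss = F.cross_entropy(sim / temperature, targets)
    return seq2seq_loss + alpha * cls_loss + beta * contrastive_loss
```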

## Model Details

### Model Description

This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been partially auto-generated.

- **Developed by:** Alexandr Voronin, Nikita Sushko, Daniil Moskovsky
- **Model type:** mT0-large (text2text-generation)
- **Language(s) (NLP):** am, ar, de, en, es, fr, he, hi, hin (Hinglish), it, ja, ru, tt, uk, zh
- **License:** MIT
- **Finetuned from model:** bigscience/mt0-large

## Uses

This model is intended for text detoxification in 15 languages: Amharic, Arabic, German, English, Spanish, French, Hebrew, Hindi, Hinglish, Italian, Japanese, Russian, Tatar, Ukrainian, and Chinese.

### Direct Use

The model can be used directly for text detoxification, without further fine-tuning.

## How to Get Started with the Model

```python
import transformers

# Load the detoxification pipeline from the Hub.
pipe = transformers.pipeline(
    "text2text-generation",
    "alexandro767/SageDetox_detox_classification_contrastive",
)

# The prompt names the target language of the detoxified rewrite.
print(pipe("Rewrite in non-toxic way in Russian: Ненавижу блять C-GAN"))
```
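If you prefer explicit tokenizer and model calls over the pipeline, the equivalent lower-level usage looks like this (the prompt template is taken from the example above; `max_new_tokens` is an illustrative setting):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "alexandro767/SageDetox_detox_classification_contrastive"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Same prompt template as above; swap in any of the 15 supported languages.
inputs = tokenizer("Rewrite in non-toxic way in English: <your toxic text>",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```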