---
library_name: transformers
datasets:
- s-nlp/synthdetoxm
- textdetox/multilingual_paradetox
base_model:
- bigscience/mt0-large
pipeline_tag: text2text-generation
license: mit
language:
- am
- ar
- de
- en
- es
- fr
- he
- hi
- it
- ja
- ru
- tt
- uk
- zh
---

# Model Card for SageDetox_detox_classification_contrastive

This model (Voronin et al., 2025, TBA) is one of four model variants developed during the CLEF-2025 Multilingual Text Detoxification contest. The idea was to apply a Sage-T5-like approach to text detoxification. The main model utilizes three loss functions:

- seq2seq loss for paraphrase generation,
- classification loss for token-level toxicity detection,
- contrastive loss for improved semantic representation learning.

To evaluate the correctness of the approach, the mT0-large backbone was taken and four models were trained: one with only the seq2seq loss, one with the seq2seq and classification losses, one with the seq2seq and contrastive losses, and one with all three losses. This model is the final variant and employs all three losses.
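
The sketch below illustrates, in PyTorch, how the three losses could be combined on top of the mT0-large backbone. The loss weights, the token-level toxicity head, and the in-batch InfoNCE contrastive formulation are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/mt0-large")
model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/mt0-large")

# Hypothetical token-level toxicity head on top of the encoder states.
tox_head = torch.nn.Linear(model.config.d_model, 2)


def training_step(toxic_texts, detox_texts, token_toxicity_labels):
    """One combined-loss step; equal loss weights are an assumption."""
    src = tokenizer(toxic_texts, return_tensors="pt", padding=True)
    tgt = tokenizer(detox_texts, return_tensors="pt", padding=True)

    # Mask padding in the labels so the seq2seq loss ignores it.
    labels = tgt.input_ids.masked_fill(
        tgt.input_ids == tokenizer.pad_token_id, -100
    )

    # 1) seq2seq loss: cross-entropy on the detoxified paraphrase.
    out = model(**src, labels=labels)
    seq2seq_loss = out.loss

    # 2) classification loss: token-level toxic / non-toxic tags
    #    predicted from the encoder's last hidden states.
    tok_logits = tox_head(out.encoder_last_hidden_state)  # (B, L, 2)
    cls_loss = F.cross_entropy(tok_logits.transpose(1, 2), token_toxicity_labels)

    # 3) contrastive loss: pull pooled source and target sentence
    #    representations together, with in-batch negatives (InfoNCE).
    src_vec = F.normalize(out.encoder_last_hidden_state.mean(dim=1), dim=-1)
    tgt_vec = F.normalize(
        model.get_encoder()(**tgt).last_hidden_state.mean(dim=1), dim=-1
    )
    sim = src_vec @ tgt_vec.T / 0.05  # temperature is assumed
    contrastive_loss = F.cross_entropy(sim, torch.arange(sim.size(0)))

    return seq2seq_loss + cls_loss + contrastive_loss
```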

## Model Details

### Model Description

- **Developed by:** Alexandr Voronin, Nikita Sushko, Daniil Moskovsky
- **Model type:** sequence-to-sequence (encoder-decoder) transformer based on mT0-large
- **Language(s) (NLP):** am, ar, de, en, es, fr, he, hi, hin, it, ja, ru, tt, uk, zh
- **License:** MIT
- **Finetuned from model:** bigscience/mt0-large

## Uses

This model is intended for text detoxification in 15 languages: Amharic, Arabic, German, English, Spanish, French, Hebrew, Hindi, Hinglish, Italian, Japanese, Russian, Tatar, Ukrainian, Chinese.

### Direct Use

The model may be used directly for text detoxification, without further fine-tuning.

## How to Get Started with the Model

```python
from transformers import pipeline

# Load the detoxification model as a text2text-generation pipeline.
pipe = pipeline(
    'text2text-generation',
    model='alexandro767/SageDetox_detox_classification_contrastive',
)

# The prompt names the target language; the input here is a toxic
# Russian sentence ("I f***ing hate C-GAN").
pipe('Rewrite in non-toxic way in Russian: Ненавижу блять C-GAN')
```
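
For more control over decoding, the underlying seq2seq model can also be loaded directly. The generation settings below (beam search, token budget) are illustrative assumptions, not recommended values from the authors.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = 'alexandro767/SageDetox_detox_classification_contrastive'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Same prompt format as the pipeline example above.
prompt = 'Rewrite in non-toxic way in Russian: Ненавижу блять C-GAN'
inputs = tokenizer(prompt, return_tensors='pt')
output_ids = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```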