s-nlp/roberta-base-formality-ranker

The model is trained to predict whether an English sentence is formal or informal.

Base model: roberta-base

Datasets: GYAFC (Rao and Tetreault, 2018) and the online formality corpus of Pavlick and Tetreault (2016), abbreviated P&T.

Data augmentation: changing texts to upper or lower case, removing all punctuation, and adding a period at the end of a sentence. It was applied because the model otherwise over-relies on punctuation and capitalization and pays too little attention to other features.
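
The augmentations listed above could be sketched roughly as follows. This is an illustration only, not the authors' training code: the function names are hypothetical, the set of punctuation characters (Python's `string.punctuation`) is an assumption, and treating each transformation as an independent, uniformly sampled option is also an assumption, since the card does not say how they were combined or how often they were applied.

```python
import random
import string

# Translation table that deletes every ASCII punctuation character (assumed set).
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def to_upper(text: str) -> str:
    """Convert the whole sentence to upper case."""
    return text.upper()


def to_lower(text: str) -> str:
    """Convert the whole sentence to lower case."""
    return text.lower()


def remove_punctuation(text: str) -> str:
    """Delete all ASCII punctuation characters from the sentence."""
    return text.translate(_PUNCT_TABLE)


def add_final_period(text: str) -> str:
    """Append a period unless the sentence already ends with one."""
    stripped = text.rstrip()
    return stripped if stripped.endswith(".") else stripped + "."


# Hypothetical pool of perturbations; sampling uniformly is an assumption.
AUGMENTATIONS = [to_upper, to_lower, remove_punctuation, add_final_period]


def augment(text: str, rng: random.Random) -> str:
    """Apply one randomly chosen surface perturbation to a sentence."""
    return rng.choice(AUGMENTATIONS)(text)


if __name__ == "__main__":
    rng = random.Random(0)
    for sentence in ["Hey, what's up?", "I would appreciate a prompt reply"]:
        print(repr(sentence), "->", repr(augment(sentence, rng)))
```

Because the formality label is left unchanged under each perturbation, a classifier trained on such pairs is pushed to rely on wording rather than on casing or punctuation alone.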

Loss: binary classification (on GYAFC) and in-batch ranking (on the P&T data).
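
The card does not give the exact form of either loss. As a rough illustration, the sketch below pairs a standard binary cross-entropy on a single logit with one common reading of "in-batch ranking": a pairwise logistic loss over every pair of examples in a batch whose gold scores differ. The function names, the pairwise formulation, and the convention that a higher score means "more formal" are all assumptions, not details confirmed by the source.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bce_loss(logit: float, label: int) -> float:
    """Binary cross-entropy for one example with label 0 or 1.

    Equivalent to -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))].
    """
    return softplus(logit) - label * logit


def in_batch_ranking_loss(pred: Sequence[float], gold: Sequence[float]) -> float:
    """Pairwise logistic ranking loss over all in-batch pairs (assumed form).

    For each pair (i, j) with gold[i] > gold[j], the model is penalized
    by softplus(-(pred[i] - pred[j])), which is small when pred[i] is
    well above pred[j]. Pairs with equal gold scores are skipped.
    """
    total, n_pairs = 0.0, 0
    for i in range(len(gold)):
        for j in range(len(gold)):
            if gold[i] > gold[j]:
                total += softplus(-(pred[i] - pred[j]))
                n_pairs += 1
    return total / n_pairs if n_pairs else 0.0


if __name__ == "__main__":
    print("BCE at logit 0:", bce_loss(0.0, 1))
    print("ranking, correct order:", in_batch_ranking_loss([2.0, 0.0], [1.0, -1.0]))
    print("ranking, wrong order:  ", in_batch_ranking_loss([0.0, 2.0], [1.0, -1.0]))
```

A ranking objective of this kind only constrains the relative order of predictions within a batch, which suits a corpus annotated with graded formality scores rather than binary labels.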

Performance metrics on the test data:

dataset	ROC AUC	precision	recall	fscore	accuracy	Spearman
GYAFC	0.9779	0.90	0.91	0.90	0.9087	0.8233
GYAFC normalized (lowercase + remove punct.)	0.9234	0.85	0.81	0.82	0.8218	0.7294

P&T subset	Spearman R
news	0.4003
answers	0.7500
blog	0.7334
email	0.7606
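
For readers unfamiliar with the Spearman figures above, the following is a minimal, dependency-free sketch of how a Spearman rank correlation between model scores and gold ratings can be computed: rank both lists (ties share their average rank), then take the Pearson correlation of the ranks. The function names are illustrative, and nothing here reproduces the authors' evaluation script.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Made-up numbers purely for illustration.
    model_scores = [0.9, 0.2, 0.6, 0.4]
    gold_ratings = [2.5, -1.0, 0.5, 1.0]
    print(spearman(model_scores, gold_ratings))
```

Since only ranks matter, the metric is unaffected by any monotonic rescaling of the model's output scores.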

Citation

If you use the model in your research, please cite the paper that introduced it:

@InProceedings{10.1007/978-3-031-35320-8_4,
  author="Babakov, Nikolay
  and Dale, David
  and Gusev, Ilya
  and Krotova, Irina
  and Panchenko, Alexander",
  editor="M{\'e}tais, Elisabeth
  and Meziane, Farid
  and Sugumaran, Vijayan
  and Manning, Warren
  and Reiff-Marganiec, Stephan",
  title="Don't Lose the Message While Paraphrasing: A Study on Content Preserving Style Transfer",
  booktitle="Natural Language Processing and Information Systems",
  year="2023",
  publisher="Springer Nature Switzerland",
  address="Cham",
  pages="47--61",
  isbn="978-3-031-35320-8"
}

Licensing Information

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
