CTRL44 Simplification model

This is a pretrained version of the controllable simplification model presented in the NAACL 2022 paper "Controllable Sentence Simplification via Operation Classification". It was trained on the IRSD simplification dataset.

A control token is expected at the start of input sequences to dictate which simplification operation should be performed. This can either be done manually or with an operation classifier like this one.

Possible control tokens are: "<ident>", "<para>", "<ssplit>", and "<dsplit>".

How to use

Here is how to use this model in PyTorch:

from transformers import BartForConditionalGeneration, AutoTokenizer

model = BartForConditionalGeneration.from_pretrained("liamcripwell/ctrl44-simp")
tokenizer = AutoTokenizer.from_pretrained("liamcripwell/ctrl44-simp")

text = "<para> Barack Hussein Obama II is an American politician who served as the 44th president of the United States from 2009 to 2017."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, num_beams=10, max_length=128)
Downloads last month
22
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.