Model Card for eilamc14/pegasus-xsum-text-simplification
This is one of the models fine-tuned for text simplification as part of the Simplify This project.
Model Details
Model Description
Fine-tuned sequence-to-sequence (encoder–decoder) Transformer for English text simplification.
Trained on the eilamc14/wikilarge-clean dataset (cleaned WikiLarge-style pairs).
- Model type: Seq2Seq Transformer (encoder–decoder)
- Language (NLP): English
- License: apache-2.0
- Finetuned from model: google/pegasus-xsum
Model Sources
- Repository (code): https://github.com/eilamc14/Simplify-This
- Dataset: https://huggingface.co/datasets/eilamc14/wikilarge-clean
- Paper: —
- Demo: —
Uses
Direct Use
The model is intended for English text simplification.
- Input format: Simplify: <complex sentence>
- Output: <simplified sentence>
Typical uses
- Research on automatic text simplification
- Benchmarking against other simplification systems
- Demos/prototypes that require simpler English rewrites
Downstream Use
This repository already contains a fine-tuned model specialized for text simplification.
Further fine-tuning is optional and mainly relevant when:
- Adapting to a markedly different domain (e.g., medical/legal/news)
- Addressing specific failure modes (e.g., over/under-simplification, factual drops)
- Distilling/quantizing for deployment constraints
When fine-tuning further, keep the same input convention: Simplify: <...>.
Out-of-Scope Use
Not intended for:
- Tasks unrelated to simplification (e.g., dialogue, translation)
- Production use without additional safety filtering (no toxicity/bias mitigation)
- Languages other than English
- High-stakes settings (legal/medical advice, safety-critical decisions)
Bias, Risks, and Limitations
The model was trained on Wikipedia and Simple English Wikipedia alignments (via WikiLarge).
As a result, it inherits the characteristics and limitations of this data:
- Domain bias: Simplifications may reflect encyclopedic style; performance may degrade on informal, technical, or domain-specific text (e.g., medical/legal/news).
- Content bias: Wikipedia content itself contains biases in coverage, cultural perspective, and phrasing. Simplified outputs may reflect or amplify these.
- Simplification quality: The model may:
- Over-simplify (drop important details)
- Under-simplify (retain complex phrasing)
- Produce ungrammatical or awkward rephrasings
- Language limitation: Only suitable for English. Applying to other languages is unsupported.
- Safety limitation: The model has not been aligned to avoid toxic, biased, or harmful content. If the input text contains such content, the output may reproduce or modify it without safeguards.
Recommendations
- Evaluation required: Always evaluate the model in the target domain before deployment. Benchmark simplification quality (e.g., with SARI, FKGL, BERTScore, LENS, or human evaluation); see the example sketch after this list.
- Human oversight: Use human-in-the-loop review for applications where meaning preservation is critical (education, accessibility tools, etc.).
- Attribution: Preserve source attribution where required (Wikipedia → CC BY-SA).
- Not for high-stakes use: Avoid legal, medical, or safety-critical applications without extensive validation and domain adaptation.
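As a hedged illustration of such benchmarking, SARI can be computed with the Hugging Face evaluate library; the sentences below are placeholders for illustration, not data from this project.

import evaluate

# Placeholder sentences for illustration only
sources = ["The committee deemed the proposal unnecessarily complicated."]
predictions = ["The committee thought the proposal was too complicated."]
references = [["The committee found the plan too complicated."]]

sari = evaluate.load("sari")
print(sari.compute(sources=sources, predictions=predictions, references=references))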
How to Get Started with the Model
Load the model and tokenizer directly from the Hugging Face Hub:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "eilamc14/pegasus-xsum-text-simplification"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Example input
PREFIX = "Simplify: "
text = "The committee deemed the proposal unnecessarily complicated."

# Tokenize and generate
inputs = tokenizer(PREFIX + text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Training Details
Training Data
The WikiLarge-clean dataset (eilamc14/wikilarge-clean; see Model Sources above).
Training Procedure
- Hardware: NVIDIA L4 GPU on Google Colab
- Objective: Standard sequence-to-sequence cross-entropy loss
- Training type: Full fine-tuning of all parameters (no LoRA/PEFT used)
- Batching: Dynamic padding with the Hugging Face Trainer / PyTorch DataLoader (see the sketch after this list)
- Evaluation: Monitored on the validation split with SARI and identical_ratio
- Stopping criteria: Early stopping callback based on validation performance
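A minimal sketch of this setup is shown below. It assumes a tokenized dataset with train/validation splits and the training_args from the Training Hyperparameters section; the early-stopping patience value is an assumption, not a logged setting.

from transformers import DataCollatorForSeq2Seq, EarlyStoppingCallback, Seq2SeqTrainer

# Dynamic padding: each batch is padded to its longest sequence rather than a fixed max length
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,                      # Seq2SeqTrainingArguments (see below)
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    data_collator=data_collator,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],  # patience value assumed
)
trainer.train()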
Preprocessing
The dataset was preprocessed by prefixing each source sentence with "Simplify: " and tokenizing both the source (inputs) and target (labels).
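A minimal sketch of this preprocessing, assuming hypothetical src/tgt column names and a maximum length of 128 tokens:

PREFIX = "Simplify: "

def preprocess(batch, tokenizer, max_length=128):
    # Prefix each source sentence and tokenize it as the model input
    model_inputs = tokenizer(
        [PREFIX + s for s in batch["src"]],
        max_length=max_length,
        truncation=True,
    )
    # Tokenize the simplified sentences as labels
    labels = tokenizer(text_target=batch["tgt"], max_length=max_length, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

# tokenized_dataset = dataset.map(preprocess, batched=True, fn_kwargs={"tokenizer": tokenizer})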
Memory & Checkpointing
To reduce VRAM during training, gradient checkpointing was enabled and the KV cache was disabled:
model.config.use_cache = False # required when using gradient checkpointing
model.gradient_checkpointing_enable() # saves memory at the cost of extra compute
Notes
- Disabling use_cache avoids warnings/conflicts with gradient checkpointing and reduces memory usage in the forward pass.
- Gradient checkpointing trades GPU memory ↓ for training speed ↓ (extra recomputation).
- For inference/evaluation, re-enable the cache for faster generation: model.config.use_cache = True
Training Hyperparameters
The models were trained with Hugging Face Seq2SeqTrainingArguments. Hyperparameters varied slightly across models and runs during tuning, and full logs (batch size, steps, exact LR schedule) were not preserved. Below are the typical defaults used:
- Epochs: 5
- Evaluation strategy: every 300 steps
- Save strategy: every 300 steps (keep best model, eval_loss as criterion)
- Learning rate: ~3e-5
- Batch size: ~8–64, depending on model size
- Optimizer: adamw_torch_fused
- Precision: bf16
- Generation config (during eval): max_length=128, num_beams=4, predict_with_generate=True
- Other settings:
- Weight decay: 0.01
- Label smoothing: 0.1
- Warmup ratio: 0.1
- Max grad norm: 0.5
- Dataloader workers: 8 (L4 GPU)
Because hyperparameters were adjusted between runs and not all were logged, exact reproduction may differ slightly.
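For orientation only, a sketch of Seq2SeqTrainingArguments with the typical values listed above; argument names may differ slightly across transformers versions, the output directory is hypothetical, and the batch size shown is just one plausible choice.

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="outputs",                # hypothetical path
    num_train_epochs=5,
    evaluation_strategy="steps",
    eval_steps=300,
    save_strategy="steps",
    save_steps=300,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    learning_rate=3e-5,
    per_device_train_batch_size=32,      # varied ~8-64 by model size
    optim="adamw_torch_fused",
    bf16=True,
    predict_with_generate=True,
    generation_max_length=128,
    generation_num_beams=4,
    weight_decay=0.01,
    label_smoothing_factor=0.1,
    warmup_ratio=0.1,
    max_grad_norm=0.5,
    dataloader_num_workers=8,
)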
Evaluation
Testing Data
- ASSET (test subset)
- MEDEASI (test subset)
- OneStopEnglish (advanced → elementary)
Metrics
- Identical ratio — share of outputs identical to the source after basic, language-agnostic normalization (strip, NFKC, collapse whitespace); see the sketch after this list
- Identical ratio (ci) — case-insensitive variant of the identical ratio
- SARI — main simplification metric (higher is better)
- FKGL — readability grade level (lower is simpler)
- BERTScore (F1) — semantic similarity (higher is better)
- LENS — composite simplification quality score (higher is better)
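The normalization behind the identical-ratio metrics can be sketched as follows; this is an illustrative reading of strip / NFKC / whitespace collapsing, not necessarily the exact implementation used in the project.

import re
import unicodedata

def normalize(text: str, case_insensitive: bool = False) -> str:
    # Basic, language-agnostic normalization: strip, NFKC, collapse whitespace
    text = unicodedata.normalize("NFKC", text.strip())
    text = re.sub(r"\s+", " ", text)
    return text.casefold() if case_insensitive else text

def identical_ratio(sources, outputs, case_insensitive=False):
    # Share of model outputs that are identical to their source after normalization
    matches = sum(
        normalize(s, case_insensitive) == normalize(o, case_insensitive)
        for s, o in zip(sources, outputs)
    )
    return matches / len(outputs)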
Generation Arguments
gen_args = dict(
max_new_tokens=64,
num_beams=4,
length_penalty=1.0,
no_repeat_ngram_size=3,
early_stopping=True,
do_sample=False,
)
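These arguments unpack directly into generate; for example, reusing the model, tokenizer, and PREFIX from the getting-started snippet above:

inputs = tokenizer(PREFIX + text, return_tensors="pt")
outputs = model.generate(**inputs, **gen_args)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))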
Results
| Dataset | Identical ratio | Identical ratio (ci) | SARI | FKGL | BERTScore | LENS |
|---|---|---|---|---|---|---|
| ASSET | 0.29 | 0.29 | 33.80 | 9.23 | 87.54 | 62.46 |
| MEDEASI | 0.30 | 0.30 | 32.68 | 10.98 | 45.14 | 50.55 |
| OneStopEnglish | 0.40 | 0.40 | 37.07 | 8.66 | 77.77 | 60.97 |
Environmental Impact
- Hardware Type: Single NVIDIA L4 GPU (Google Colab)
- Hours used: Approx. 5–10
- Cloud Provider: Google Cloud (via Colab)
- Compute Region: Unknown (Google Colab dynamic allocation)
- Carbon Emitted: Estimated to be low (under a few kg CO₂eq), since training was limited to a single GPU for a small number of hours.
Citation
BibTeX:
[More Information Needed]
APA:
[More Information Needed]