Model Card for eilamc14/pegasus-xsum-text-simplification
This is one of the models fine-tuned for text simplification as part of the Simplify This project.
Model Details
Model Description
Fine-tuned sequence-to-sequence (encoder–decoder) Transformer for English text simplification.
Trained on the eilamc14/wikilarge-clean dataset (cleaned WikiLarge-style pairs).
- Model type: Seq2Seq Transformer (encoder–decoder)
- Language (NLP): English
- License: apache-2.0
- Finetuned from model: google/pegasus-xsum
Model Sources
- Repository (code): https://github.com/eilamc14/Simplify-This
- Dataset: https://huggingface.co/datasets/eilamc14/wikilarge-clean
- Paper: —
- Demo: —
Uses
Direct Use
The model is intended for English text simplification.
- Input format: Simplify: <complex sentence>
- Output: <simplified sentence>
Typical uses
- Research on automatic text simplification
- Benchmarking against other simplification systems
- Demos/prototypes that require simpler English rewrites
Downstream Use
This repository already contains a fine-tuned model specialized for text simplification.
Further fine-tuning is optional and mainly relevant when:
- Adapting to a markedly different domain (e.g., medical/legal/news)
- Addressing specific failure modes (e.g., over/under-simplification, factual drops)
- Distilling/quantizing for deployment constraints
When fine-tuning further, keep the same input convention: Simplify: <...>.
Out-of-Scope Use
Not intended for:
- Tasks unrelated to simplification (e.g., dialogue, translation)
- Production use without additional safety filtering (no toxicity/bias mitigation)
- Languages other than English
- High-stakes settings (legal/medical advice, safety-critical decisions)
Bias, Risks, and Limitations
The model was trained on Wikipedia and Simple English Wikipedia alignments (via WikiLarge).
As a result, it inherits the characteristics and limitations of this data:
- Domain bias: Simplifications may reflect encyclopedic style; performance may degrade on informal, technical, or domain-specific text (e.g., medical/legal/news).
- Content bias: Wikipedia content itself contains biases in coverage, cultural perspective, and phrasing. Simplified outputs may reflect or amplify these.
- Simplification quality: The model may:
- Over-simplify (drop important details)
- Under-simplify (retain complex phrasing)
- Produce ungrammatical or awkward rephrasings
- Language limitation: Only suitable for English. Applying to other languages is unsupported.
- Safety limitation: The model has not been aligned to avoid toxic, biased, or harmful content. If the input text contains such content, the output may reproduce or modify it without safeguards.
Recommendations
- Evaluation required: Always evaluate the model in the target domain before deployment. Benchmark simplification quality (e.g., with SARI, FKGL, BERTScore, LENS, or human evaluation); see the example sketch after this list.
- Human oversight: Use human-in-the-loop review for applications where meaning preservation is critical (education, accessibility tools, etc.).
- Attribution: Preserve source attribution where required (Wikipedia → CC BY-SA).
- Not for high-stakes use: Avoid legal, medical, or safety-critical applications without extensive validation and domain adaptation.
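As a hedged illustration of such benchmarking, SARI can be computed with the Hugging Face evaluate library; the sentences below are placeholders for illustration, not data from this project.

import evaluate

# Placeholder sentences for illustration only
sources = ["The committee deemed the proposal unnecessarily complicated."]
predictions = ["The committee thought the proposal was too complicated."]
references = [["The committee found the plan too complicated."]]

sari = evaluate.load("sari")
print(sari.compute(sources=sources, predictions=predictions, references=references))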
How to Get Started with the Model
Load the model and tokenizer directly from the Hugging Face Hub:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "eilamc14/pegasus-xsum-text-simplification"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Example input
PREFIX = "Simplify: "
text = "The committee deemed the proposal unnecessarily complicated."

# Tokenize and generate
inputs = tokenizer(PREFIX + text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Training Details
Training Data
The WikiLarge-clean dataset (eilamc14/wikilarge-clean; see Model Sources above).
Training Procedure
- Hardware: NVIDIA L4 GPU on Google Colab
- Objective: Standard sequence-to-sequence cross-entropy loss
- Training type: Full fine-tuning of all parameters (no LoRA/PEFT used)
- Batching: Dynamic padding with the Hugging Face Trainer / PyTorch DataLoader (see the sketch after this list)
- Evaluation: Monitored on the validation split with SARI and identical_ratio
- Stopping criteria: Early stopping callback based on validation performance
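A minimal sketch of this setup is shown below. It assumes a tokenized dataset with train/validation splits and the training_args from the Training Hyperparameters section; the early-stopping patience value is an assumption, not a logged setting.

from transformers import DataCollatorForSeq2Seq, EarlyStoppingCallback, Seq2SeqTrainer

# Dynamic padding: each batch is padded to its longest sequence rather than a fixed max length
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,                      # Seq2SeqTrainingArguments (see below)
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    data_collator=data_collator,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],  # patience value assumed
)
trainer.train()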
Preprocessing
The dataset was preprocessed by prefixing each source sentence with "Simplify: " and tokenizing both the source (inputs) and target (labels).
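A minimal sketch of this preprocessing, assuming hypothetical src/tgt column names and a maximum length of 128 tokens:

PREFIX = "Simplify: "

def preprocess(batch, tokenizer, max_length=128):
    # Prefix each source sentence and tokenize it as the model input
    model_inputs = tokenizer(
        [PREFIX + s for s in batch["src"]],
        max_length=max_length,
        truncation=True,
    )
    # Tokenize the simplified sentences as labels
    labels = tokenizer(text_target=batch["tgt"], max_length=max_length, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

# tokenized_dataset = dataset.map(preprocess, batched=True, fn_kwargs={"tokenizer": tokenizer})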
Memory & Checkpointing
To reduce VRAM during training, gradient checkpointing was enabled and the KV cache was disabled:
model.config.use_cache = False # required when using gradient checkpointing
model.gradient_checkpointing_enable() # saves memory at the cost of extra compute
Notes
- Disabling use_cache avoids warnings/conflicts with gradient checkpointing and reduces memory usage in the forward pass.
- Gradient checkpointing trades GPU memory ↓ for training speed ↓ (extra recomputation).
- For inference/evaluation, re-enable the cache for faster generation: model.config.use_cache = True
Training Hyperparameters
The models were trained with Hugging Face Seq2SeqTrainingArguments. Hyperparameters varied slightly across models and runs during tuning, and full logs (batch size, steps, exact LR schedule) were not preserved. Below are the typical defaults used:
- Epochs: 5
- Evaluation strategy: every 300 steps
- Save strategy: every 300 steps (keep best model, eval_loss as criterion)
- Learning rate: ~3e-5
- Batch size: ~8–64, depending on model size
- Optimizer: adamw_torch_fused
- Precision: bf16
- Generation config (during eval): max_length=128, num_beams=4, predict_with_generate=True
- Other settings:
- Weight decay: 0.01
- Label smoothing: 0.1
- Warmup ratio: 0.1
- Max grad norm: 0.5
- Dataloader workers: 8 (L4 GPU)
Because hyperparameters were adjusted between runs and not all were logged, exact reproduction may differ slightly.
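For orientation only, a sketch of Seq2SeqTrainingArguments with the typical values listed above; argument names may differ slightly across transformers versions, the output directory is hypothetical, and the batch size shown is just one plausible choice.

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="outputs",                # hypothetical path
    num_train_epochs=5,
    evaluation_strategy="steps",
    eval_steps=300,
    save_strategy="steps",
    save_steps=300,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    learning_rate=3e-5,
    per_device_train_batch_size=32,      # varied ~8-64 by model size
    optim="adamw_torch_fused",
    bf16=True,
    predict_with_generate=True,
    generation_max_length=128,
    generation_num_beams=4,
    weight_decay=0.01,
    label_smoothing_factor=0.1,
    warmup_ratio=0.1,
    max_grad_norm=0.5,
    dataloader_num_workers=8,
)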
Evaluation
Testing Data
- ASSET (test subset)
- MEDEASI (test subset)
- OneStopEnglish (advanced → elementary)
Metrics
- Identical ratio — share of outputs identical to the source after basic, language-agnostic normalization (strip, NFKC, collapse whitespace); see the sketch after this list
- Identical ratio (ci) — case-insensitive variant of the identical ratio
- SARI — main simplification metric (higher is better)
- FKGL — readability grade level (lower is simpler)
- BERTScore (F1) — semantic similarity (higher is better)
- LENS — composite simplification quality score (higher is better)
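The normalization behind the identical-ratio metrics can be sketched as follows; this is an illustrative reading of strip / NFKC / whitespace collapsing, not necessarily the exact implementation used in the project.

import re
import unicodedata

def normalize(text: str, case_insensitive: bool = False) -> str:
    # Basic, language-agnostic normalization: strip, NFKC, collapse whitespace
    text = unicodedata.normalize("NFKC", text.strip())
    text = re.sub(r"\s+", " ", text)
    return text.casefold() if case_insensitive else text

def identical_ratio(sources, outputs, case_insensitive=False):
    # Share of model outputs that are identical to their source after normalization
    matches = sum(
        normalize(s, case_insensitive) == normalize(o, case_insensitive)
        for s, o in zip(sources, outputs)
    )
    return matches / len(outputs)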
Generation Arguments
gen_args = dict(
max_new_tokens=64,
num_beams=4,
length_penalty=1.0,
no_repeat_ngram_size=3,
early_stopping=True,
do_sample=False,
)
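These arguments unpack directly into generate; for example, reusing the model, tokenizer, and PREFIX from the getting-started snippet above:

inputs = tokenizer(PREFIX + text, return_tensors="pt")
outputs = model.generate(**inputs, **gen_args)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))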
Results
| Dataset | Identical ratio | Identical ratio (ci) | SARI | FKGL | BERTScore | LENS |
|---|---|---|---|---|---|---|
| ASSET | 0.29 | 0.29 | 33.80 | 9.23 | 87.54 | 62.46 |
| MEDEASI | 0.30 | 0.30 | 32.68 | 10.98 | 45.14 | 50.55 |
| OneStopEnglish | 0.40 | 0.40 | 37.07 | 8.66 | 77.77 | 60.97 |
Environmental Impact
- Hardware Type: Single NVIDIA L4 GPU (Google Colab)
- Hours used: Approx. 5–10
- Cloud Provider: Google Cloud (via Colab)
- Compute Region: Unknown (Google Colab dynamic allocation)
- Carbon Emitted: Estimated to be low (under a few kg CO₂eq), since training was limited to a single GPU for a small number of hours.
Citation
BibTeX:
[More Information Needed]
APA:
[More Information Needed]