Model Card for eilamc14/pegasus-xsum-text-simplification

This is one of the models fine-tuned for English text simplification as part of the Simplify This project.

Model Details

Model Description

Fine-tuned sequence-to-sequence (encoder–decoder) Transformer for English text simplification.
Trained on the dataset eilamc14/wikilarge-clean (cleaned WikiLarge-style pairs).

  • Model type: Seq2Seq Transformer (encoder–decoder)
  • Language (NLP): English
  • License: apache-2.0
  • Finetuned from model: google/pegasus-xsum

Model Sources

Uses

Direct Use

The model is intended for English text simplification.

  • Input format: Simplify: <complex sentence>
  • Output: <simplified sentence>
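
As a quick illustration of this convention, here is a minimal sketch using the transformers pipeline API; it is equivalent to the fuller example under "How to Get Started with the Model" below.

from transformers import pipeline

# The text2text-generation pipeline wraps tokenization, generation, and decoding.
simplifier = pipeline(
    "text2text-generation",
    model="eilamc14/pegasus-xsum-text-simplification",
)

text = "The committee deemed the proposal unnecessarily complicated."
result = simplifier("Simplify: " + text, max_new_tokens=64, num_beams=4)
print(result[0]["generated_text"])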

Typical uses

  • Research on automatic text simplification
  • Benchmarking against other simplification systems
  • Demos/prototypes that require simpler English rewrites

Downstream Use

This repository already contains a fine-tuned model specialized for text simplification.

Further fine-tuning is optional and mainly relevant when:

  • Adapting to a markedly different domain (e.g., medical/legal/news)
  • Addressing specific failure modes (e.g., over/under-simplification, factual drops)
  • Distilling/quantizing for deployment constraints

When fine-tuning further, keep the same input convention: Simplify: <...>.
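
For example, a hedged sketch of how new training pairs might be formatted before tokenization (the source/target field names and the target sentence are illustrative, not a fixed schema of this repository):

PREFIX = "Simplify: "

# Illustrative pair; field names and the simplified target are placeholders.
example = {
    "source": "The committee deemed the proposal unnecessarily complicated.",
    "target": "The committee thought the proposal was too complicated.",
}

model_input = PREFIX + example["source"]   # same prefix as at inference time
target_text = example["target"]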

Out-of-Scope Use

Not intended for:

  • Tasks unrelated to simplification (dialogue, translation, etc.)
  • Production use without additional safety filtering (no toxicity/bias mitigation)
  • Languages other than English
  • High-stakes settings (legal/medical advice, safety-critical decisions)

Bias, Risks, and Limitations

The model was trained on Wikipedia and Simple English Wikipedia alignments (via WikiLarge).
As a result, it inherits the characteristics and limitations of this data:

  • Domain bias: Simplifications may reflect encyclopedic style; performance may degrade on informal, technical, or domain-specific text (e.g., medical/legal/news).
  • Content bias: Wikipedia content itself contains biases in coverage, cultural perspective, and phrasing. Simplified outputs may reflect or amplify these.
  • Simplification quality: The model may:
    • Over-simplify (drop important details)
    • Under-simplify (retain complex phrasing)
    • Produce ungrammatical or awkward rephrasings
  • Language limitation: Only suitable for English. Applying to other languages is unsupported.
  • Safety limitation: The model has not been aligned to avoid toxic, biased, or harmful content. If the input text contains such content, the output may reproduce or modify it without safeguards.

Recommendations

  • Evaluation required: Always evaluate the model in the target domain before deployment. Benchmark simplification quality (e.g., with SARI, FKGL, BERTScore, LENS, human evaluation).
  • Human oversight: Use human-in-the-loop review for applications where meaning preservation is critical (education, accessibility tools, etc.).
  • Attribution: Preserve source attribution where required (Wikipedia → CC BY-SA).
  • Not for high-stakes use: Avoid legal, medical, or safety-critical applications without extensive validation and domain adaptation.

How to Get Started with the Model

Load the model and tokenizer directly from the Hugging Face Hub:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "eilamc14/bart-base-text-simplification"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Example input
PREFIX = "Simplify: "
text = "The committee deemed the proposal unnecessarily complicated."

# Tokenize and generate
inputs = tokenizer(PREFIX+text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Training Details

Training Data

The model was trained on the eilamc14/wikilarge-clean dataset (cleaned WikiLarge-style complex–simple sentence pairs).

Training Procedure

  • Hardware: NVIDIA L4 GPU on Google Colab
  • Objective: Standard sequence-to-sequence cross-entropy loss
  • Training type: Full fine-tuning of all parameters (no LoRA/PEFT used)
  • Batching: Dynamic padding with Hugging Face Trainer / PyTorch DataLoader
  • Evaluation: Monitored on the validation split with metrics (SARI and identical_ratio)
  • Stopping criteria: early stopping callback based on validation performance (see the sketch below)
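
A minimal sketch of the batching and stopping components listed above, assuming the standard Hugging Face classes (the exact arguments of the original runs were not preserved):

from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    EarlyStoppingCallback,
)

model_id = "eilamc14/pegasus-xsum-text-simplification"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Dynamic padding: each batch is padded to its own longest sequence.
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)

# Stop when the monitored validation metric stops improving; the patience value is an assumption.
early_stopping = EarlyStoppingCallback(early_stopping_patience=3)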

Preprocessing

The dataset was preprocessed by prefixing each source sentence with "Simplify: " and tokenizing both the source (inputs) and target (labels).
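
A hedged sketch of that step (the source/target column names and the truncation setting are assumptions; only the "Simplify: " prefix is fixed by this card):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("eilamc14/pegasus-xsum-text-simplification")
PREFIX = "Simplify: "

def preprocess(batch):
    # Prefix and tokenize the complex sentences; tokenize the simple sentences as labels.
    model_inputs = tokenizer([PREFIX + s for s in batch["source"]], truncation=True)
    labels = tokenizer(text_target=batch["target"], truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

# tokenized = dataset.map(preprocess, batched=True)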

Memory & Checkpointing

To reduce VRAM during training, gradient checkpointing was enabled and the KV cache was disabled:

model.config.use_cache = False          # required when using gradient checkpointing
model.gradient_checkpointing_enable()   # saves memory at the cost of extra compute

Notes

  • Disabling use_cache avoids warnings/conflicts with gradient checkpointing and reduces memory usage in the forward pass.
  • Gradient checkpointing trades lower GPU memory usage for slower training (extra recomputation).
  • For inference/evaluation, re-enable the cache for faster generation:
model.config.use_cache = True

Training Hyperparameters

The models were trained with Hugging Face Seq2SeqTrainingArguments.
Hyperparameters varied slightly across models and runs during tuning, and full logs (batch size, steps, exact LR schedule) were not preserved.
Below are the typical defaults used:

  • Epochs: 5
  • Evaluation strategy: every 300 steps
  • Save strategy: every 300 steps (keep best model, eval_loss as criterion)
  • Learning rate: ~3e-5
  • Batch size: ~8–64, depending on model size
  • Optimizer: adamw_torch_fused
  • Precision: bf16
  • Generation config (during eval): max_length=128, num_beams=4, predict_with_generate=True
  • Other settings:
    • Weight decay: 0.01
    • Label smoothing: 0.1
    • Warmup ratio: 0.1
    • Max grad norm: 0.5
    • Dataloader workers: 8 (L4 GPU)

Because hyperparameters were adjusted between runs and not all were logged, exact reproduction may differ slightly.
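
For reference, a hedged Seq2SeqTrainingArguments sketch that mirrors the typical values listed above; the batch size and output directory are illustrative, and older transformers releases name the evaluation argument evaluation_strategy instead of eval_strategy.

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="pegasus-xsum-text-simplification",  # illustrative path
    num_train_epochs=5,
    eval_strategy="steps",
    eval_steps=300,
    save_strategy="steps",
    save_steps=300,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    learning_rate=3e-5,
    per_device_train_batch_size=16,   # actual runs used ~8-64 depending on model size
    optim="adamw_torch_fused",
    bf16=True,
    weight_decay=0.01,
    label_smoothing_factor=0.1,
    warmup_ratio=0.1,
    max_grad_norm=0.5,
    dataloader_num_workers=8,
    predict_with_generate=True,
    generation_max_length=128,
    generation_num_beams=4,
)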

Evaluation

Testing Data

The ASSET, MEDEASI, and OneStopEnglish test sets (see Results below).

Metrics

  • Identical ratio — share of outputs identical to the source after basic, language-agnostic normalization (strip, NFKC, collapse whitespace); see the sketch after this list
  • Identical ratio (ci) — case-insensitive variant of the identical ratio
  • SARI — main simplification metric (higher is better)
  • FKGL — readability grade level (lower is simpler)
  • BERTScore (F1) — semantic similarity (higher is better)
  • LENS — composite simplification quality score (higher is better)
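
A hedged sketch of the identical-ratio normalization described above, together with loading SARI through the evaluate library (FKGL, BERTScore, and LENS come from their own packages/metrics and are not shown):

import unicodedata
import evaluate

def normalize(text: str) -> str:
    # Basic, language-agnostic normalization: NFKC, strip, collapse whitespace.
    return " ".join(unicodedata.normalize("NFKC", text).strip().split())

def identical_ratio(sources, outputs, case_insensitive=False):
    # Share of model outputs identical to their source after normalization.
    norm = (lambda t: normalize(t).lower()) if case_insensitive else normalize
    same = sum(norm(s) == norm(o) for s, o in zip(sources, outputs))
    return same / len(sources)

sari = evaluate.load("sari")
# sari.compute(sources=sources, predictions=outputs, references=references)
# where `references` is a list of reference lists, one list per source sentence.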

Generation Arguments

gen_args = dict(
    max_new_tokens=64,
    num_beams=4,
    length_penalty=1.0,
    no_repeat_ngram_size=3,
    early_stopping=True,
    do_sample=False,
)
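
These can be unpacked directly into generate(), assuming model and inputs as in the earlier example:

outputs = model.generate(**inputs, **gen_args)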

Results

Dataset          Identical ratio   Identical ratio (ci)   SARI    FKGL    BERTScore   LENS
ASSET            0.29              0.29                   33.80   9.23    87.54       62.46
MEDEASI          0.30              0.30                   32.68   10.98   45.14       50.55
OneStopEnglish   0.40              0.40                   37.07   8.66    77.77       60.97

Environmental Impact

  • Hardware Type: Single NVIDIA L4 GPU (Google Colab)
  • Hours used: Approx. 5–10
  • Cloud Provider: Google Cloud (via Colab)
  • Compute Region: Unknown (Google Colab dynamic allocation)
  • Carbon Emitted: Estimated to be very low (< a few kg CO₂eq), since training was limited to a single GPU for a small number of hours.

Citation

BibTeX:

[More Information Needed]

APA:

[More Information Needed]
