DistilBERT Fine-Tuned on IMDb for Masked Language Modeling

Model Description

This model is a fine-tuned version of distilbert-base-uncased for masked language modeling, trained on the IMDb dataset.

Model Training Details

Training Dataset

  • Dataset: IMDb dataset from Hugging Face
  • Dataset Split:
    • Train: 25,000 samples
    • Test: 25,000 samples
    • Unsupervised: 50,000 samples
  • Training Data: the train and unsupervised splits were concatenated into a single corpus of 75,000 reviews (combined as sketched below).
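
As a sketch (the exact preprocessing code is not part of this card), the two splits can be combined with the Hugging Face datasets library:

from datasets import load_dataset, concatenate_datasets

imdb = load_dataset("imdb")
# The unsupervised split shares the train split's schema (text + label,
# with label fixed at -1), so the two can be concatenated directly;
# labels are ignored for masked language modeling.
mlm_corpus = concatenate_datasets([imdb["train"], imdb["unsupervised"]])
print(len(mlm_corpus))  # 75000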

Training Arguments

The following parameters were used during fine-tuning; a matching TrainingArguments sketch follows the list:

  • Number of Training Epochs: 10
  • Overwrite Output Directory: True
  • Evaluation Strategy: steps
    • Evaluation Steps: 500
  • Checkpoint Save Strategy: steps
    • Save Steps: 500
  • Load Best Model at End: True
  • Metric for Best Model: eval_loss
    • Direction: Lower eval_loss is better (greater_is_better = False).
  • Learning Rate: 2e-5
  • Weight Decay: 0.01
  • Per-Device Batch Size (Training): 32
  • Per-Device Batch Size (Evaluation): 32
  • Warmup Steps: 1,000
  • Mixed Precision Training: Enabled (fp16 = True)
  • Logging Steps: 100
  • Gradient Accumulation Steps: 2
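
For reference, here is a hypothetical reconstruction of these settings as a transformers TrainingArguments object; the output directory name is an assumption:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="distilbert-finetuned-imdb-mlm",  # assumed directory name
    overwrite_output_dir=True,
    num_train_epochs=10,
    evaluation_strategy="steps",
    eval_steps=500,
    save_strategy="steps",
    save_steps=500,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    learning_rate=2e-5,
    weight_decay=0.01,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    warmup_steps=1000,
    fp16=True,
    logging_steps=100,
    gradient_accumulation_steps=2,
)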

Early Stopping

  • The model was configured with early stopping to prevent overfitting (see the sketch after this list).
  • Training stopped after 5.87 epochs (21,000 steps), as eval_loss showed no further significant improvement.
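
In the transformers Trainer API this corresponds to an EarlyStoppingCallback. A minimal sketch follows, continuing the dataset sketch above; the patience value, the 128-token truncation, and the use of the test split for evaluation are assumptions not stated in the card:

from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    EarlyStoppingCallback,
    Trainer,
)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    # Illustrative simplification: truncate each review to 128 tokens.
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized_train = mlm_corpus.map(tokenize, batched=True, remove_columns=["text", "label"])
tokenized_eval = imdb["test"].map(tokenize, batched=True, remove_columns=["text", "label"])

# Standard MLM collator: randomly masks tokens in each batch (15% is the default).
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=training_args,  # the TrainingArguments shown above
    train_dataset=tokenized_train,
    eval_dataset=tokenized_eval,
    data_collator=data_collator,
    # Stop when eval_loss fails to improve for several evaluations;
    # the patience value here is assumed.
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()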

Evaluation Results

  • Metric Used: eval_loss
  • Final Perplexity: 8.34
  • Best Checkpoint: the model saved when early stopping triggered (step 21,000).
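
Perplexity is the exponential of the evaluation cross-entropy loss, so the reported 8.34 corresponds to an eval_loss of roughly 2.12. Continuing the Trainer sketch above:

import math

# trainer.evaluate() returns a metrics dict that includes "eval_loss".
eval_results = trainer.evaluate()
print(f"Perplexity: {math.exp(eval_results['eval_loss']):.2f}")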

Model Usage

The model can be used for masked language modeling via the fill-mask pipeline from Hugging Face Transformers. Example:

from transformers import pipeline

# Load the fine-tuned checkpoint into a fill-mask pipeline.
mask_filler = pipeline("fill-mask", model="Prikshit7766/distilbert-finetuned-imdb-mlm")

text = "This is a great [MASK]."
predictions = mask_filler(text)  # returns the top 5 candidates by default

for pred in predictions:
    print(f">>> {pred['sequence']}")

Output Example:

>>> This is a great movie.
>>> This is a great film.
>>> This is a great show.
>>> This is a great documentary.
>>> This is a great story.