ModernBERT2gpt2-700m baseline

An EncoderDecoder model created from a ModernBERT-large encoder and a randomly initialized GPT-2 decoder, trained for one epoch on the pszemraj/t2t-re_pretrain-small dataset as a baseline; a minimal construction sketch follows the list below.

  • input context length: 2048 tokens
  • output context length: 512 tokens
  • a single tokenizer, shared by encoder and decoder, slightly modified from ModernBERT's
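
For illustration, here is a minimal sketch of how such an encoder-decoder can be assembled with transformers' `EncoderDecoderModel`. The encoder checkpoint id, the decoder width/depth, and the special-token wiring are assumptions, not the exact training setup; see the wandb logs for the real script.

```python
from transformers import (
    AutoModel,
    AutoTokenizer,
    EncoderDecoderModel,
    GPT2Config,
    GPT2LMHeadModel,
)

# Shared tokenizer; the card says it is slightly modified from ModernBERT's,
# so loading the stock tokenizer here is an approximation.
tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-large")

# Pretrained ModernBERT-large encoder.
encoder = AutoModel.from_pretrained("answerdotai/ModernBERT-large")

# Randomly initialized GPT-2 decoder with cross-attention enabled.
# Width/depth below are guesses chosen to land near ~700M total parameters,
# not the card's documented configuration.
decoder_config = GPT2Config(
    vocab_size=len(tokenizer),
    n_positions=512,  # output context length from the card
    n_embd=1024,      # matches ModernBERT-large's hidden size
    n_layer=24,
    n_head=16,
    is_decoder=True,
    add_cross_attention=True,
)
decoder = GPT2LMHeadModel(decoder_config)

# Glue the two halves together and wire up the special tokens.
model = EncoderDecoderModel(encoder=encoder, decoder=decoder)
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.config.eos_token_id = tokenizer.sep_token_id

print(f"{model.num_parameters() / 1e6:.0f}M parameters")
```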

Logs and the training script can be found on wandb.

It achieves the following results on the evaluation set (a minimal inference sketch appears below):

  • Loss: 2.2113
  • ROUGE-1: 48.6654
  • ROUGE-2: 31.8667
  • ROUGE-L: 44.9897
  • ROUGE-Lsum: 45.4126
  • Average generation length: 30.24 tokens
  • Input tokens seen: 524,625,736
Model size: 702M parameters (F32, safetensors)
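
A minimal inference sketch, assuming the checkpoint loads through `EncoderDecoderModel` under the repo id shown on this card; the decoding settings are placeholders, not tuned values.

```python
import torch
from transformers import AutoTokenizer, EncoderDecoderModel

repo_id = "pszemraj/ModernBERT2gpt2-700m-v0.1"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = EncoderDecoderModel.from_pretrained(repo_id)
model.eval()

text = "Paste a long input document here ..."
inputs = tokenizer(
    text,
    return_tensors="pt",
    truncation=True,
    max_length=2048,  # input context length from the card
)
with torch.no_grad():
    output_ids = model.generate(
        inputs.input_ids,
        attention_mask=inputs.attention_mask,
        max_new_tokens=512,  # output context length from the card
        num_beams=4,         # placeholder decoding choice
    )
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```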