ModernBERT2gpt2-700m baseline
An EncoderDecoder model built from a pretrained ModernBERT-large encoder and a randomly initialized GPT-2 decoder, trained for one epoch on the pszemraj/t2t-re_pretrain-small dataset as a "baseline".
- input context length: 2048
- output context length: 512
- single tokenizer, slightly modified from ModernBERT
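
As a reference, here is a minimal sketch of how such a model can be assembled with the Hugging Face transformers `EncoderDecoderModel` API. This is not the actual training script; the decoder hyperparameters below are illustrative assumptions, not the exact configuration of the 700M checkpoint.

```python
# Minimal sketch (not the exact training script): build an encoder-decoder
# from a pretrained ModernBERT-large encoder and a randomly initialized
# GPT-2-style decoder via transformers' EncoderDecoderModel.
from transformers import (
    AutoModel,
    AutoTokenizer,
    EncoderDecoderModel,
    GPT2Config,
    GPT2LMHeadModel,
)

tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-large")

# Pretrained encoder.
encoder = AutoModel.from_pretrained("answerdotai/ModernBERT-large")

# Randomly initialized GPT-2 decoder with cross-attention enabled.
# Sizes here are placeholders, not the exact 700M configuration.
decoder_config = GPT2Config(
    vocab_size=len(tokenizer),
    n_positions=512,  # output context length from this card
    add_cross_attention=True,
    is_decoder=True,
)
decoder = GPT2LMHeadModel(decoder_config)

model = EncoderDecoderModel(encoder=encoder, decoder=decoder)

# Token IDs the seq2seq wrapper needs for training/generation
# (assumes ModernBERT-style [CLS]/[SEP]/[PAD] special tokens).
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.eos_token_id = tokenizer.sep_token_id
model.config.pad_token_id = tokenizer.pad_token_id
```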
Logs and the training script can be found on Weights & Biases (wandb).
It achieves the following results on the evaluation set:
- Loss: 2.2113
- Rouge1: 48.6654
- Rouge2: 31.8667
- Rougel: 44.9897
- Rougelsum: 45.4126
- Gen Len: 30.24
- Num Input Tokens Seen: 524625736
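
A minimal usage sketch, assuming the checkpoint pszemraj/ModernBERT2gpt2-700m-v0.1 loads as a standard transformers `EncoderDecoderModel`; the prompt and generation settings are illustrative, not prescribed by this card.

```python
# Usage sketch; the repo id, prompt, and generation settings are
# illustrative assumptions.
from transformers import AutoTokenizer, EncoderDecoderModel

repo_id = "pszemraj/ModernBERT2gpt2-700m-v0.1"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = EncoderDecoderModel.from_pretrained(repo_id)

text = "The quick brown fox jumps over the lazy dog."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048)
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```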
Model tree for pszemraj/ModernBERT2gpt2-700m-v0.1
- Base model: answerdotai/ModernBERT-large