ModernBERT2gpt2-700m baseline

An EncoderDecoder model created from a ModernBERT-large encoder and a randomly initialized GPT-2 decoder, trained for one epoch on the pszemraj/t2t-re_pretrain-small dataset as a baseline; a minimal construction sketch follows the list below.

  • input context length: 2048 tokens
  • output context length: 512 tokens
  • a single tokenizer, shared by encoder and decoder, slightly modified from ModernBERT's
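
For illustration, here is a minimal sketch of how such an encoder-decoder can be assembled with transformers' `EncoderDecoderModel`. The encoder checkpoint id, the decoder width/depth, and the special-token wiring are assumptions, not the exact training setup; see the wandb logs for the real script.

```python
from transformers import (
    AutoModel,
    AutoTokenizer,
    EncoderDecoderModel,
    GPT2Config,
    GPT2LMHeadModel,
)

# Shared tokenizer; the card says it is slightly modified from ModernBERT's,
# so loading the stock tokenizer here is an approximation.
tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-large")

# Pretrained ModernBERT-large encoder.
encoder = AutoModel.from_pretrained("answerdotai/ModernBERT-large")

# Randomly initialized GPT-2 decoder with cross-attention enabled.
# Width/depth below are guesses chosen to land near ~700M total parameters,
# not the card's documented configuration.
decoder_config = GPT2Config(
    vocab_size=len(tokenizer),
    n_positions=512,  # output context length from the card
    n_embd=1024,      # matches ModernBERT-large's hidden size
    n_layer=24,
    n_head=16,
    is_decoder=True,
    add_cross_attention=True,
)
decoder = GPT2LMHeadModel(decoder_config)

# Glue the two halves together and wire up the special tokens.
model = EncoderDecoderModel(encoder=encoder, decoder=decoder)
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.config.eos_token_id = tokenizer.sep_token_id

print(f"{model.num_parameters() / 1e6:.0f}M parameters")
```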

Logs and the training script can be found on wandb.

It achieves the following results on the evaluation set (a minimal inference sketch appears below):

  • Loss: 2.2113
  • ROUGE-1: 48.6654
  • ROUGE-2: 31.8667
  • ROUGE-L: 44.9897
  • ROUGE-Lsum: 45.4126
  • Average generation length: 30.24 tokens
  • Input tokens seen: 524,625,736
Model size: 702M parameters (F32, safetensors)
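
A minimal inference sketch, assuming the checkpoint loads through `EncoderDecoderModel` under the repo id shown on this card; the decoding settings are placeholders, not tuned values.

```python
import torch
from transformers import AutoTokenizer, EncoderDecoderModel

repo_id = "pszemraj/ModernBERT2gpt2-700m-v0.1"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = EncoderDecoderModel.from_pretrained(repo_id)
model.eval()

text = "Paste a long input document here ..."
inputs = tokenizer(
    text,
    return_tensors="pt",
    truncation=True,
    max_length=2048,  # input context length from the card
)
with torch.no_grad():
    output_ids = model.generate(
        inputs.input_ids,
        attention_mask=inputs.attention_mask,
        max_new_tokens=512,  # output context length from the card
        num_beams=4,         # placeholder decoding choice
    )
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```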