gec-flan-t5-large-stage-2-v3

This model is a fine-tuned version of 512duncanl/gec-flan-t5-xxl-stage-1 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1747
  • F0.5: 0.6727
  • Precision: 0.6968
  • Recall: 0.5908
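For context, F0.5 is the precision-weighted F-measure standard in grammatical error correction. The reported score is consistent with the precision and recall above, as this quick check shows:

```python
# Consistency check: F-beta with beta = 0.5 weights precision
# twice as heavily as recall.
beta = 0.5
p, r = 0.6968, 0.5908  # final-epoch precision and recall
f05 = (1 + beta**2) * p * r / (beta**2 * p + r)
print(round(f05, 4))  # 0.6727, matching the reported F0.5
```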

Model description

More information needed

Intended uses & limitations

More information needed
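Pending details from the author, here is a minimal inference sketch. It assumes the model exposes the standard FLAN-T5 seq2seq interface and takes the sentence to correct as plain input; the prompt format used in training is not documented here, and the example sentence is illustrative:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Repo id taken from the model tree below; adjust if the weights
# are hosted under a different name.
model_id = "512duncanl/gec-flan-t5-xxl-stage-2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Assumed input format: the raw sentence to correct. If training used
# an instruction prefix, it would need to be prepended here.
text = "She go to school every days."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```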

Training and evaluation data

More information needed

Training procedure

Trained on 4× NVIDIA H100 SXM GPUs (80 GB each).

Training hyperparameters

The following hyperparameters were used during training (a Seq2SeqTrainingArguments sketch follows the list):

  • learning_rate: 2e-06
  • train_batch_size: 18
  • eval_batch_size: 18
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 1
  • total_train_batch_size: 72
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 2
  • weight_decay: 0.005
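As a sketch of how these settings map onto transformers Seq2SeqTrainingArguments (argument names per Transformers 4.54): output_dir and anything not listed above are placeholders, not taken from the original run.

```python
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="gec-flan-t5-stage-2",   # placeholder path
    learning_rate=2e-6,
    per_device_train_batch_size=18,
    per_device_eval_batch_size=18,
    gradient_accumulation_steps=1,      # 18 per device x 4 GPUs = 72 total
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.1,
    num_train_epochs=2,
    weight_decay=0.005,
    bf16=True,                          # weights are published in BF16
)
```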

Training results

| Training Loss | Epoch | Step | Validation Loss | F0.5   | Precision | Recall |
|:-------------:|:-----:|:----:|:---------------:|:------:|:---------:|:------:|
| 0.2030        | 1.0   | 514  | 0.1900          | 0.6723 | 0.7071    | 0.5619 |
| 0.1874        | 2.0   | 1028 | 0.1747          | 0.6727 | 0.6968    | 0.5908 |

Framework versions

  • Transformers 4.54.0
  • PyTorch 2.7.1+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.2

Model tree for 512duncanl/gec-flan-t5-xxl-stage-2

  • Base model: google/flan-t5-xxl
  • This model: fine-tuned from 512duncanl/gec-flan-t5-xxl-stage-1