	---
	license: apache-2.0
	base_model: google/flan-t5-base
	tags:
	- generated_from_trainer
	datasets:
	- samsum
	metrics:
	- rouge
	model-index:
	- name: flan-t5-base-samsum
	  results:
	  - task:
	      name: Sequence-to-sequence Language Modeling
	      type: text2text-generation
	    dataset:
	      name: samsum
	      type: samsum
	      config: samsum
	      split: test
	      args: samsum
	    metrics:
	    - name: Rouge1
	      type: rouge
	      value: 47.08
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# flan-t5-base-samsum

	This model is a fine-tuned version of [google/flan-t5-base](https://huggingface.co/google/flan-t5-base) on the samsum dataset.
	It achieves the following results on the evaluation set:
	- Loss: 1.3859
	- Rouge1: 47.08
	- Rouge2: 23.2603
	- Rougel: 39.2645
	- Rougelsum: 43.2898
	- Gen Len: 17.3333
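To make the Rouge1 figure above easier to interpret, here is a minimal sketch of a unigram-overlap F1 score. It is a simplification, not the `rouge` metric implementation used for this card: it assumes lowercased, whitespace-split tokens with no stemming, and it assumes the reported values are F-measures scaled to 0-100. The function name `rouge1_f1` and the example sentences are hypothetical.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Unigram-overlap F1 on lowercased, whitespace-split tokens, scaled to 0-100.

    Simplified illustration only: no stemming and no special tokenization.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0

    # Clipped overlap: each token counts at most as often as it appears in both texts.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 100.0 * 2 * precision * recall / (precision + recall)


# Hypothetical reference summary and model output, not taken from samsum.
reference = "amanda baked cookies and will bring jerry some tomorrow"
prediction = "amanda will bring jerry cookies tomorrow"
score = rouge1_f1(prediction, reference)
print(round(score, 2))
```

Rouge2 applies the same idea to bigrams, while Rougel and Rougelsum are based on the longest common subsequence rather than n-gram counts.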

	## Model description

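	This is google/flan-t5-base fine-tuned for abstractive summarization of chat-style dialogues, the task covered by the samsum dataset. More information needed.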

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-05
	- train_batch_size: 24
	- eval_batch_size: 24
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- num_epochs: 2
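As a rough illustration of what `lr_scheduler_type: linear` implies for the learning rate over the run, the sketch below computes a linearly decaying rate in plain Python. Two values are assumptions rather than facts from this card: `warmup_steps = 0` (no warmup is listed above) and `TOTAL_STEPS = 1228`, estimated from the epoch/step ratio in the training results table (about 614 steps per epoch over 2 epochs). The helper name `linear_lr` is hypothetical.

```python
def linear_lr(step: int, total_steps: int, base_lr: float = 5e-05, warmup_steps: int = 0) -> float:
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Assumption: estimated from the results table, not stated in the card.
TOTAL_STEPS = 1228

for step in (0, 614, 1200, 1228):
    print(step, linear_lr(step, TOTAL_STEPS))
```

Under these assumptions the rate is about half of its initial value at the end of the first epoch and close to zero by the last logged evaluation step.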

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Rouge1  | Rouge2  | Rougel  | Rougelsum | Gen Len |
	|:-------------:|:-----:|:----:|:---------------:|:-------:|:-------:|:-------:|:---------:|:-------:|
	| 1.5121        | 0.08  | 50   | 1.4287          | 46.7443 | 22.8826 | 38.9466 | 42.862    | 16.9634 |
	| 1.46          | 0.16  | 100  | 1.4199          | 46.7723 | 22.8011 | 39.0224 | 42.9095   | 17.2393 |
	| 1.4515        | 0.24  | 150  | 1.4147          | 46.6593 | 23.027  | 38.9378 | 42.8492   | 17.1245 |
	| 1.4679        | 0.33  | 200  | 1.4121          | 46.8312 | 22.8345 | 39.1545 | 43.2035   | 17.3431 |
	| 1.451         | 0.41  | 250  | 1.4109          | 46.826  | 23.038  | 39.2744 | 43.3106   | 17.2686 |
	| 1.4434        | 0.49  | 300  | 1.4040          | 46.6744 | 23.0221 | 39.3167 | 43.1835   | 16.9158 |
	| 1.4417        | 0.57  | 350  | 1.4007          | 46.851  | 23.0448 | 39.2346 | 43.2396   | 17.1172 |
	| 1.4781        | 0.65  | 400  | 1.3952          | 46.7831 | 23.1146 | 39.295  | 43.2256   | 17.2076 |
	| 1.4626        | 0.73  | 450  | 1.3940          | 47.0933 | 23.2741 | 39.2954 | 43.3102   | 17.2222 |
	| 1.4307        | 0.81  | 500  | 1.3955          | 46.8827 | 23.2016 | 39.2817 | 43.2379   | 17.2002 |
	| 1.4586        | 0.9   | 550  | 1.3933          | 46.7152 | 23.1439 | 39.2576 | 43.1754   | 17.3040 |
	| 1.4465        | 0.98  | 600  | 1.3905          | 46.8332 | 23.3356 | 39.2596 | 43.2472   | 17.3468 |
	| 1.381         | 1.06  | 650  | 1.3953          | 46.9289 | 22.9605 | 39.0651 | 43.2085   | 17.4066 |
	| 1.4125        | 1.14  | 700  | 1.3922          | 46.4822 | 23.0893 | 38.9024 | 42.9789   | 17.2381 |
	| 1.3667        | 1.22  | 750  | 1.3922          | 47.2977 | 23.4064 | 39.5091 | 43.5742   | 17.2930 |
	| 1.3878        | 1.3   | 800  | 1.3953          | 46.6405 | 23.2132 | 39.2853 | 43.3049   | 17.3358 |
	| 1.3884        | 1.38  | 850  | 1.3931          | 46.9152 | 23.1594 | 39.1629 | 43.2254   | 17.3614 |
	| 1.3766        | 1.47  | 900  | 1.3898          | 46.988  | 23.1708 | 39.2446 | 43.311    | 17.3333 |
	| 1.3727        | 1.55  | 950  | 1.3889          | 46.6771 | 23.0915 | 39.0787 | 43.0184   | 17.3211 |
	| 1.4001        | 1.63  | 1000 | 1.3859          | 47.08   | 23.2603 | 39.2645 | 43.2898   | 17.3333 |
	| 1.3894        | 1.71  | 1050 | 1.3874          | 47.2134 | 23.3696 | 39.4356 | 43.5422   | 17.3297 |
	| 1.3697        | 1.79  | 1100 | 1.3860          | 47.06   | 23.3769 | 39.3494 | 43.4113   | 17.3504 |
	| 1.3886        | 1.87  | 1150 | 1.3862          | 47.0159 | 23.3728 | 39.3871 | 43.4016   | 17.3260 |
	| 1.4037        | 1.95  | 1200 | 1.3861          | 47.0039 | 23.4055 | 39.3356 | 43.3787   | 17.3321 |
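The evaluation results reported at the top of this card (Loss 1.3859, Rouge1 47.08) match the step-1000 row of the table, which is also the row with the lowest validation loss. That suggests, though the card does not state it, that the final checkpoint was selected by validation loss. The sketch below reproduces that selection from the logged values; the variable names are hypothetical and only three of the table's columns are copied in.

```python
# (step, validation_loss, rouge1) copied from the training results table.
rows = [
    (50, 1.4287, 46.7443), (100, 1.4199, 46.7723), (150, 1.4147, 46.6593),
    (200, 1.4121, 46.8312), (250, 1.4109, 46.826), (300, 1.4040, 46.6744),
    (350, 1.4007, 46.851), (400, 1.3952, 46.7831), (450, 1.3940, 47.0933),
    (500, 1.3955, 46.8827), (550, 1.3933, 46.7152), (600, 1.3905, 46.8332),
    (650, 1.3953, 46.9289), (700, 1.3922, 46.4822), (750, 1.3922, 47.2977),
    (800, 1.3953, 46.6405), (850, 1.3931, 46.9152), (900, 1.3898, 46.988),
    (950, 1.3889, 46.6771), (1000, 1.3859, 47.08), (1050, 1.3874, 47.2134),
    (1100, 1.3860, 47.06), (1150, 1.3862, 47.0159), (1200, 1.3861, 47.0039),
]

# Checkpoint with the lowest validation loss.
best_by_loss = min(rows, key=lambda row: row[1])

# Checkpoint with the highest Rouge1, for comparison.
best_by_rouge1 = max(rows, key=lambda row: row[2])

print("lowest validation loss:", best_by_loss)
print("highest Rouge1:", best_by_rouge1)
```

The two criteria pick different checkpoints here, so the choice of selection metric affects which scores end up as the headline numbers.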


	### Framework versions

	- Transformers 4.33.2
	- Pytorch 2.0.0+cu117
	- Datasets 2.14.5
	- Tokenizers 0.13.3