	---
	library_name: transformers
	license: apache-2.0
	base_model: amd/AMD-Llama-135m
	tags:
	- generated_from_trainer
	model-index:
	- name: amdchess-v5
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# amdchess-v5

	This model is a fine-tuned version of [amd/AMD-Llama-135m](https://huggingface.co/amd/AMD-Llama-135m) on an unknown dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.7610
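
The card does not say how the loss is defined. Assuming it is the mean per-token cross-entropy in nats (the usual convention for causal language-model fine-tuning, but an assumption here), it can be converted to perplexity. A minimal sketch; the helper name `perplexity` is illustrative and not part of any library:

```python
import math


def perplexity(mean_cross_entropy_nats: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_cross_entropy_nats)


# Final evaluation loss reported in this card.
# ASSUMPTION: the value is a mean per-token cross-entropy in nats.
eval_loss = 0.7610
print(f"perplexity ~ {perplexity(eval_loss):.3f}")  # prints: perplexity ~ 2.140
```

Under that assumption, a loss of 0.7610 corresponds to a perplexity of roughly 2.14 over the model's tokens.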

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 3e-05
	- train_batch_size: 8
	- eval_batch_size: 8
	- seed: 42
	- optimizer: grokadamw with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
	- lr_scheduler_type: cosine
	- num_epochs: 0.25
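
To make the `cosine` scheduler entry concrete, here is a minimal sketch of a half-cosine decay from the listed peak learning rate of 3e-05. Two things are assumptions, not facts from this card: there is no warmup, and the run has about 424 optimizer steps in total (extrapolated from the last logged row, step 420 at epoch 0.2479, with `num_epochs` of 0.25). The function name `cosine_lr` is illustrative.

```python
import math


def cosine_lr(step: int, total_steps: int, peak_lr: float = 3e-5) -> float:
    """Learning rate at `step` for a single half-cosine decay with no warmup."""
    # Clamp progress to [0, 1] so out-of-range steps stay well defined.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


# ASSUMPTION: total step count extrapolated from the training log, no warmup.
TOTAL_STEPS = 424

for step in (0, 212, 420, 424):
    print(step, f"{cosine_lr(step, TOTAL_STEPS):.3e}")
```

With these assumptions the learning rate is close to zero by the last logged steps, which is consistent with the validation loss flattening at 0.7610 at the end of the table.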

	### Training results

	| Training Loss | Epoch | Step | Validation Loss |
	|:-------------:|:------:|:----:|:---------------:|
	| 2.9045 | 0.0030 | 5 | 2.7322 |
	| 1.5833 | 0.0059 | 10 | 1.7005 |
	| 1.5115 | 0.0089 | 15 | 1.3183 |
	| 1.0591 | 0.0118 | 20 | 1.3213 |
	| 1.1079 | 0.0148 | 25 | 1.1174 |
	| 1.1004 | 0.0177 | 30 | 1.1248 |
	| 1.0783 | 0.0207 | 35 | 1.0751 |
	| 1.0209 | 0.0236 | 40 | 1.0297 |
	| 1.0955 | 0.0266 | 45 | 1.0330 |
	| 1.1106 | 0.0295 | 50 | 1.0172 |
	| 1.0855 | 0.0325 | 55 | 0.9780 |
	| 0.979 | 0.0354 | 60 | 0.9635 |
	| 0.8885 | 0.0384 | 65 | 0.9590 |
	| 0.9195 | 0.0413 | 70 | 0.9452 |
	| 0.9518 | 0.0443 | 75 | 0.9325 |
	| 0.9609 | 0.0472 | 80 | 0.9332 |
	| 0.9327 | 0.0502 | 85 | 0.9229 |
	| 0.9621 | 0.0531 | 90 | 0.9157 |
	| 0.9956 | 0.0561 | 95 | 0.9094 |
	| 0.8193 | 0.0590 | 100 | 0.8958 |
	| 0.9361 | 0.0620 | 105 | 0.8915 |
	| 0.9039 | 0.0649 | 110 | 0.8882 |
	| 0.8757 | 0.0679 | 115 | 0.8813 |
	| 0.8875 | 0.0708 | 120 | 0.8776 |
	| 0.8989 | 0.0738 | 125 | 0.8805 |
	| 0.9478 | 0.0767 | 130 | 0.8706 |
	| 0.9132 | 0.0797 | 135 | 0.8645 |
	| 0.8755 | 0.0826 | 140 | 0.8607 |
	| 0.9304 | 0.0856 | 145 | 0.8559 |
	| 0.8711 | 0.0885 | 150 | 0.8466 |
	| 0.8511 | 0.0915 | 155 | 0.8480 |
	| 0.8768 | 0.0945 | 160 | 0.8410 |
	| 0.6914 | 0.0974 | 165 | 0.8407 |
	| 0.8625 | 0.1004 | 170 | 0.8342 |
	| 0.8219 | 0.1033 | 175 | 0.8370 |
	| 0.9106 | 0.1063 | 180 | 0.8296 |
	| 0.8512 | 0.1092 | 185 | 0.8253 |
	| 0.8286 | 0.1122 | 190 | 0.8251 |
	| 0.9075 | 0.1151 | 195 | 0.8214 |
	| 0.8733 | 0.1181 | 200 | 0.8199 |
	| 0.7881 | 0.1210 | 205 | 0.8164 |
	| 0.9131 | 0.1240 | 210 | 0.8150 |
	| 0.8421 | 0.1269 | 215 | 0.8104 |
	| 0.8589 | 0.1299 | 220 | 0.8083 |
	| 0.7674 | 0.1328 | 225 | 0.8065 |
	| 0.8566 | 0.1358 | 230 | 0.8065 |
	| 0.8657 | 0.1387 | 235 | 0.8019 |
	| 0.7534 | 0.1417 | 240 | 0.7992 |
	| 0.7988 | 0.1446 | 245 | 0.7970 |
	| 0.8197 | 0.1476 | 250 | 0.7937 |
	| 0.8175 | 0.1505 | 255 | 0.7931 |
	| 0.8831 | 0.1535 | 260 | 0.7915 |
	| 0.8714 | 0.1564 | 265 | 0.7882 |
	| 0.8097 | 0.1594 | 270 | 0.7864 |
	| 0.7864 | 0.1623 | 275 | 0.7849 |
	| 0.7521 | 0.1653 | 280 | 0.7845 |
	| 0.8208 | 0.1682 | 285 | 0.7820 |
	| 0.7658 | 0.1712 | 290 | 0.7802 |
	| 0.8623 | 0.1741 | 295 | 0.7782 |
	| 0.8526 | 0.1771 | 300 | 0.7765 |
	| 0.8304 | 0.1800 | 305 | 0.7749 |
	| 0.823 | 0.1830 | 310 | 0.7737 |
	| 0.762 | 0.1860 | 315 | 0.7726 |
	| 0.7545 | 0.1889 | 320 | 0.7715 |
	| 0.7818 | 0.1919 | 325 | 0.7699 |
	| 0.7601 | 0.1948 | 330 | 0.7699 |
	| 0.7414 | 0.1978 | 335 | 0.7689 |
	| 0.8397 | 0.2007 | 340 | 0.7682 |
	| 0.8282 | 0.2037 | 345 | 0.7668 |
	| 0.7676 | 0.2066 | 350 | 0.7655 |
	| 0.7768 | 0.2096 | 355 | 0.7644 |
	| 0.7249 | 0.2125 | 360 | 0.7639 |
	| 0.7633 | 0.2155 | 365 | 0.7635 |
	| 0.721 | 0.2184 | 370 | 0.7632 |
	| 0.798 | 0.2214 | 375 | 0.7624 |
	| 0.7601 | 0.2243 | 380 | 0.7620 |
	| 0.8439 | 0.2273 | 385 | 0.7618 |
	| 0.777 | 0.2302 | 390 | 0.7616 |
	| 0.6739 | 0.2332 | 395 | 0.7614 |
	| 0.802 | 0.2361 | 400 | 0.7612 |
	| 0.7868 | 0.2391 | 405 | 0.7611 |
	| 0.6621 | 0.2420 | 410 | 0.7610 |
	| 0.7723 | 0.2450 | 415 | 0.7610 |
	| 0.8052 | 0.2479 | 420 | 0.7610 |
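
The Epoch and Step columns allow a rough estimate of the run's size, which the card does not state. The sketch below divides a logged step by its logged epoch to approximate steps per epoch, then scales by the batch size. It assumes a single device and no gradient accumulation (neither is stated in this card), and the logged epochs are rounded to four decimals, so the results are approximate.

```python
import math

# (step, epoch) pairs copied from the training log above.
LOGGED = [(5, 0.0030), (160, 0.0945), (420, 0.2479)]
TRAIN_BATCH_SIZE = 8  # from the hyperparameter list
NUM_EPOCHS = 0.25     # from the hyperparameter list


def steps_per_epoch(step: int, epoch: float) -> float:
    """Estimate optimizer steps per full epoch from one logged row."""
    return step / epoch


# The last row has the smallest relative rounding error, so use it.
last_step, last_epoch = LOGGED[-1]
spe = steps_per_epoch(last_step, last_epoch)

# Estimated length of the whole 0.25-epoch run, rounded up to whole steps.
total_steps = math.ceil(NUM_EPOCHS * spe)

# ASSUMPTION: one device, no gradient accumulation, so one step = 8 examples.
approx_examples = spe * TRAIN_BATCH_SIZE

print(f"steps per epoch ~ {spe:.0f}")
print(f"total steps for {NUM_EPOCHS} epochs ~ {total_steps}")
print(f"training examples ~ {approx_examples:.0f}")
```

Under these assumptions the run covers about 424 steps out of roughly 1,694 per epoch, which suggests a training set on the order of 13,500 examples.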


	### Framework versions

	- Transformers 4.46.0
	- PyTorch 2.5.0+cu121
	- Datasets 3.0.2
	- Tokenizers 0.20.1
