Al-Atlas-LLM-0.5B-bs-4-lr-5e-05-ep-3-wp-0.1-gacc-32-gnm-1.0-FP16-mx-2048-v2.3

This model is a fine-tuned version of BounharAbdelaziz/Al-Atlas-LLM-0.5B on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 0.4813
Bleu: 11.5004
Chrf: 35.6364
Ter: 104.5543
Gen Len: 1.0
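
A TER above 100 can look like an error, but the metric is defined as edits divided by reference length, so it exceeds 100 whenever a hypothesis needs more edits than the reference has words (for example, when outputs are much longer than references). The card does not say which implementation produced these scores. The sketch below is a simplified word-level version that ignores the shift operation used by full TER, and the example sentences are made up for illustration.

```python
def word_edit_distance(hyp, ref):
    """Word-level Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # drop a hypothesis word
                            curr[j - 1] + 1,      # add a reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def simplified_ter(hypothesis, reference):
    """Edits per reference word, as a percentage (no shift operations)."""
    hyp, ref = hypothesis.split(), reference.split()
    return 100.0 * word_edit_distance(hyp, ref) / len(ref)


# Hypothetical sentences: the hypothesis is far longer than the reference.
exact = simplified_ter("a b c", "a b c")
overlong = simplified_ter("the cat sat on the mat today again", "the cat sat")
print(exact, round(overlong, 2))
```

Here the over-long hypothesis needs five deletions against a three-word reference, so the score lands well above 100.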

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 42
gradient_accumulation_steps: 32
total_train_batch_size: 128
optimizer: AdamW, fused PyTorch implementation (OptimizerNames.ADAMW_TORCH_FUSED), with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 3
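
The relationship between these values can be sketched in plain Python. This is a minimal illustration, not the training script: it assumes a single device (consistent with 4 x 32 = 128), and `TOTAL_STEPS = 500` is a placeholder, since the card does not state the exact number of optimizer steps. The schedule mirrors the usual linear warmup followed by linear decay to zero.

```python
import math

PEAK_LR = 5e-5        # learning_rate from the list above
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio from the list above
TOTAL_STEPS = 500     # assumption: the true total is not stated in the card


def effective_batch_size(per_device, grad_accum, num_devices=1):
    """Samples contributing to one optimizer update."""
    return per_device * grad_accum * num_devices


def linear_warmup_lr(step, total_steps=TOTAL_STEPS,
                     peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Linear ramp from 0 to peak_lr, then linear decay back to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


print(effective_batch_size(4, 32))  # matches total_train_batch_size
for s in (0, 25, 100, 300, 500):
    print(s, linear_warmup_lr(s))
```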

Training results

Training Loss	Epoch	Step	Validation Loss	Bleu	Chrf	Ter	Gen Len
2.3349	0.1156	20	0.8954	6.5229	25.1844	97.7052	1.0
1.8212	0.2313	40	0.5205	8.3863	30.7922	100.3183	1.0
1.6088	0.3469	60	0.4910	9.9375	32.6773	103.6373	1.0
1.4974	0.4626	80	0.4865	10.3046	33.5592	102.9448	1.0
1.4494	0.5782	100	0.4847	10.953	34.2959	102.3899	1.0
1.3977	0.6939	120	0.4831	10.9053	34.2735	104.1302	1.0
1.3694	0.8095	140	0.4793	11.0026	34.7101	105.4975	1.0
1.3454	0.9252	160	0.4760	11.2241	34.5451	104.612	1.0
1.2959	1.0463	180	0.4830	11.133	35.0556	105.4385	1.0
1.2836	1.1619	200	0.4860	11.3134	34.6022	104.1697	1.0
1.2706	1.2776	220	0.4835	10.936	34.8636	105.827	1.0
1.2087	1.3932	240	0.4832	11.114	34.9581	106.7259	1.0
1.1982	1.5089	260	0.4760	11.1626	35.0099	105.5238	1.0
1.2821	1.6245	280	0.4822	11.1749	34.9248	106.0043	1.0
1.2519	1.7402	300	0.4811	11.4891	35.3655	105.6169	1.0
1.2769	1.8558	320	0.4776	11.4816	35.2067	104.5519	1.0
1.2149	1.9714	340	0.4771	11.5422	35.434	103.5539	1.0
1.2203	2.0925	360	0.4828	11.5389	35.4972	104.2589	1.0
1.1668	2.2082	380	0.4811	11.5922	35.5947	103.9682	1.0
1.1519	2.3238	400	0.4807	11.4581	35.5341	104.5049	1.0
1.1886	2.4395	420	0.4820	11.5028	35.4251	104.3936	1.0
1.1762	2.5551	440	0.4839	11.5828	35.5995	104.6084	1.0
1.1789	2.6708	460	0.4818	11.5674	35.5663	104.4349	1.0
1.1594	2.7864	480	0.4811	11.534	35.6868	104.1138	1.0
1.2573	2.9021	500	0.4813	11.5004	35.6364	104.5543	1.0
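
The final checkpoint (step 500) is not the best row on every metric. A short sketch of how one might compare checkpoints follows; the rows are a subset copied from the table above, and the tuple layout is chosen here only for illustration.

```python
# (step, validation_loss, bleu, chrf, ter): a subset of the rows above.
rows = [
    (20, 0.8954, 6.5229, 25.1844, 97.7052),
    (160, 0.4760, 11.2241, 34.5451, 104.612),
    (260, 0.4760, 11.1626, 35.0099, 105.5238),
    (380, 0.4811, 11.5922, 35.5947, 103.9682),
    (480, 0.4811, 11.534, 35.6868, 104.1138),
    (500, 0.4813, 11.5004, 35.6364, 104.5543),
]

# min/max return the first row on ties, so step 160 wins over step 260.
best_loss = min(rows, key=lambda r: r[1])
best_bleu = max(rows, key=lambda r: r[2])
best_chrf = max(rows, key=lambda r: r[3])

print("lowest validation loss at step", best_loss[0])
print("highest BLEU at step", best_bleu[0])
print("highest chrF at step", best_chrf[0])
```

Validation loss bottoms out at 0.4760 (steps 160 and 260), while BLEU and chrF peak later, at steps 380 and 480 respectively, so the choice of checkpoint depends on which metric matters for the downstream use.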

Framework versions

Transformers 4.49.0
Pytorch 2.6.0+cu124
Datasets 2.21.0
Tokenizers 0.21.0
