---
library_name: transformers
license: mit
base_model: microsoft/Phi-4-multimodal-instruct
tags:
- generated_from_trainer
model-index:
- name: Phi4-5.6B-transformers-ex1
  results: []
---

# Phi4-5.6B-transformers-ex1

This model is a fine-tuned version of [microsoft/Phi-4-multimodal-instruct](https://huggingface.co/microsoft/Phi-4-multimodal-instruct) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4529

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 4
- optimizer: PAGED_ADAMW_8BIT with betas=(0.9, 0.95) and epsilon=1e-07; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 50
- num_epochs: 10

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.1653        | 0.0799 | 20   | 0.1542          |
| 0.1324        | 0.1598 | 40   | 0.1429          |
| 0.2598        | 0.2398 | 60   | 0.3326          |
| 0.1638        | 0.3197 | 80   | 0.1500          |
| 0.1499        | 0.3996 | 100  | 0.4031          |
| 0.15          | 0.4795 | 120  | 0.3213          |
| 0.1679        | 0.5594 | 140  | 0.1489          |
| 0.1431        | 0.6394 | 160  | 0.1531          |
| 0.1462        | 0.7193 | 180  | 0.1488          |
| 0.1464        | 0.7992 | 200  | 0.1485          |
| 0.1379        | 0.8791 | 220  | 0.1482          |
| 0.1414        | 0.9590 | 240  | 0.1567          |
| 0.1328        | 1.0360 | 260  | 0.1472          |
| 0.134         | 1.1159 | 280  | 0.1466          |
| 0.1415        | 1.1958 | 300  | 0.1447          |
| 0.141         | 1.2757 | 320  | 0.1470          |
| 0.1378        | 1.3556 | 340  | 0.1685          |
| 0.1425        | 1.4356 | 360  | 0.1560          |
| 0.1405        | 1.5155 | 380  | 0.1412          |
| 0.135         | 1.5954 | 400  | 0.1512          |
| 0.1359        | 1.6753 | 420  | 0.1410          |
| 0.1336        | 1.7552 | 440  | 0.1394          |
| 0.1317        | 1.8352 | 460  | 0.1408          |
| 0.1323        | 1.9151 | 480  | 0.1497          |
| 0.1349        | 1.9950 | 500  | 0.1387          |
| 0.1204        | 2.0719 | 520  | 0.1407          |
| 0.1286        | 2.1518 | 540  | 0.1399          |
| 0.1333        | 2.2318 | 560  | 0.1414          |
| 0.1315        | 2.3117 | 580  | 0.1398          |
| 0.1313        | 2.3916 | 600  | 0.1455          |
| 0.1308        | 2.4715 | 620  | 0.1377          |
| 0.1327        | 2.5514 | 640  | 0.1400          |
| 0.1324        | 2.6314 | 660  | 0.1370          |
| 0.1309        | 2.7113 | 680  | 0.1343          |
| 0.1274        | 2.7912 | 700  | 0.1384          |
| 0.1287        | 2.8711 | 720  | 0.1353          |
| 0.1285        | 2.9510 | 740  | 0.1341          |
| 0.1256        | 3.0280 | 760  | 0.1380          |
| 0.1256        | 3.1079 | 780  | 0.1340          |
| 0.1224        | 3.1878 | 800  | 0.1372          |
| 0.1244        | 3.2677 | 820  | 0.1358          |
| 0.1256        | 3.3477 | 840  | 0.1337          |
| 0.1229        | 3.4276 | 860  | 0.1336          |
| 0.1252        | 3.5075 | 880  | 0.1333          |
| 0.1234        | 3.5874 | 900  | 0.1360          |
| 0.1276        | 3.6673 | 920  | 0.1344          |
| 0.1258        | 3.7473 | 940  | 0.1327          |
| 0.1249        | 3.8272 | 960  | 0.1357          |
| 0.1273        | 3.9071 | 980  | 0.1346          |
| 0.1266        | 3.9870 | 1000 | 0.1356          |
| 0.1172        | 4.0639 | 1020 | 0.1413          |
| 0.1236        | 4.1439 | 1040 | 0.1396          |
| 0.1219        | 4.2238 | 1060 | 0.1368          |
| 0.1187        | 4.3037 | 1080 | 0.1399          |
| 0.1225        | 4.3836 | 1100 | 0.1387          |
| 0.1243        | 4.4635 | 1120 | 0.1370          |
| 0.1218        | 4.5435 | 1140 | 0.1360          |
| 0.1189        | 4.6234 | 1160 | 0.1325          |
| 0.1185        | 4.7033 | 1180 | 0.1373          |
| 0.1251        | 4.7832 | 1200 | 0.1352          |
| 0.1214        | 4.8631 | 1220 | 0.1333          |
| 0.1225        | 4.9431 | 1240 | 0.1339          |
| 0.1138        | 5.0200 | 1260 | 0.1348          |
| 0.1205        | 5.0999 | 1280 | 0.1415          |
| 0.1208        | 5.1798 | 1300 | 0.1434          |
| 0.1165        | 5.2597 | 1320 | 0.1415          |
| 0.1154        | 5.3397 | 1340 | 0.1392          |
| 0.1143        | 5.4196 | 1360 | 0.1442          |
| 0.1165        | 5.4995 | 1380 | 0.1397          |
| 0.1162        | 5.5794 | 1400 | 0.1414          |
| 0.1148        | 5.6593 | 1420 | 0.1389          |
| 0.1133        | 5.7393 | 1440 | 0.1391          |
| 0.1145        | 5.8192 | 1460 | 0.1393          |
| 0.1152        | 5.8991 | 1480 | 0.1397          |
| 0.113         | 5.9790 | 1500 | 0.1407          |
| 0.0993        | 6.0559 | 1520 | 0.1625          |
| 0.0962        | 6.1359 | 1540 | 0.1609          |
| 0.0995        | 6.2158 | 1560 | 0.1573          |
| 0.1028        | 6.2957 | 1580 | 0.1582          |
| 0.0983        | 6.3756 | 1600 | 0.1620          |
| 0.0989        | 6.4555 | 1620 | 0.1572          |
| 0.0987        | 6.5355 | 1640 | 0.1602          |
| 0.0992        | 6.6154 | 1660 | 0.1593          |
| 0.0997        | 6.6953 | 1680 | 0.1644          |
| 0.0967        | 6.7752 | 1700 | 0.1630          |
| 0.0988        | 6.8551 | 1720 | 0.1596          |
| 0.098         | 6.9351 | 1740 | 0.1605          |
| 0.0915        | 7.0120 | 1760 | 0.1662          |
| 0.0666        | 7.0919 | 1780 | 0.2258          |
| 0.0638        | 7.1718 | 1800 | 0.2135          |
| 0.0581        | 7.2517 | 1820 | 0.2290          |
| 0.065         | 7.3317 | 1840 | 0.2115          |
| 0.0611        | 7.4116 | 1860 | 0.2396          |
| 0.059         | 7.4915 | 1880 | 0.2205          |
| 0.0598        | 7.5714 | 1900 | 0.2314          |
| 0.0608        | 7.6513 | 1920 | 0.2309          |
| 0.063         | 7.7313 | 1940 | 0.2383          |
| 0.0621        | 7.8112 | 1960 | 0.2304          |
| 0.0586        | 7.8911 | 1980 | 0.2433          |
| 0.0622        | 7.9710 | 2000 | 0.2354          |
| 0.0369        | 8.0480 | 2020 | 0.3233          |
| 0.0246        | 8.1279 | 2040 | 0.3437          |
| 0.022         | 8.2078 | 2060 | 0.3361          |
| 0.0243        | 8.2877 | 2080 | 0.3413          |
| 0.0235        | 8.3676 | 2100 | 0.3458          |
| 0.0229        | 8.4476 | 2120 | 0.3473          |
| 0.0218        | 8.5275 | 2140 | 0.3523          |
| 0.0234        | 8.6074 | 2160 | 0.3610          |
| 0.0228        | 8.6873 | 2180 | 0.3496          |
| 0.0221        | 8.7672 | 2200 | 0.3519          |
| 0.0223        | 8.8472 | 2220 | 0.3515          |
| 0.0224        | 8.9271 | 2240 | 0.3514          |
| 0.0193        | 9.0040 | 2260 | 0.3542          |
| 0.0081        | 9.0839 | 2280 | 0.4155          |
| 0.0071        | 9.1638 | 2300 | 0.4363          |
| 0.0065        | 9.2438 | 2320 | 0.4446          |
| 0.0057        | 9.3237 | 2340 | 0.4485          |
| 0.0064        | 9.4036 | 2360 | 0.4495          |
| 0.0071        | 9.4835 | 2380 | 0.4502          |
| 0.0058        | 9.5634 | 2400 | 0.4518          |
| 0.0066        | 9.6434 | 2420 | 0.4530          |
| 0.0072        | 9.7233 | 2440 | 0.4535          |
| 0.0064        | 9.8032 | 2460 | 0.4532          |
| 0.0076        | 9.8831 | 2480 | 0.4533          |
| 0.0063        | 9.9630 | 2500 | 0.4529          |

### Framework versions

- Transformers 4.48.2
- Pytorch 2.6.0+cu124
- Datasets 3.4.1
- Tokenizers 0.21.1
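
The hyperparameters above can be sanity-checked in plain Python. The sketch below reproduces the linear warmup/decay learning-rate schedule and the effective batch size implied by the configuration (learning_rate 2e-4, 50 warmup steps, train_batch_size 1 with 4 gradient-accumulation steps). The helper name `linear_schedule_lr` is hypothetical — the actual run used the Trainer's built-in linear scheduler — and the total step count of 2500 is read off the final row of the training table.

```python
def linear_schedule_lr(step, base_lr=2e-4, warmup_steps=50, total_steps=2500):
    """Learning rate at `step`: linear warmup to base_lr, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps  # ramp 0 -> base_lr over warmup
    # decay base_lr -> 0 over the remaining (total_steps - warmup_steps) steps
    return base_lr * (total_steps - step) / (total_steps - warmup_steps)

# Effective batch size = per-device batch * gradient accumulation steps
effective_batch = 1 * 4  # matches total_train_batch_size: 4
```

With these numbers the learning rate peaks at 2e-4 at step 50 and decays linearly to 0 by step 2500, which matches a 10-epoch run of roughly 250 optimizer steps per epoch.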