cooking_sft_fail_new_mem

This model is a fine-tuned version of meta-llama/Llama-3.1-8B-Instruct, trained on the identity and cooking_sft_fail_new_mem datasets. It achieves the following results on the evaluation set:

  • Loss: 0.2036
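
A minimal inference sketch, assuming the checkpoint is published on the Hugging Face Hub as izzcw/cooking_sft_fail_new_mem (the repo id for this card) and loads through the standard transformers chat API; the prompt and generation settings are illustrative placeholders:

```python
# Minimal inference sketch for this fine-tune. The repo id comes from this
# card; the prompt and generation settings are hypothetical examples.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "izzcw/cooking_sft_fail_new_mem"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # checkpoint is stored in BF16
    device_map="auto",
)

messages = [{"role": "user", "content": "How do I keep a roux from burning?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```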

Model description

An 8.03B-parameter causal language model fine-tuned from meta-llama/Llama-3.1-8B-Instruct; weights are stored as BF16 safetensors. No further details provided.

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch reproducing them follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • total_eval_batch_size: 8
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1.0
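
A hedged sketch of how these values map onto transformers.TrainingArguments. The output directory is a placeholder, and the data pipeline and launch command (e.g. torchrun across 8 GPUs) are omitted; only the hyperparameters above are taken from this card.

```python
# Sketch only: mirrors the hyperparameters listed above.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="cooking_sft_fail_new_mem",  # placeholder path
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=16,
    num_train_epochs=1.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    bf16=True,  # checkpoint tensors are BF16
)

# The reported totals are derived, not set directly:
#   total_train_batch_size = 1 per device * 8 GPUs * 16 accumulation steps = 128
#   total_eval_batch_size  = 1 per device * 8 GPUs                         = 8
```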

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.3821        | 0.0133 | 50   | 0.4735          |
| 0.302         | 0.0267 | 100  | 0.3178          |
| 0.2988        | 0.0400 | 150  | 0.3253          |
| 0.3054        | 0.0533 | 200  | 0.3250          |
| 0.2967        | 0.0666 | 250  | 0.3232          |
| 0.3137        | 0.0800 | 300  | 0.3207          |
| 0.3221        | 0.0933 | 350  | 0.3211          |
| 0.3188        | 0.1066 | 400  | 0.3204          |
| 0.308         | 0.1200 | 450  | 0.3149          |
| 0.3123        | 0.1333 | 500  | 0.3106          |
| 0.3138        | 0.1466 | 550  | 0.3050          |
| 0.3032        | 0.1600 | 600  | 0.3046          |
| 0.2827        | 0.1733 | 650  | 0.3017          |
| 0.2953        | 0.1866 | 700  | 0.2970          |
| 0.2854        | 0.1999 | 750  | 0.2924          |
| 0.2872        | 0.2133 | 800  | 0.2896          |
| 0.2866        | 0.2266 | 850  | 0.2836          |
| 0.2925        | 0.2399 | 900  | 0.2794          |
| 0.2843        | 0.2533 | 950  | 0.2823          |
| 0.292         | 0.2666 | 1000 | 0.2789          |
| 0.2775        | 0.2799 | 1050 | 0.2763          |
| 0.2652        | 0.2933 | 1100 | 0.2717          |
| 0.27          | 0.3066 | 1150 | 0.2712          |
| 0.277         | 0.3199 | 1200 | 0.2749          |
| 0.2681        | 0.3332 | 1250 | 0.2709          |
| 0.2699        | 0.3466 | 1300 | 0.2718          |
| 0.2682        | 0.3599 | 1350 | 0.2676          |
| 0.2668        | 0.3732 | 1400 | 0.2662          |
| 0.2615        | 0.3866 | 1450 | 0.2689          |
| 0.2501        | 0.3999 | 1500 | 0.2583          |
| 0.2545        | 0.4132 | 1550 | 0.2568          |
| 0.2618        | 0.4265 | 1600 | 0.2523          |
| 0.2615        | 0.4399 | 1650 | 0.2550          |
| 0.2512        | 0.4532 | 1700 | 0.2488          |
| 0.245         | 0.4665 | 1750 | 0.2504          |
| 0.2503        | 0.4799 | 1800 | 0.2481          |
| 0.2402        | 0.4932 | 1850 | 0.2450          |
| 0.2346        | 0.5065 | 1900 | 0.2440          |
| 0.2413        | 0.5199 | 1950 | 0.2425          |
| 0.24          | 0.5332 | 2000 | 0.2383          |
| 0.2398        | 0.5465 | 2050 | 0.2408          |
| 0.2473        | 0.5598 | 2100 | 0.2384          |
| 0.2423        | 0.5732 | 2150 | 0.2348          |
| 0.2294        | 0.5865 | 2200 | 0.2311          |
| 0.2403        | 0.5998 | 2250 | 0.2323          |
| 0.2319        | 0.6132 | 2300 | 0.2297          |
| 0.222         | 0.6265 | 2350 | 0.2288          |
| 0.2193        | 0.6398 | 2400 | 0.2303          |
| 0.2252        | 0.6531 | 2450 | 0.2247          |
| 0.2304        | 0.6665 | 2500 | 0.2211          |
| 0.2139        | 0.6798 | 2550 | 0.2199          |
| 0.2186        | 0.6931 | 2600 | 0.2192          |
| 0.2156        | 0.7065 | 2650 | 0.2183          |
| 0.2187        | 0.7198 | 2700 | 0.2159          |
| 0.222         | 0.7331 | 2750 | 0.2174          |
| 0.2162        | 0.7465 | 2800 | 0.2153          |
| 0.2253        | 0.7598 | 2850 | 0.2132          |
| 0.2066        | 0.7731 | 2900 | 0.2134          |
| 0.2113        | 0.7864 | 2950 | 0.2107          |
| 0.2107        | 0.7998 | 3000 | 0.2085          |
| 0.2055        | 0.8131 | 3050 | 0.2097          |
| 0.2045        | 0.8264 | 3100 | 0.2075          |
| 0.2172        | 0.8398 | 3150 | 0.2062          |
| 0.2138        | 0.8531 | 3200 | 0.2075          |
| 0.194         | 0.8664 | 3250 | 0.2051          |
| 0.2133        | 0.8798 | 3300 | 0.2051          |
| 0.2025        | 0.8931 | 3350 | 0.2047          |
| 0.2088        | 0.9064 | 3400 | 0.2050          |
| 0.204         | 0.9197 | 3450 | 0.2044          |
| 0.2059        | 0.9331 | 3500 | 0.2039          |
| 0.2103        | 0.9464 | 3550 | 0.2039          |
| 0.2102        | 0.9597 | 3600 | 0.2039          |
| 0.2051        | 0.9731 | 3650 | 0.2038          |
| 0.2017        | 0.9864 | 3700 | 0.2037          |
| 0.2088        | 0.9997 | 3750 | 0.2037          |
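
As a rough sanity check, and assuming the reported losses are mean token-level cross-entropy in nats (the transformers default for causal LM training), the final validation loss corresponds to a perplexity of roughly exp(0.2036) ≈ 1.23:

```python
# Convert the final validation loss to perplexity.
# Assumes mean token-level cross-entropy in nats.
import math

final_val_loss = 0.2036  # evaluation-set loss reported above
print(math.exp(final_val_loss))  # ~1.226
```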

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0
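
A quick environment check against the versions listed above; pinning to these exact versions is the safest way to reproduce the run, though nearby patch releases likely also work:

```python
# Verify the local environment matches the framework versions on this card.
import datasets
import tokenizers
import torch
import transformers

assert transformers.__version__.startswith("4.49"), transformers.__version__
assert torch.__version__.startswith("2.5.1"), torch.__version__
assert datasets.__version__.startswith("3.2"), datasets.__version__
assert tokenizers.__version__.startswith("0.21"), tokenizers.__version__
```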