See axolotl config

axolotl version: 0.9.2

base_model: Qwen/Qwen3-0.6B-Base
# Automatically upload checkpoint and final model to HF
# hub_model_id: username/custom_model_name

plugins:
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
strict: false

chat_template: qwen3
datasets:
  - path: timarni/MNLP_M3_mcqa_dataset
    name: stem_instruction_tuning_hard
    type: alpaca
    split: train

val_set_size: 0.1
output_dir: ./outputs/base_it_hard
dataset_prepared_path: last_run_prepared

sequence_len: 2048 # 4096
sample_packing: true
eval_sample_packing: true
pad_to_sequence_len: true

# Ensure no LoRA adapter is applied (full fine-tuning)
adapter: null
lora: false
merge_lora: false

wandb_project: mnlp_project
wandb_entity: tim-arni
wandb_watch:
wandb_name: base_it_hard
wandb_log_model:

gradient_accumulation_steps: 4 # 2
micro_batch_size: 2 # 1
num_epochs: 5
optimizer: adamw_torch
lr_scheduler: cosine
learning_rate: 0.00001 # 0.00005
cosine_min_lr_ratio: 0.1

bf16: auto
tf32: true

gradient_checkpointing: offload
gradient_checkpointing_kwargs:
  use_reentrant: false
resume_from_checkpoint:
logging_steps: 1
flash_attention: true

warmup_ratio: 0.05
evals_per_epoch: 4
saves_per_epoch: 2
save_total_limit: 10
weight_decay: 0.01
special_tokens:
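
The config above combines `lr_scheduler: cosine`, `warmup_ratio: 0.05`, and `cosine_min_lr_ratio: 0.1`. The following is a minimal sketch of what such a schedule looks like, assuming linear warmup followed by cosine decay to a floor of 10% of the peak rate. The function name `lr_at` is hypothetical, the step counts (45 warmup steps, 900 total) are taken from the training log in this card, and the exact formula used by axolotl may differ in detail.

```python
import math


def lr_at(step, total_steps=900, warmup_steps=45, peak_lr=1e-5, min_lr_ratio=0.1):
    """Illustrative learning rate at a given optimizer step (not axolotl's own code)."""
    if step < warmup_steps:
        # Linear warmup from 0 up to the peak learning rate.
        return peak_lr * step / warmup_steps
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Cosine factor goes from 1 (start of decay) to 0 (end of training).
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    # Rescale so the rate decays to min_lr_ratio * peak_lr instead of 0.
    return peak_lr * (min_lr_ratio + (1.0 - min_lr_ratio) * cosine)


for s in (0, 45, 450, 900):
    print(s, lr_at(s))
```

Under these assumptions the rate peaks at 1e-05 once warmup ends and settles at 1e-06 by the last step.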

outputs/base_it_hard

This model is a fine-tuned version of Qwen/Qwen3-0.6B-Base on the timarni/MNLP_M3_mcqa_dataset dataset. It achieves the following results on the evaluation set:

Loss: 4.5354 (validation loss at the last logged step, 900)

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

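Per the axolotl config above, training used the train split of the stem_instruction_tuning_hard configuration of timarni/MNLP_M3_mcqa_dataset, loaded in alpaca format. A 10% share was held out for validation (val_set_size: 0.1).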

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 2
eval_batch_size: 2
seed: 42
distributed_type: multi-GPU
num_devices: 2
gradient_accumulation_steps: 4
total_train_batch_size: 16
total_eval_batch_size: 4
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 45
num_epochs: 5.0
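
The derived values in the list above follow from the per-device settings. As a quick consistency check, the arithmetic can be sketched as below. The total step count of 900 is an assumption taken from the last logged step in the training results, and the rounding rule a given framework applies to the warmup count may differ.

```python
# Values reported in the hyperparameter list.
train_batch_size = 2             # per-device micro batch
eval_batch_size = 2              # per-device eval batch
num_devices = 2
gradient_accumulation_steps = 4
warmup_ratio = 0.05              # from the axolotl config

# Effective batch sizes across devices (and accumulation steps for training).
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
total_eval_batch_size = eval_batch_size * num_devices

# Assumption: about 900 optimizer steps in total (last step in the results log).
total_steps = 900
warmup_steps = round(warmup_ratio * total_steps)

print(total_train_batch_size, total_eval_batch_size, warmup_steps)
```

These reproduce the reported total_train_batch_size of 16, total_eval_batch_size of 4, and lr_scheduler_warmup_steps of 45.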

Training results

Training Loss	Epoch	Step	Validation Loss
0.8271	0.0055	1	6.2702
0.1398	0.2490	45	4.7948
0.1439	0.4979	90	4.3628
0.1377	0.7469	135	4.2137
0.1436	0.9959	180	4.2396
0.1086	1.2434	225	4.2662
0.1018	1.4924	270	4.3334
0.1226	1.7414	315	4.3240
0.13	1.9903	360	4.3957
0.1269	2.2379	405	4.3869
0.11	2.4869	450	4.4244
0.1081	2.7358	495	4.4782
0.1139	2.9848	540	4.5098
0.1041	3.2324	585	4.4869
0.1052	3.4813	630	4.5032
0.1143	3.7303	675	4.5032
0.1144	3.9793	720	4.5265
0.104	4.2268	765	4.5161
0.1343	4.4758	810	4.5280
0.1217	4.7248	855	4.5158
0.1158	4.9737	900	4.5354
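
The validation loss in the table reaches its minimum early and drifts upward afterwards, while the training loss stays near 0.1. A small sketch for locating the best evaluation step from the logged values (the list `val_log` is transcribed from the table above; the variable names are illustrative):

```python
# (step, validation loss) pairs copied from the training results table.
val_log = [
    (1, 6.2702), (45, 4.7948), (90, 4.3628), (135, 4.2137), (180, 4.2396),
    (225, 4.2662), (270, 4.3334), (315, 4.3240), (360, 4.3957), (405, 4.3869),
    (450, 4.4244), (495, 4.4782), (540, 4.5098), (585, 4.4869), (630, 4.5032),
    (675, 4.5032), (720, 4.5265), (765, 4.5161), (810, 4.5280), (855, 4.5158),
    (900, 4.5354),
]

# Step with the lowest validation loss.
best_step, best_loss = min(val_log, key=lambda row: row[1])

# Final logged evaluation.
final_step, final_loss = val_log[-1]

print(f"best: step {best_step}, loss {best_loss}")
print(f"final: step {final_step}, loss {final_loss}")
print(f"increase from best to final: {final_loss - best_loss:.4f}")
```

On these numbers the lowest validation loss is 4.2137 at step 135, about 0.32 below the final value of 4.5354.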

Framework versions

Transformers 4.51.3
Pytorch 2.5.1+cu121
Datasets 3.5.1
Tokenizers 0.21.1

Model size: 0.6B params
Tensor type: BF16 (Safetensors)
Base model: Qwen/Qwen3-0.6B-Base
This model: timarni/base_it_hard_180
