See axolotl config
axolotl version: 0.9.2
```yaml
base_model: Qwen/Qwen3-0.6B-Base

# Automatically upload checkpoint and final model to HF
# hub_model_id: username/custom_model_name

plugins:
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin

strict: false

chat_template: qwen3
datasets:
  - path: timarni/MNLP_M2_mcqa_dataset
    type: alpaca
    split: train

shuffle_merged_datasets: true
val_set_size: 0.1
output_dir: ./outputs/base_test_set
dataset_prepared_path: last_run_prepared

sequence_len: 4096  # 2048
sample_packing: true  # check whether the model actually learns on packed samples (better understand the hyperparameter and eventually install axolotl to debug)
eval_sample_packing: false
pad_to_sequence_len: true
# train_on_inputs: true  # NEW
# group_by_length: false  # NEW?

# Make sure no LoRA adapter is used
adapter: null
lora: false
merge_lora: false

wandb_project: mnlp_project
wandb_entity: tim-arni
wandb_watch:
wandb_name: base_test_set
wandb_log_model:

gradient_accumulation_steps: 16  # 2
micro_batch_size: 2  # 1
num_epochs: 25
optimizer: adamw_torch
lr_scheduler: cosine
learning_rate: 0.00005
# cosine_min_lr_ratio: 0.1
warmup_ratio: 0.05
weight_decay: 0.01

bf16: auto
tf32: true

gradient_checkpointing: offload
gradient_checkpointing_kwargs:
  use_reentrant: false
resume_from_checkpoint:
logging_steps: 1
gradient_clipping: 1.0  # or max_grad_norm?
flash_attention: true

evals_per_epoch: 4
saves_per_epoch: 2
save_total_limit: 25

special_tokens:
```
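To reproduce the run, the config above can be saved to a file and handed to axolotl's trainer. A minimal sketch, assuming the (hypothetical) filename base_test_set.yml and a working axolotl 0.9.2 / accelerate installation:

```python
import subprocess

# Assumed filename for the YAML config shown above.
config_path = "base_test_set.yml"

# Equivalent to running: accelerate launch -m axolotl.cli.train base_test_set.yml
subprocess.run(
    ["accelerate", "launch", "-m", "axolotl.cli.train", config_path],
    check=True,
)
```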
outputs/base_test_set
This model is a fine-tuned version of Qwen/Qwen3-0.6B-Base on the timarni/MNLP_M2_mcqa_dataset dataset. It achieves the following results on the evaluation set:
- Loss: 0.2652
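A minimal inference sketch with transformers, assuming the fine-tuned weights are available locally under the config's output_dir (./outputs/base_test_set) or from a Hub repo such as timarni/base_test_set_9; the prompt format is illustrative only:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: point this at the local output_dir or the pushed Hub repository.
model_id = "./outputs/base_test_set"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

prompt = "Question: Which gas makes up most of Earth's atmosphere?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```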
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
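The config above trains on timarni/MNLP_M2_mcqa_dataset with a 10% validation hold-out (val_set_size: 0.1). A hedged sketch for inspecting the data with the datasets library; axolotl's own split logic may differ in seed and ordering:

```python
from datasets import load_dataset

# Dataset path taken from the axolotl config above.
ds = load_dataset("timarni/MNLP_M2_mcqa_dataset", split="train")
print(ds.num_rows, ds.column_names)

# Approximate the config's val_set_size: 0.1 hold-out (illustration only).
split = ds.train_test_split(test_size=0.1, seed=42)
print(split["train"].num_rows, split["test"].num_rows)
```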
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 2
- num_epochs: 25.0
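The reported total train batch size follows from the micro-batch size and gradient accumulation; a quick check, assuming a single GPU (2 × 16 × 1 = 32):

```python
micro_batch_size = 2             # per-device batch size from the config
gradient_accumulation_steps = 16
num_devices = 1                  # assumption consistent with the reported total of 32

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
assert total_train_batch_size == 32
```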
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
0.4926 | 0.6957 | 1 | 0.6350 |
0.4971 | 1.0 | 2 | 0.1976 |
0.136 | 1.6957 | 3 | 0.1792 |
0.112 | 2.0 | 4 | 0.2161 |
0.1589 | 2.6957 | 5 | 0.1613 |
0.1186 | 3.0 | 6 | 0.1703 |
0.0949 | 3.6957 | 7 | 0.1849 |
0.0879 | 4.0 | 8 | 0.1670 |
0.0739 | 4.6957 | 9 | 0.1571 |
0.0654 | 5.0 | 10 | 0.1650 |
0.0565 | 5.6957 | 11 | 0.1853 |
0.0501 | 6.0 | 12 | 0.2105 |
0.0405 | 6.6957 | 13 | 0.2340 |
0.0393 | 7.0 | 14 | 0.2389 |
0.031 | 7.6957 | 15 | 0.2398 |
0.0238 | 8.0 | 16 | 0.2427 |
0.023 | 8.6957 | 17 | 0.2465 |
0.0207 | 9.0 | 18 | 0.2538 |
0.0182 | 9.6957 | 19 | 0.2618 |
0.0217 | 10.0 | 20 | 0.2641 |
0.0172 | 10.6957 | 21 | 0.2640 |
0.0189 | 11.0 | 22 | 0.2685 |
0.0167 | 11.6957 | 23 | 0.2686 |
0.0184 | 12.0 | 24 | 0.2665 |
0.0158 | 12.6957 | 25 | 0.2652 |
Framework versions
- Transformers 4.51.3
- Pytorch 2.5.1+cu121
- Datasets 3.5.1
- Tokenizers 0.21.1