
Built with Axolotl

See axolotl config

axolotl version: 0.10.0.dev0

base_model: Qwen/Qwen3-0.6B-Base
hub_model_id: cyberbabooshka/base_noreasoning2
wandb_name: base_noreasoning2

tokenizer_type: AutoTokenizer
load_in_8bit: false
load_in_4bit: false

num_processes: 64
dataset_processes: 64
dataset_prepared_path: last_run_prepared

chat_template: jinja
chat_template_jinja: >-
  {%- for message in messages %}
    {{- '<|im_start|>' + message.role + '\n' + message.content.lstrip('\n') + '<|im_end|>' + '\n' }}
  {%- endfor %}
  {%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n' }}
  {%- endif %}

datasets:
  - path: cyberbabooshka/MNLP_M2_mcqa_dataset
    split: train
    type: chat_template
    field_messages: messages
    train_on_eos: turn
    train_on_eot: turn
    message_property_mappings:
      role: role
      content: content
    roles:
      user:
        - user
      assistant:
        - assistant

test_datasets:
  - path: cyberbabooshka/MNLP_M2_mcqa_dataset
    split: test
    type: chat_template
    field_messages: messages
    train_on_eos: turn
    train_on_eot: turn
    message_property_mappings:
      role: role
      content: content
    roles:
      user:
        - user
      assistant:
        - assistant

output_dir: ./outputs

sequence_len: 2048
batch_flattening: true
sample_packing: false

wandb_project: mnlp
wandb_entity: aleksandr-dremov-epfl
wandb_watch:
wandb_log_model:

gradient_accumulation_steps: 1
eval_batch_size: 16
micro_batch_size: 16

optimizer: ademamix_8bit
weight_decay: 0.01

learning_rate: 0.00005
warmup_steps: 500

wsd_final_lr_factor: 0.0
wsd_init_div_factor: 100
wsd_fract_decay: 0.2
wsd_decay_type: "sqrt"
wsd_sqrt_power: 0.5
wsd_cooldown_start_lr_factor: 1.0

bf16: auto
tf32: false

torch_compile: true
flash_attention: true
gradient_checkpointing: false

resume_from_checkpoint:
auto_resume_from_checkpoints: true

logging_steps: 16
eval_steps: 2000
save_steps: 1000
max_steps: 10000000
num_epochs: 2
save_total_limit: 2

special_tokens:
  eos_token: "<|im_end|>"
  pad_token: "<|endoftext|>"

eot_tokens:
  - <|im_end|>

plugins:
  - axolotl_wsd.WSDSchedulerPlugin
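
To make the data format concrete, the sketch below (not part of the config) renders the chat_template_jinja above on a hypothetical MCQA-style row in the messages format that the datasets entries expect. Only the template string is taken from the config; the example question and the use of jinja2 for rendering are illustrative assumptions.

from jinja2 import Template

# The template string from chat_template_jinja above.
CHAT_TEMPLATE = (
    "{%- for message in messages %}"
    "{{- '<|im_start|>' + message.role + '\\n' + message.content.lstrip('\\n') + '<|im_end|>' + '\\n' }}"
    "{%- endfor %}"
    "{%- if add_generation_prompt %}"
    "{{- '<|im_start|>assistant\\n' }}"
    "{%- endif %}"
)

# Hypothetical row in the `messages` format (field_messages: messages).
example_messages = [
    {"role": "user", "content": "Which planet is known as the Red Planet?\nA. Venus\nB. Mars\nC. Jupiter\nD. Saturn"},
    {"role": "assistant", "content": "B. Mars"},
]

# add_generation_prompt=False reproduces the full training example as the trainer sees it.
print(Template(CHAT_TEMPLATE).render(messages=example_messages, add_generation_prompt=False))
# <|im_start|>user
# Which planet is known as the Red Planet?
# ...
# <|im_end|>
# <|im_start|>assistant
# B. Mars<|im_end|>

The trailing <|im_end|> of the assistant turn is the token that train_on_eos / train_on_eot: turn train on, matching the eos_token and eot_tokens settings above.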

base_noreasoning2

This model is a fine-tuned version of Qwen/Qwen3-0.6B-Base on the cyberbabooshka/MNLP_M2_mcqa_dataset dataset. It achieves the following results on the evaluation set:

  • Loss: 0.7643
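
A minimal inference sketch, assuming the chat template from the config above was saved with the tokenizer; the model id comes from hub_model_id, while the example question is purely illustrative:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cyberbabooshka/base_noreasoning2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Illustrative MCQA-style prompt; <|im_end|> is the EOS token, so generation stops after the answer.
messages = [
    {"role": "user", "content": "Which gas makes up most of Earth's atmosphere?\nA. Oxygen\nB. Nitrogen\nC. Argon\nD. Carbon dioxide"},
]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))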

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: OptimizerNames.ADEMAMIX_8BIT (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 500
  • training_steps: 55761
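
The reported totals follow directly from the per-device settings; a quick sanity check with the values listed above:

# Effective batch sizes, derived from the per-device settings on this card.
micro_batch_size = 16               # per-device train batch size
eval_batch_size = 16                # per-device eval batch size
gradient_accumulation_steps = 1
num_devices = 2

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
total_eval_batch_size = eval_batch_size * num_devices
assert (total_train_batch_size, total_eval_batch_size) == (32, 32)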

Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| No log        | 0.0000 | 1     | 0.9809          |
| 0.8514        | 0.0717 | 2000  | 0.8555          |
| 0.8304        | 0.1435 | 4000  | 0.8445          |
| 0.8246        | 0.2152 | 6000  | 0.8379          |
| 0.8393        | 0.2869 | 8000  | 0.8315          |
| 0.8216        | 0.3587 | 10000 | 0.8278          |
| 0.848         | 0.4304 | 12000 | 0.8235          |
| 0.8166        | 0.5021 | 14000 | 0.8206          |
| 0.8344        | 0.5739 | 16000 | 0.8175          |
| 0.827         | 0.6456 | 18000 | 0.8140          |
| 0.8287        | 0.7173 | 20000 | 0.8123          |
| 0.8234        | 0.7891 | 22000 | 0.8089          |
| 0.7958        | 0.8608 | 24000 | 0.8076          |
| 0.825         | 0.9325 | 26000 | 0.8057          |
| 0.7736        | 1.0043 | 28000 | 0.8037          |
| 0.7528        | 1.0760 | 30000 | 0.8037          |
| 0.8106        | 1.1477 | 32000 | 0.8021          |
| 0.7787        | 1.2195 | 34000 | 0.8016          |
| 0.7483        | 1.2912 | 36000 | 0.8008          |
| 0.7535        | 1.3629 | 38000 | 0.7993          |
| 0.7994        | 1.4347 | 40000 | 0.7987          |
| 0.7475        | 1.5064 | 42000 | 0.7982          |
| 0.7844        | 1.5781 | 44000 | 0.7970          |
| 0.7743        | 1.5999 | 44608 | 0.7964          |
| 0.7672        | 1.6499 | 46000 | 0.7807          |
| 0.7147        | 1.7216 | 48000 | 0.7714          |
| 0.7784        | 1.7933 | 50000 | 0.7670          |
| 0.7582        | 1.8651 | 52000 | 0.7650          |
| 0.7778        | 1.9368 | 54000 | 0.7643          |

Framework versions

  • Transformers 4.52.1
  • PyTorch 2.7.0+cu126
  • Datasets 3.5.0
  • Tokenizers 0.21.1