<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`

```yaml
adapter: lora
auto_find_batch_size: false
base_model: unsloth/Qwen2.5-3B
bf16: auto
chat_template: llama3
dataloader_num_workers: 12
dataset_prepared_path: null
datasets:
- data_files:
  - cbfc164a7ba5dfd4_train_data.json
  ds_type: json
  format: custom
  path: /workspace/input_data/cbfc164a7ba5dfd4_train_data.json
  type:
    field_instruction: input persona
    field_output: synthesized text
    format: '{instruction}'
    no_input_format: '{instruction}'
    system_format: '{system}'
    system_prompt: ''
debug: null
deepspeed: null
early_stopping_patience: 1000
early_stopping_threshold: 1.0e-07
eval_max_new_tokens: 128
eval_steps: 220
eval_strategy: null
flash_attention: true
fp16: null
fsdp: null
fsdp_config: null
gradient_accumulation_steps: 2
gradient_checkpointing: false
group_by_length: false
hub_model_id: mrferr3t/e9d760b4-d8b5-4635-8832-ae9ca64a1eb9
hub_repo: null
hub_strategy: all_checkpoints
hub_token: null
learning_rate: 0.00015
load_best_model_at_end: true
load_in_4bit: false
load_in_8bit: false
local_rank: null
logging_steps: 176
lora_alpha: 64
lora_dropout: 0.15
lora_fan_in_fan_out: null
lora_model_dir: null
lora_r: 32
lora_target_linear: true
lr_scheduler: cosine
max_grad_norm: 1
max_steps: 17480
micro_batch_size: 4
mlflow_experiment_name: /tmp/cbfc164a7ba5dfd4_train_data.json
model_type: AutoModelForCausalLM
num_epochs: 200
optimizer: adamw_torch_fused
output_dir: miner_id_24
pad_to_sequence_len: true
resume_from_checkpoint: null
s2_attention: null
sample_packing: false
save_steps: 220
save_total_limit: 10
saves_per_epoch: 0
sequence_len: 1024
strict: false
tf32: true
tokenizer_type: AutoTokenizer
train_on_inputs: false
trust_remote_code: true
use_rslora: true
val_set_size: .01886792
wandb_entity: null
wandb_mode: disabled
wandb_name: a18f4630-bcbf-426d-863a-da31c2f7c188
wandb_project: Gradients-On-Demand
wandb_run: your_name
wandb_runid: a18f4630-bcbf-426d-863a-da31c2f7c188
warmup_steps: 1748
weight_decay: 0
xformers_attention: null
```

</details>
# e9d760b4-d8b5-4635-8832-ae9ca64a1eb9
This model is a LoRA fine-tuned version of [unsloth/Qwen2.5-3B](https://huggingface.co/unsloth/Qwen2.5-3B) trained on the `cbfc164a7ba5dfd4_train_data.json` dataset described in the config above. It achieves the following results on the evaluation set:
- Loss: 0.7828
## Model description
More information needed
## Intended uses & limitations
More information needed
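
No specific downstream uses or limitations are documented. As a starting point, the sketch below shows one way to load the published LoRA adapter (`mrferr3t/e9d760b4-d8b5-4635-8832-ae9ca64a1eb9`, the `hub_model_id` from the config above) on top of the base model with 🤗 Transformers and PEFT. It is a minimal, assumption-laden sketch, not an official usage example; the prompt is illustrative and the configured chat template is not applied.

```python
# Minimal inference sketch: base model + LoRA adapter (IDs taken from the config above).
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "unsloth/Qwen2.5-3B"
adapter_id = "mrferr3t/e9d760b4-d8b5-4635-8832-ae9ca64a1eb9"

tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the LoRA weights
model.eval()

prompt = "A retired marine biologist who volunteers at a local aquarium."  # illustrative persona input
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```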
## Training and evaluation data
More information needed
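
Per the config above, training used a local JSON file (`/workspace/input_data/cbfc164a7ba5dfd4_train_data.json`, not published with this card) whose records carry `input persona` and `synthesized text` fields, with roughly 1.9% of examples held out for evaluation (`val_set_size: .01886792`). The sketch below approximates that prompt mapping and split with 🤗 Datasets; it is a hedged reconstruction under those assumptions, not axolotl's actual data pipeline.

```python
# Approximate reconstruction of the data handling implied by the config above.
# Assumes the (unpublished) JSON file is a list of records with
# "input persona" and "synthesized text" keys.
from datasets import load_dataset

data_files = "/workspace/input_data/cbfc164a7ba5dfd4_train_data.json"
raw = load_dataset("json", data_files=data_files, split="train")

def to_prompt(example):
    # format / no_input_format are both '{instruction}', so the prompt is just the
    # persona text; the synthesized text is the completion the model is trained on.
    return {
        "prompt": example["input persona"],
        "completion": example["synthesized text"],
    }

formatted = raw.map(to_prompt, remove_columns=raw.column_names)
# Hold out ~1.9% for evaluation; the split seed here is illustrative (the run used
# seed 42 for training, but the exact split seed is not recorded in this card).
splits = formatted.train_test_split(test_size=0.01886792, seed=42)
train_ds, eval_ds = splits["train"], splits["test"]
```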
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list):
- learning_rate: 0.00015
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 1748
- training_steps: 17480
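
The sketch below restates these values as 🤗 Transformers `TrainingArguments`. It is a hedged approximation of what axolotl configures internally for this run (assuming a single GPU, so 4 × 2 gradient accumulation gives the total train batch size of 8), not the exact training invocation.

```python
# Hedged approximation of this run's hyperparameters as TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="miner_id_24",
    learning_rate=1.5e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,   # total train batch size = 4 * 2 = 8
    max_steps=17480,
    warmup_steps=1748,               # 10% of training_steps
    lr_scheduler_type="cosine",
    optim="adamw_torch_fused",
    weight_decay=0.0,
    max_grad_norm=1.0,
    bf16=True,
    tf32=True,
    eval_strategy="steps",
    eval_steps=220,
    save_strategy="steps",
    save_steps=220,
    save_total_limit=10,
    logging_steps=176,
    load_best_model_at_end=True,
    seed=42,
)
```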
### Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
No log | 0.0002 | 1 | 1.0805 |
0.9316 | 0.0363 | 220 | 0.7945 |
0.7715 | 0.0726 | 440 | 0.7721 |
0.7584 | 0.1089 | 660 | 0.7638 |
0.7335 | 0.1453 | 880 | 0.7561 |
0.7493 | 0.1816 | 1100 | 0.7526 |
0.7347 | 0.2179 | 1320 | 0.7502 |
0.7387 | 0.2542 | 1540 | 0.7501 |
0.7484 | 0.2905 | 1760 | 0.7490 |
0.7402 | 0.3268 | 1980 | 0.7442 |
0.7434 | 0.3631 | 2200 | 0.7446 |
0.7295 | 0.3994 | 2420 | 0.7426 |
0.7302 | 0.4358 | 2640 | 0.7421 |
0.7196 | 0.4721 | 2860 | 0.7412 |
0.7154 | 0.5084 | 3080 | 0.7393 |
0.731 | 0.5447 | 3300 | 0.7378 |
0.7342 | 0.5810 | 3520 | 0.7371 |
0.7227 | 0.6173 | 3740 | 0.7346 |
0.7258 | 0.6536 | 3960 | 0.7341 |
0.7292 | 0.6899 | 4180 | 0.7310 |
0.7278 | 0.7263 | 4400 | 0.7326 |
0.7174 | 0.7626 | 4620 | 0.7306 |
0.7239 | 0.7989 | 4840 | 0.7291 |
0.7149 | 0.8352 | 5060 | 0.7270 |
0.7171 | 0.8715 | 5280 | 0.7275 |
0.7135 | 0.9078 | 5500 | 0.7261 |
0.711 | 0.9441 | 5720 | 0.7245 |
0.7224 | 0.9804 | 5940 | 0.7241 |
0.6513 | 1.0168 | 6160 | 0.7333 |
0.6055 | 1.0531 | 6380 | 0.7333 |
0.6099 | 1.0894 | 6600 | 0.7345 |
0.5935 | 1.1257 | 6820 | 0.7376 |
0.6052 | 1.1620 | 7040 | 0.7367 |
0.5997 | 1.1983 | 7260 | 0.7332 |
0.5992 | 1.2346 | 7480 | 0.7328 |
0.5981 | 1.2709 | 7700 | 0.7314 |
0.5967 | 1.3073 | 7920 | 0.7336 |
0.6002 | 1.3436 | 8140 | 0.7294 |
0.599 | 1.3799 | 8360 | 0.7316 |
0.5949 | 1.4162 | 8580 | 0.7308 |
0.5966 | 1.4525 | 8800 | 0.7291 |
0.6066 | 1.4888 | 9020 | 0.7267 |
0.6041 | 1.5251 | 9240 | 0.7257 |
0.5982 | 1.5614 | 9460 | 0.7287 |
0.591 | 1.5978 | 9680 | 0.7301 |
0.5931 | 1.6341 | 9900 | 0.7236 |
0.6101 | 1.6704 | 10120 | 0.7237 |
0.6068 | 1.7067 | 10340 | 0.7257 |
0.6055 | 1.7430 | 10560 | 0.7213 |
0.6023 | 1.7793 | 10780 | 0.7212 |
0.5991 | 1.8156 | 11000 | 0.7205 |
0.5993 | 1.8519 | 11220 | 0.7205 |
0.5858 | 1.8883 | 11440 | 0.7191 |
0.5958 | 1.9246 | 11660 | 0.7172 |
0.5869 | 1.9609 | 11880 | 0.7187 |
0.5867 | 1.9972 | 12100 | 0.7163 |
0.4452 | 2.0335 | 12320 | 0.7713 |
0.4373 | 2.0698 | 12540 | 0.7774 |
0.447 | 2.1061 | 12760 | 0.7740 |
0.4382 | 2.1424 | 12980 | 0.7827 |
0.442 | 2.1788 | 13200 | 0.7790 |
0.4346 | 2.2151 | 13420 | 0.7822 |
0.4335 | 2.2514 | 13640 | 0.7789 |
0.4393 | 2.2877 | 13860 | 0.7840 |
0.4315 | 2.3240 | 14080 | 0.7841 |
0.4459 | 2.3603 | 14300 | 0.7816 |
0.4324 | 2.3966 | 14520 | 0.7805 |
0.4364 | 2.4329 | 14740 | 0.7841 |
0.4356 | 2.4693 | 14960 | 0.7814 |
0.442 | 2.5056 | 15180 | 0.7814 |
0.4323 | 2.5419 | 15400 | 0.7836 |
0.4393 | 2.5782 | 15620 | 0.7828 |
0.4259 | 2.6145 | 15840 | 0.7836 |
0.4455 | 2.6508 | 16060 | 0.7831 |
0.4318 | 2.6871 | 16280 | 0.7823 |
0.4318 | 2.7234 | 16500 | 0.7825 |
0.4376 | 2.7598 | 16720 | 0.7831 |
0.4377 | 2.7961 | 16940 | 0.7831 |
0.433 | 2.8324 | 17160 | 0.7828 |
0.4449 | 2.8687 | 17380 | 0.7828 |
### Framework versions
- PEFT 0.13.2
- Transformers 4.46.0
- Pytorch 2.5.0+cu124
- Datasets 3.0.1
- Tokenizers 0.20.1