Built with Axolotl

See axolotl config

axolotl version: 0.4.1

adapter: lora
auto_resume_from_checkpoints: false
base_model: EleutherAI/pythia-160m
bf16: false
chat_template: llama3
dataset_prepared_path: null
datasets:
- data_files:
  - 5cf4bc9f2d02ce29_train_data.json
  ds_type: json
  format: custom
  path: /workspace/input_data/5cf4bc9f2d02ce29_train_data.json
  type:
    field_instruction: Description
    field_output: Product Name
    format: '{instruction}'
    no_input_format: '{instruction}'
    system_format: '{system}'
    system_prompt: ''
debug: null
deepspeed: null
early_stopping_patience: 4
eval_max_new_tokens: 128
eval_steps: 100
eval_table_size: null
flash_attention: false
fp16: true
fsdp: null
fsdp_config: null
gradient_accumulation_steps: 4
gradient_checkpointing: true
group_by_length: false
hub_model_id: error577/d71d5b54-5e85-4947-9c78-097fdffe44b5
hub_repo: null
hub_strategy: checkpoint
hub_token: null
learning_rate: 0.0002
load_in_4bit: false
load_in_8bit: false
local_rank: null
logging_steps: 1
lora_alpha: 128
lora_dropout: 0.05
lora_fan_in_fan_out: null
lora_model_dir: null
lora_r: 64
lora_target_linear: true
lr_scheduler: cosine
max_grad_norm: 1.0
max_steps: null
micro_batch_size: 4
mlflow_experiment_name: /tmp/5cf4bc9f2d02ce29_train_data.json
model_type: AutoModelForCausalLM
num_epochs: 10
optimizer: adamw_torch
output_dir: miner_id_24
pad_to_sequence_len: true
resume_from_checkpoint: null
s2_attention: null
sample_packing: false
save_steps: 100
sequence_len: 512
special_tokens:
  pad_token: <|endoftext|>
strict: false
tf32: false
tokenizer_type: AutoTokenizer
train_on_inputs: false
trust_remote_code: true
val_set_size: 0.005
wandb_entity: null
wandb_mode: online
wandb_name: 451a522d-4518-49f1-886b-2a8292a7075c
wandb_project: Gradients-On-Demand
wandb_run: your_name
wandb_runid: 451a522d-4518-49f1-886b-2a8292a7075c
warmup_steps: 30
weight_decay: 0.0
xformers_attention: null
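
For readers who want to recreate the adapter settings outside axolotl, below is a minimal sketch of a roughly equivalent PEFT LoraConfig. The target_modules list is an assumption: lora_target_linear: true makes axolotl target every linear projection, and the names shown are the standard linear layers of the GPT-NeoX architecture used by pythia-160m.

```python
from peft import LoraConfig

# Approximate PEFT counterpart of the adapter section of the config above.
# target_modules is an assumption: lora_target_linear: true targets all linear
# projections, which for GPT-NeoX (pythia-160m) are the four layers listed here.
lora_config = LoraConfig(
    r=64,                   # lora_r
    lora_alpha=128,         # lora_alpha
    lora_dropout=0.05,      # lora_dropout
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "query_key_value",  # attention QKV projection
        "dense",            # attention output projection
        "dense_h_to_4h",    # MLP up-projection
        "dense_4h_to_h",    # MLP down-projection
    ],
)
```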

d71d5b54-5e85-4947-9c78-097fdffe44b5

This model is a LoRA adapter fine-tuned from EleutherAI/pythia-160m on the custom JSON dataset referenced in the config above (5cf4bc9f2d02ce29_train_data.json). It achieves the following results on the evaluation set:

  • Loss: 2.2954
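
A minimal usage sketch, loading the adapter on top of the base model with Transformers and PEFT. The repository id and base model come from the config above; the prompt string is a hypothetical example (training used the instruction format and chat template from the config, so matching that format may give better results):

```python
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

adapter_id = "error577/d71d5b54-5e85-4947-9c78-097fdffe44b5"

# Loads EleutherAI/pythia-160m (read from the adapter's config) and applies the LoRA weights.
model = AutoPeftModelForCausalLM.from_pretrained(adapter_id)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m")

# Hypothetical description -> product-name prompt, mirroring the
# field_instruction / field_output mapping in the dataset config.
prompt = "A stainless steel insulated water bottle that keeps drinks cold for 24 hours."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```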

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 30
  • num_epochs: 10
  • mixed_precision_training: Native AMP
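
The total train batch size above follows from the per-device batch size and gradient accumulation; a minimal sketch of that arithmetic, assuming the single-device run implied by total_train_batch_size = 16:

```python
# Effective (total) train batch size implied by the hyperparameters above.
micro_batch_size = 4               # micro_batch_size / train_batch_size
gradient_accumulation_steps = 4    # gradient_accumulation_steps
num_devices = 1                    # assumption: single device, consistent with the total below

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
assert total_train_batch_size == 16
```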

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 21.8113 | 0.0005 | 1    | 5.2113 |
| 15.1235 | 0.0487 | 100  | 3.9548 |
| 14.7982 | 0.0974 | 200  | 3.8197 |
| 13.7443 | 0.1461 | 300  | 3.6183 |
| 15.704  | 0.1948 | 400  | 3.6420 |
| 13.8042 | 0.2435 | 500  | 3.4720 |
| 13.2007 | 0.2921 | 600  | 3.4392 |
| 13.0558 | 0.3408 | 700  | 3.4302 |
| 16.1464 | 0.3895 | 800  | 3.3322 |
| 14.4292 | 0.4382 | 900  | 3.2434 |
| 13.7315 | 0.4869 | 1000 | 3.2245 |
| 15.2213 | 0.5356 | 1100 | 3.2073 |
| 13.5114 | 0.5843 | 1200 | 3.2190 |
| 13.1945 | 0.6330 | 1300 | 3.1604 |
| 11.7011 | 0.6817 | 1400 | 3.0631 |
| 11.1894 | 0.7304 | 1500 | 3.0505 |
| 15.3082 | 0.7791 | 1600 | 3.0188 |
| 14.975  | 0.8278 | 1700 | 2.9847 |
| 11.4546 | 0.8764 | 1800 | 3.0346 |
| 14.6575 | 0.9251 | 1900 | 3.0584 |
| 12.4975 | 0.9738 | 2000 | 2.9857 |
| 12.8715 | 1.0226 | 2100 | 2.9691 |
| 12.3798 | 1.0713 | 2200 | 2.9477 |
| 12.1667 | 1.1200 | 2300 | 2.9389 |
| 13.4105 | 1.1687 | 2400 | 2.9246 |
| 10.8388 | 1.2174 | 2500 | 2.9197 |
| 11.7186 | 1.2661 | 2600 | 2.9452 |
| 9.4794  | 1.3148 | 2700 | 2.9325 |
| 11.2981 | 1.3635 | 2800 | 2.8855 |
| 11.3424 | 1.4122 | 2900 | 2.9009 |
| 13.1606 | 1.4609 | 3000 | 3.0128 |
| 11.5901 | 1.5096 | 3100 | 2.8463 |
| 13.8817 | 1.5582 | 3200 | 2.8352 |
| 10.3641 | 1.6069 | 3300 | 2.8067 |
| 11.7904 | 1.6556 | 3400 | 2.7993 |
| 13.4772 | 1.7043 | 3500 | 2.7733 |
| 13.1556 | 1.7530 | 3600 | 2.7185 |
| 14.1425 | 1.8017 | 3700 | 2.7170 |
| 12.8444 | 1.8504 | 3800 | 2.7320 |
| 14.8567 | 1.8991 | 3900 | 2.7395 |
| 10.6849 | 1.9478 | 4000 | 2.6872 |
| 11.225  | 1.9965 | 4100 | 2.6622 |
| 13.2007 | 2.0453 | 4200 | 2.6121 |
| 12.02   | 2.0940 | 4300 | 2.6490 |
| 11.8255 | 2.1427 | 4400 | 2.6439 |
| 8.9377  | 2.1914 | 4500 | 2.6369 |
| 13.2196 | 2.2400 | 4600 | 2.5585 |
| 9.1704  | 2.2887 | 4700 | 2.5552 |
| 14.5735 | 2.3374 | 4800 | 2.5070 |
| 9.7886  | 2.3861 | 4900 | 2.4726 |
| 9.4464  | 2.4348 | 5000 | 2.4289 |
| 8.929   | 2.4835 | 5100 | 2.4039 |
| 10.4566 | 2.5322 | 5200 | 2.3228 |
| 10.6264 | 2.5809 | 5300 | 2.3262 |
| 9.2416  | 2.6296 | 5400 | 2.2823 |
| 7.1439  | 2.6783 | 5500 | 2.3679 |
| 8.2611  | 2.7270 | 5600 | 2.3297 |
| 8.7267  | 2.7757 | 5700 | 2.2923 |
| 8.5845  | 2.8243 | 5800 | 2.2954 |

Framework versions

  • PEFT 0.13.2
  • Transformers 4.46.0
  • Pytorch 2.5.0+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1