See axolotl config

axolotl version: 0.4.1

adapter: lora
auto_resume_from_checkpoints: false
base_model: EleutherAI/pythia-160m
bf16: false
chat_template: llama3
dataset_prepared_path: null
datasets:
- data_files:
  - 5cf4bc9f2d02ce29_train_data.json
  ds_type: json
  format: custom
  path: /workspace/input_data/5cf4bc9f2d02ce29_train_data.json
  type:
    field_instruction: Description
    field_output: Product Name
    format: '{instruction}'
    no_input_format: '{instruction}'
    system_format: '{system}'
    system_prompt: ''
debug: null
deepspeed: null
early_stopping_patience: 4
eval_max_new_tokens: 128
eval_steps: 100
eval_table_size: null
flash_attention: false
fp16: true
fsdp: null
fsdp_config: null
gradient_accumulation_steps: 4
gradient_checkpointing: true
group_by_length: false
hub_model_id: error577/d71d5b54-5e85-4947-9c78-097fdffe44b5
hub_repo: null
hub_strategy: checkpoint
hub_token: null
learning_rate: 0.0002
load_in_4bit: false
load_in_8bit: false
local_rank: null
logging_steps: 1
lora_alpha: 128
lora_dropout: 0.05
lora_fan_in_fan_out: null
lora_model_dir: null
lora_r: 64
lora_target_linear: true
lr_scheduler: cosine
max_grad_norm: 1.0
max_steps: null
micro_batch_size: 4
mlflow_experiment_name: /tmp/5cf4bc9f2d02ce29_train_data.json
model_type: AutoModelForCausalLM
num_epochs: 10
optimizer: adamw_torch
output_dir: miner_id_24
pad_to_sequence_len: true
resume_from_checkpoint: null
s2_attention: null
sample_packing: false
save_steps: 100
sequence_len: 512
special_tokens:
  pad_token: <|endoftext|>
strict: false
tf32: false
tokenizer_type: AutoTokenizer
train_on_inputs: false
trust_remote_code: true
val_set_size: 0.005
wandb_entity: null
wandb_mode: online
wandb_name: 451a522d-4518-49f1-886b-2a8292a7075c
wandb_project: Gradients-On-Demand
wandb_run: your_name
wandb_runid: 451a522d-4518-49f1-886b-2a8292a7075c
warmup_steps: 30
weight_decay: 0.0
xformers_attention: null
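
The `datasets` entry in the config maps each JSON record to a prompt and a target: `field_instruction: Description` fills the `'{instruction}'` template, and `field_output: Product Name` supplies the completion. The snippet below is a minimal sketch of that mapping, not axolotl's actual implementation; the function name and the sample record are hypothetical, and tokenization, EOS handling, and the empty system prompt are left out.

```python
# Minimal sketch of the record -> (prompt, completion) mapping implied by the config.
# `to_prompt_and_completion` and the sample record are hypothetical illustrations.

def to_prompt_and_completion(
    record,
    instruction_field="Description",   # field_instruction in the config
    output_field="Product Name",       # field_output in the config
    prompt_format="{instruction}",     # format / no_input_format in the config
):
    """Build the prompt text and the target text for one dataset record."""
    prompt = prompt_format.format(instruction=record[instruction_field])
    completion = record[output_field]
    return prompt, completion


# Hypothetical record; the real dataset contents are not documented in this card.
record = {
    "Description": "Stainless steel water bottle, 750 ml",
    "Product Name": "Example Bottle",
}
prompt, completion = to_prompt_and_completion(record)
```

With `train_on_inputs: false`, the loss would be expected to cover only the completion tokens, so the prompt serves as context rather than as a training target.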

d71d5b54-5e85-4947-9c78-097fdffe44b5

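This model is a LoRA fine-tune of EleutherAI/pythia-160m, trained with axolotl on a custom JSON dataset (5cf4bc9f2d02ce29_train_data.json; the auto-generated card recorded no dataset name). It achieves the following results on the evaluation set: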

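Loss: 2.2954 (final evaluation, step 5800; the lowest validation loss in the training results is 2.2823 at step 5400)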

Model description

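A LoRA adapter for EleutherAI/pythia-160m (lora_r: 64, lora_alpha: 128, lora_dropout: 0.05, applied to all linear layers via lora_target_linear: true). Per the axolotl config, it is trained to generate the Product Name field from the Description field of each record. No further description is provided.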

Intended uses & limitations

More information needed
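
Because the config sets `adapter: lora` and PEFT is listed among the framework versions, the repository presumably holds a PEFT adapter rather than full model weights. The snippet below is a hedged sketch of how such an adapter is typically loaded on top of the base model; it assumes the adapter is still published under the `hub_model_id` from the config and that a bare description is a suitable prompt. The function name `suggest_product_name` is hypothetical.

```python
# Hedged sketch: load the base model, attach the LoRA adapter, and generate.
# Assumes the adapter is available under the hub_model_id from the config.

BASE_MODEL_ID = "EleutherAI/pythia-160m"
ADAPTER_ID = "error577/d71d5b54-5e85-4947-9c78-097fdffe44b5"


def suggest_product_name(description, max_new_tokens=32):
    """Generate a continuation for a product description (hypothetical helper)."""
    # Imports are local so that defining the function needs no heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base_model = AutoModelForCausalLM.from_pretrained(BASE_MODEL_ID)
    model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
    model.eval()

    # The config uses the bare '{instruction}' format, so the prompt is the description itself.
    inputs = tokenizer(description, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        pad_token_id=tokenizer.eos_token_id,  # pad token is <|endoftext|> in the config
    )
    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `suggest_product_name("...")` downloads both the base model and the adapter. Output quality is undocumented, so results should be checked before any real use.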

Training and evaluation data

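Training used the JSON file 5cf4bc9f2d02ce29_train_data.json, with Description as the instruction field and Product Name as the output field. Per the config, 0.5% of the records were held out for evaluation (val_set_size: 0.005). No further information is provided about the data's source or contents.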

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0002
train_batch_size: 4
eval_batch_size: 4
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 16
optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 30
num_epochs: 10
mixed_precision_training: Native AMP
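
Two of these values follow from the others. The total train batch size is the per-device batch size times the gradient accumulation steps, and the learning rate follows a linear warmup and then a cosine decay. The snippet below is a sketch of both, assuming a single device (consistent with the reported total of 16) and a total step count estimated from the results table (step 2000 falls at epoch 0.9738). The names `cosine_lr` and `total_steps` are illustrative, and the formula is the standard warmup-plus-cosine shape rather than code taken from this run.

```python
import math

# Effective batch size: per-device batch x accumulation steps x devices.
train_batch_size = 4
gradient_accumulation_steps = 4
num_devices = 1  # assumption; consistent with total_train_batch_size: 16
effective_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Estimate of optimizer steps per epoch from one row of the results table.
steps_per_epoch = round(2000 / 0.9738)
total_steps = steps_per_epoch * 10  # num_epochs: 10; an estimate, not a logged value


def cosine_lr(step, base_lr=2e-4, warmup_steps=30, total=total_steps):
    """Learning rate at `step` for linear warmup followed by cosine decay to zero."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup schedule that has elapsed, from 0 to 1.
    progress = (step - warmup_steps) / max(1, total - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))
```

Under these assumptions the rate peaks at 0.0002 at step 30 and would reach zero only at the end of the tenth epoch.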

Training results

Training Loss	Epoch	Step	Validation Loss
21.8113	0.0005	1	5.2113
15.1235	0.0487	100	3.9548
14.7982	0.0974	200	3.8197
13.7443	0.1461	300	3.6183
15.704	0.1948	400	3.6420
13.8042	0.2435	500	3.4720
13.2007	0.2921	600	3.4392
13.0558	0.3408	700	3.4302
16.1464	0.3895	800	3.3322
14.4292	0.4382	900	3.2434
13.7315	0.4869	1000	3.2245
15.2213	0.5356	1100	3.2073
13.5114	0.5843	1200	3.2190
13.1945	0.6330	1300	3.1604
11.7011	0.6817	1400	3.0631
11.1894	0.7304	1500	3.0505
15.3082	0.7791	1600	3.0188
14.975	0.8278	1700	2.9847
11.4546	0.8764	1800	3.0346
14.6575	0.9251	1900	3.0584
12.4975	0.9738	2000	2.9857
12.8715	1.0226	2100	2.9691
12.3798	1.0713	2200	2.9477
12.1667	1.1200	2300	2.9389
13.4105	1.1687	2400	2.9246
10.8388	1.2174	2500	2.9197
11.7186	1.2661	2600	2.9452
9.4794	1.3148	2700	2.9325
11.2981	1.3635	2800	2.8855
11.3424	1.4122	2900	2.9009
13.1606	1.4609	3000	3.0128
11.5901	1.5096	3100	2.8463
13.8817	1.5582	3200	2.8352
10.3641	1.6069	3300	2.8067
11.7904	1.6556	3400	2.7993
13.4772	1.7043	3500	2.7733
13.1556	1.7530	3600	2.7185
14.1425	1.8017	3700	2.7170
12.8444	1.8504	3800	2.7320
14.8567	1.8991	3900	2.7395
10.6849	1.9478	4000	2.6872
11.225	1.9965	4100	2.6622
13.2007	2.0453	4200	2.6121
12.02	2.0940	4300	2.6490
11.8255	2.1427	4400	2.6439
8.9377	2.1914	4500	2.6369
13.2196	2.2400	4600	2.5585
9.1704	2.2887	4700	2.5552
14.5735	2.3374	4800	2.5070
9.7886	2.3861	4900	2.4726
9.4464	2.4348	5000	2.4289
8.929	2.4835	5100	2.4039
10.4566	2.5322	5200	2.3228
10.6264	2.5809	5300	2.3262
9.2416	2.6296	5400	2.2823
7.1439	2.6783	5500	2.3679
8.2611	2.7270	5600	2.3297
8.7267	2.7757	5700	2.2923
8.5845	2.8243	5800	2.2954
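
The table ends at step 5800 (epoch 2.8243) although the config requests `num_epochs: 10`. The card does not say why, but the last rows are consistent with `early_stopping_patience: 4`: the best validation loss (2.2823) occurs at step 5400 and the four evaluations after it do not improve on it. The snippet below is a simplified patience counter applied to the final rows of the table; it is a sketch, not the callback used in the run, and the name `early_stop_step` is hypothetical.

```python
# Simplified patience-based early stopping, applied to the last rows of the table.
# `early_stop_step` is a hypothetical helper, not the callback used in training.

def early_stop_step(evals, patience):
    """Return the step at which training would stop, or None if it never does."""
    best = float("inf")
    evals_without_improvement = 0
    for step, val_loss in evals:
        if val_loss < best:
            # New best validation loss: reset the counter.
            best = val_loss
            evals_without_improvement = 0
        else:
            evals_without_improvement += 1
            if evals_without_improvement >= patience:
                return step
    return None


# (step, validation loss) pairs copied from the final rows of the results table.
final_rows = [
    (5200, 2.3228),
    (5300, 2.3262),
    (5400, 2.2823),
    (5500, 2.3679),
    (5600, 2.3297),
    (5700, 2.2923),
    (5800, 2.2954),
]

stop_step = early_stop_step(final_rows, patience=4)
print(stop_step)  # 5800
```

With a patience of 4 the counter reaches its limit at step 5800, which matches the last row of the table.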

Framework versions

PEFT 0.13.2
Transformers 4.46.0
Pytorch 2.5.0+cu124
Datasets 3.0.1
Tokenizers 0.20.1
