See axolotl config

axolotl version: 0.4.1

adapter: lora
auto_find_batch_size: false
base_model: JackFram/llama-68m
bf16: auto
chat_template: llama3
dataloader_num_workers: 12
dataset_prepared_path: null
datasets:
- data_files:
  - 34a002145b99ed0b_train_data.json
  ds_type: json
  format: custom
  path: /workspace/input_data/34a002145b99ed0b_train_data.json
  type:
    field_instruction: problem
    field_output: outputs
    format: '{instruction}'
    no_input_format: '{instruction}'
    system_format: '{system}'
    system_prompt: ''
debug: null
deepspeed: null
early_stopping_patience: 2
early_stopping_threshold: 1.0e-05
eval_max_new_tokens: 128
eval_steps: 200
eval_strategy: null
flash_attention: true
fp16: null
fsdp: null
fsdp_config: null
gradient_accumulation_steps: 8
gradient_checkpointing: false
group_by_length: false
hub_model_id: mrferr3t/ac606e85-1166-41a0-af29-102faa0690eb
hub_repo: null
hub_strategy: checkpoint
hub_token: null
learning_rate: 0.0004
load_in_4bit: false
load_in_8bit: false
local_rank: null
logging_steps: 200
lora_alpha: 16
lora_dropout: 0.05
lora_fan_in_fan_out: null
lora_model_dir: null
lora_r: 8
lora_target_linear: true
lr_scheduler: cosine
max_steps: null
micro_batch_size: 16
mlflow_experiment_name: /tmp/34a002145b99ed0b_train_data.json
model_type: AutoModelForCausalLM
num_epochs: 100
optimizer: adamw_bnb_8bit
output_dir: miner_id_24
pad_to_sequence_len: true
resume_from_checkpoint: null
s2_attention: null
sample_packing: false
save_steps: 200
saves_per_epoch: 0
sequence_len: 512
special_tokens:
  pad_token: </s>
strict: false
tf32: false
tokenizer_type: AutoTokenizer
train_on_inputs: false
trust_remote_code: true
val_set_size: .05000000
wandb_entity: null
wandb_mode: disabled
wandb_name: cb389588-c816-41d5-abb0-cc3edb3cfbc1
wandb_project: Gradients-On-Demand
wandb_run: your_name
wandb_runid: cb389588-c816-41d5-abb0-cc3edb3cfbc1
warmup_steps: 100
weight_decay: 0.0
xformers_attention: null
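
The dataset block in the config maps the JSON field `problem` to the prompt and `outputs` to the completion, and `train_on_inputs: false` means the prompt tokens should not contribute to the loss. The following is a minimal sketch of that mapping, not axolotl's actual implementation: the record, the token ids, and the helper names `build_example` and `mask_prompt_labels` are hypothetical and used only for illustration.

```python
# Sketch of the prompt/label layout implied by the config (assumptions noted inline).
IGNORE_INDEX = -100  # label value that PyTorch cross-entropy ignores by default


def build_example(record, prompt_format="{instruction}"):
    """Map one JSON record to (prompt, completion) using the config's field names."""
    # field_instruction: problem  -> fills the '{instruction}' format string
    prompt = prompt_format.format(instruction=record["problem"])
    # field_output: outputs       -> the text the model is trained to produce
    completion = record["outputs"]
    return prompt, completion


def mask_prompt_labels(prompt_ids, completion_ids):
    """Build input_ids and labels so that only completion tokens are scored."""
    input_ids = list(prompt_ids) + list(completion_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(completion_ids)
    return input_ids, labels


# Hypothetical record and token ids, for illustration only.
record = {"problem": "What is 2 + 2?", "outputs": "4"}
prompt, completion = build_example(record)
input_ids, labels = mask_prompt_labels([11, 12, 13], [21])
```

With `pad_to_sequence_len: true` and `sequence_len: 512`, each sequence would additionally be padded to 512 tokens; that step is left out of the sketch.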

ac606e85-1166-41a0-af29-102faa0690eb

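This model is a LoRA fine-tuned version of JackFram/llama-68m. The card generator recorded the dataset name as None; the config points to a local JSON file, 34a002145b99ed0b_train_data.json. It achieves the following results on the evaluation set: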

Loss: 1.7932
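
Since the config sets `adapter: lora` and lists PEFT among the framework versions, the repository presumably holds a LoRA adapter rather than full model weights. Under that assumption, loading it could look roughly like the sketch below. The helper name `load_adapter` is hypothetical, and the call downloads both the base model and the adapter from the Hub.

```python
BASE_MODEL = "JackFram/llama-68m"
ADAPTER_REPO = "mrferr3t/ac606e85-1166-41a0-af29-102faa0690eb"


def load_adapter(base_model=BASE_MODEL, adapter_repo=ADAPTER_REPO):
    """Load the base model and attach the LoRA adapter (assumes an adapter-only repo)."""
    # Imported lazily so the module can be imported without these packages installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model)
    base = AutoModelForCausalLM.from_pretrained(base_model)
    model = PeftModel.from_pretrained(base, adapter_repo)
    model.eval()
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = load_adapter()
    # Hypothetical prompt; the training data used a bare '{instruction}' format.
    inputs = tokenizer("What is 2 + 2?", return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

The `max_new_tokens=128` value mirrors `eval_max_new_tokens` in the config; any other generation settings are left at their defaults.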

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0004
train_batch_size: 16
eval_batch_size: 16
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 128
optimizer: OptimizerNames.ADAMW_BNB (8-bit AdamW, adamw_bnb_8bit in the config) with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 100
num_epochs: 100
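
The `total_train_batch_size` of 128 follows from the per-device batch size and the gradient accumulation steps. The short calculation below reproduces it and also gives a rough estimate of steps per epoch from one row of the results table (step 200 at epoch 0.1956). The single-device count is an assumption, and the training-set size derived here is an estimate, not a reported figure.

```python
micro_batch_size = 16            # train_batch_size per device
gradient_accumulation_steps = 8
num_devices = 1                  # assumption: consistent with the reported total of 128

# Samples consumed per optimizer step.
total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices

# Rough steps per epoch, taken from the logged row "step 200 at epoch 0.1956".
steps_per_epoch = 200 / 0.1956

# Approximate number of training samples per epoch (estimate only).
approx_train_examples = steps_per_epoch * total_train_batch_size

print(total_train_batch_size)
print(round(steps_per_epoch), round(approx_train_examples))
```

Under these assumptions an epoch is a little over 1,000 optimizer steps, which would put the training split at roughly 131,000 examples.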

Training results

Training Loss	Epoch	Step	Validation Loss
No log	0.0010	1	2.9496
2.4435	0.1956	200	2.1728
2.1036	0.3911	400	2.0391
2.015	0.5867	600	1.9774
1.9711	0.7822	800	1.9413
1.9438	0.9778	1000	1.9172
1.9259	1.1733	1200	1.9003
1.9075	1.3689	1400	1.8875
1.9033	1.5644	1600	1.8763
1.8945	1.7600	1800	1.8686
1.8885	1.9555	2000	1.8613
1.8805	2.1511	2200	1.8551
1.8774	2.3466	2400	1.8521
1.8669	2.5422	2600	1.8463
1.8669	2.7377	2800	1.8433
1.8675	2.9333	3000	1.8386
1.8681	3.1288	3200	1.8362
1.8561	3.3244	3400	1.8336
1.8597	3.5199	3600	1.8300
1.8493	3.7155	3800	1.8275
1.8551	3.9110	4000	1.8256
1.8518	4.1066	4200	1.8242
1.85	4.3021	4400	1.8218
1.8444	4.4977	4600	1.8207
1.8457	4.6932	4800	1.8184
1.8481	4.8888	5000	1.8172
1.8483	5.0843	5200	1.8160
1.8429	5.2799	5400	1.8148
1.8405	5.4754	5600	1.8138
1.8399	5.6710	5800	1.8129
1.8422	5.8665	6000	1.8111
1.8433	6.0621	6200	1.8107
1.8364	6.2576	6400	1.8087
1.8387	6.4532	6600	1.8079
1.8329	6.6487	6800	1.8081
1.8379	6.8443	7000	1.8074
1.839	7.0398	7200	1.8057
1.8344	7.2354	7400	1.8055
1.8344	7.4309	7600	1.8047
1.8377	7.6265	7800	1.8039
1.8333	7.8220	8000	1.8031
1.8355	8.0176	8200	1.8020
1.8271	8.2132	8400	1.8021
1.8367	8.4087	8600	1.8018
1.8315	8.6043	8800	1.8011
1.8346	8.7998	9000	1.8005
1.8256	8.9954	9200	1.7994
1.8342	9.1909	9400	1.7996
1.8286	9.3865	9600	1.7992
1.8315	9.5820	9800	1.7981
1.8284	9.7776	10000	1.7977
1.8264	9.9731	10200	1.7967
1.8314	10.1687	10400	1.7966
1.8268	10.3642	10600	1.7961
1.8279	10.5598	10800	1.7963
1.8211	10.7553	11000	1.7952
1.8288	10.9509	11200	1.7949
1.8307	11.1464	11400	1.7949
1.8227	11.3420	11600	1.7945
1.8282	11.5375	11800	1.7944
1.8243	11.7331	12000	1.7929
1.8265	11.9286	12200	1.7931
1.8278	12.1242	12400	1.7932
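
The log ends at step 12400 (epoch 12.12) even though `num_epochs` is 100. A plausible explanation is the early stopping configured with `early_stopping_patience: 2` and `early_stopping_threshold: 1.0e-05`: the last two evaluations (1.7931 and 1.7932) do not improve on the best validation loss of 1.7929. The sketch below applies a simplified patience rule to the final rows of the table. It is an approximation for illustration, not the Transformers callback itself, and the function name `early_stop_step` is hypothetical.

```python
def early_stop_step(evals, patience=2, threshold=1e-5):
    """Return the step at which a simple patience rule would stop, or None."""
    best = None
    bad_evals = 0
    for step, loss in evals:
        # An evaluation counts as improved only if it beats the best loss
        # by more than the threshold.
        improved = best is None or (loss < best and best - loss > threshold)
        bad_evals = 0 if improved else bad_evals + 1
        best = loss if best is None else min(best, loss)
        if bad_evals >= patience:
            return step
    return None


# Validation losses from the last rows of the table (steps 11000 to 12400).
tail = [
    (11000, 1.7952), (11200, 1.7949), (11400, 1.7949), (11600, 1.7945),
    (11800, 1.7944), (12000, 1.7929), (12200, 1.7931), (12400, 1.7932),
]
stop_step = early_stop_step(tail)
print(stop_step)
```

On these rows the rule stops at the last logged step, which matches where the table ends.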

Framework versions

PEFT 0.13.2
Transformers 4.46.0
Pytorch 2.5.0+cu124
Datasets 3.0.1
Tokenizers 0.20.1
