See axolotl config
axolotl version: 0.10.0.dev0
base_model: arcee-train/afm-64k-v1
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
trust_remote_code: true
# wandb configuration
wandb_project: AFM-SFT
wandb_watch:
wandb_run_id: # V{Version}-{Run Number}-{Attempt Number}
wandb_log_model:
# push checkpoints to hub
hub_model_id: pocketdoc-interim/AFM-SFT-v0.0.1
# how to push checkpoints to hub
# https://huggingface.co/docs/transformers/v4.31.0/en/main_classes/trainer#transformers.TrainingArguments.hub_strategy
hub_strategy: "every_save"
# Whether to use hf `use_auth_token` for loading datasets. Useful for fetching private datasets
# Required to be true when used in combination with `push_dataset_to_hub`
hf_use_auth_token: true
# where to save the finished model to
output_dir: ./AFM-SFT-v0.0.1
# where to save the dataset to
dataset_prepared_path: ./AFM-SFT-v0.0.1-data
save_safetensors: true
# dataset settings (local or huggingface repo)
datasets:
  - path: pocketdoc-interim/AFM-SFT-64K
    split: train
    ds_type: parquet
    type:
  - path: pocketdoc-interim/AFM-SFT-64K
    split: validation
    ds_type: parquet
    type:
plugins:
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
cut_cross_entropy: true
load_in_8bit: false
load_in_4bit: false
strict: false
sequence_len: 32768
sample_packing: true
eval_sample_packing: true
val_set_size: 0.005
pad_to_sequence_len: true
gradient_checkpointing: true
gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 2
optimizer: ademamix_8bit
optim_args: "beta1=0.9,beta2=0.999,beta3=0.999,alpha=5"
lr_scheduler: rex
learning_rate: 0.000001
cosine_min_lr_ratio:
weight_decay: 0.0
max_grad_norm: 0.001
train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false
early_stopping_patience:
resume_from_checkpoint:
auto_resume_from_checkpoints: false
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_ratio: 0.05
evals_per_epoch: 20
# eval_steps: 20
saves_per_epoch: 4
save_total_limit: 1
debug: false
deepspeed: deepspeed_configs/zero3_bf16.json
torch_compile: false
fsdp:
fsdp_config:
special_tokens:
  pad_token: "<|finetune_right_pad_id|>"
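The two datasets entries above point at the same hub repo with different splits. As a minimal sketch of pulling that data outside of axolotl (assuming the repo exposes train and validation splits and that you are authenticated, since hf_use_auth_token is enabled and the repo is likely private):

```python
# Sketch only: load the SFT data referenced in the config above with the datasets library.
# Assumes `huggingface-cli login` (or HF_TOKEN) has been set up for the private repo.
from datasets import load_dataset

train_ds = load_dataset("pocketdoc-interim/AFM-SFT-64K", split="train")
eval_ds = load_dataset("pocketdoc-interim/AFM-SFT-64K", split="validation")
print(train_ds)  # column names / row counts for the train split
print(eval_ds)   # same for the validation split
```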
AFM-SFT-v0.0.1
This model is a fine-tuned version of arcee-train/afm-64k-v1 on the pocketdoc-interim/AFM-SFT-64K dataset (train and validation splits). It achieves the following results on the evaluation set:
- Loss: 1.3071
Model description
More information needed
Intended uses & limitations
More information needed
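In the meantime, a minimal usage sketch (illustrative only, not an official example) for loading the checkpoint with transformers, mirroring the bf16 and trust_remote_code settings from the training config above:

```python
# Unofficial inference sketch; assumes access to the pocketdoc-interim/AFM-SFT-v0.0.1 repo
# and that the checkpoint loads as a plain AutoModelForCausalLM (as configured above).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pocketdoc-interim/AFM-SFT-v0.0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # training ran in bf16
    device_map="auto",           # requires accelerate
    trust_remote_code=True,
)

prompt = "Explain supervised fine-tuning in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```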
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 32 (micro batch size × gradient accumulation steps × devices; see the sketch after this list)
- total_eval_batch_size: 8
- optimizer: ADEMAMIX_8BIT with args: beta1=0.9, beta2=0.999, beta3=0.999, alpha=5
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 69
- training_steps: 1394
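The derived values above follow directly from the axolotl config; a small sketch of the arithmetic (assuming the standard micro-batch × gradient-accumulation × device composition, and plain truncation of warmup_ratio × steps, which matches the reported 69):

```python
# Sketch: recompute the aggregate training numbers reported above from the config values.
micro_batch_size = 1               # micro_batch_size
gradient_accumulation_steps = 4    # gradient_accumulation_steps
num_devices = 8                    # num_devices (multi-GPU run)

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
assert total_train_batch_size == 32

training_steps = 1394
warmup_ratio = 0.05                # warmup_ratio from the config
warmup_steps = int(training_steps * warmup_ratio)  # assumption: simple truncation
assert warmup_steps == 69
```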
Training results
| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 3.8443        | 0.0014 | 1    | 4.3592          |
| 3.7407        | 0.0502 | 35   | 4.1027          |
| 2.9405        | 0.1004 | 70   | 2.8520          |
| 2.0344        | 0.1506 | 105  | 2.1397          |
| 1.8346        | 0.2008 | 140  | 1.9007          |
| 1.7566        | 0.2510 | 175  | 1.7788          |
| 1.6076        | 0.3012 | 210  | 1.6968          |
| 1.6816        | 0.3514 | 245  | 1.6405          |
| 1.5809        | 0.4016 | 280  | 1.5957          |
| 1.4863        | 0.4518 | 315  | 1.5635          |
| 1.4724        | 0.5020 | 350  | 1.5370          |
| 1.3741        | 0.5522 | 385  | 1.5092          |
| 1.5398        | 0.6024 | 420  | 1.4898          |
| 1.4832        | 0.6526 | 455  | 1.4698          |
| 1.3365        | 0.7028 | 490  | 1.4528          |
| 1.3616        | 0.7530 | 525  | 1.4382          |
| 1.594         | 0.8032 | 560  | 1.4259          |
| 1.4095        | 0.8534 | 595  | 1.4155          |
| 1.3814        | 0.9035 | 630  | 1.4050          |
| 1.4576        | 0.9537 | 665  | 1.3952          |
| 1.361         | 1.0029 | 700  | 1.3872          |
| 1.3922        | 1.0531 | 735  | 1.3791          |
| 1.4289        | 1.1033 | 770  | 1.3711          |
| 1.2603        | 1.1535 | 805  | 1.3643          |
| 1.3714        | 1.2037 | 840  | 1.3579          |
| 1.2902        | 1.2539 | 875  | 1.3517          |
| 1.3005        | 1.3041 | 910  | 1.3461          |
| 1.4258        | 1.3542 | 945  | 1.3411          |
| 1.3047        | 1.4044 | 980  | 1.3367          |
| 1.2701        | 1.4546 | 1015 | 1.3325          |
| 1.2929        | 1.5048 | 1050 | 1.3291          |
| 1.3233        | 1.5550 | 1085 | 1.3251          |
| 1.1852        | 1.6052 | 1120 | 1.3221          |
| 1.3357        | 1.6554 | 1155 | 1.3192          |
| 1.2157        | 1.7056 | 1190 | 1.3166          |
| 1.324         | 1.7558 | 1225 | 1.3139          |
| 1.2716        | 1.8060 | 1260 | 1.3117          |
| 1.2703        | 1.8562 | 1295 | 1.3099          |
| 1.2428        | 1.9064 | 1330 | 1.3084          |
| 1.2295        | 1.9566 | 1365 | 1.3071          |
Framework versions
- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 3.5.1
- Tokenizers 0.21.1