Built with Axolotl

See axolotl config

axolotl version: 0.10.0.dev0

base_model: arcee-train/afm-64k-v1
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

trust_remote_code: true

# wandb configuration
wandb_project: AFM-SFT
wandb_watch:

wandb_run_id: # V{Version}-{Run Number}-{Attempt Number}
wandb_log_model:

# push checkpoints to hub
hub_model_id: pocketdoc-interim/AFM-SFT-v0.0.1
# how to push checkpoints to hub
# https://huggingface.co/docs/transformers/v4.31.0/en/main_classes/trainer#transformers.TrainingArguments.hub_strategy
hub_strategy: "every_save"
# Whether to use hf `use_auth_token` for loading datasets. Useful for fetching private datasets
# Required to be true when used in combination with `push_dataset_to_hub`
hf_use_auth_token: true

# where to save the finished model to
output_dir: ./AFM-SFT-v0.0.1

# where to save the dataset to
dataset_prepared_path: ./AFM-SFT-v0.0.1-data

save_safetensors: true

# dataset settings (local or huggingface repo)
datasets:
  - path: pocketdoc-interim/AFM-SFT-64K
    split: train
    ds_type: parquet
    type:
  - path: pocketdoc-interim/AFM-SFT-64K
    split: validation
    ds_type: parquet
    type:


plugins:
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
cut_cross_entropy: true


load_in_8bit: false
load_in_4bit: false
strict: false


sequence_len: 32768

sample_packing: true
eval_sample_packing: true

val_set_size: 0.005

pad_to_sequence_len: true

gradient_checkpointing: true

gradient_accumulation_steps: 4
micro_batch_size: 1

num_epochs: 2

optimizer: ademamix_8bit
optim_args: "beta1=0.9,beta2=0.999,beta3=0.999,alpha=5"

lr_scheduler: rex
learning_rate: 0.000001
cosine_min_lr_ratio:

weight_decay: 0.0
max_grad_norm: 0.001

train_on_inputs: false
group_by_length: false

bf16: true
fp16: false
tf32: false

early_stopping_patience:

resume_from_checkpoint:
auto_resume_from_checkpoints: false

local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_ratio: 0.05

evals_per_epoch: 20
# eval_steps: 20

saves_per_epoch: 4
save_total_limit: 1

debug: false

deepspeed: deepspeed_configs/zero3_bf16.json
torch_compile: false

fsdp:
fsdp_config:

special_tokens:
  pad_token: "<|finetune_right_pad_id|>"

AFM-SFT-v0.0.1

This model is a fine-tuned version of arcee-train/afm-64k-v1 on the pocketdoc-interim/AFM-SFT-64K dataset (train and validation splits). It achieves the following results on the evaluation set:

  • Loss: 1.3071
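
A minimal loading sketch for the checkpoint (assuming access to the gated repo has been granted; the prompt and generation settings are illustrative only, while trust_remote_code and bf16 mirror the training config):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pocketdoc-interim/AFM-SFT-v0.0.1"

# trust_remote_code matches the training config; bf16 matches the saved tensor dtype.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# Illustrative generation call; prompt and decoding settings are placeholders.
inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```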

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
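
Per the config above, both the training and evaluation splits are drawn from the gated pocketdoc-interim/AFM-SFT-64K repository as parquet data. A minimal inspection sketch (assuming you have been granted access and are authenticated to the Hub, e.g. via `huggingface-cli login`):

```python
from datasets import load_dataset

# Both splits come from the same gated dataset repo referenced in the config.
train = load_dataset("pocketdoc-interim/AFM-SFT-64K", split="train")
val = load_dataset("pocketdoc-interim/AFM-SFT-64K", split="validation")

print(train)     # row count and column names
print(train[0])  # one raw example
```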

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-06
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 8
  • optimizer: ADEMAMIX_8BIT with args beta1=0.9, beta2=0.999, beta3=0.999, alpha=5
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 69
  • training_steps: 1394
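
The aggregate values (total batch sizes and warmup steps) follow from the per-device settings and the config's warmup_ratio; a small sanity check of the arithmetic:

```python
# Derivation of the aggregate hyperparameters from the per-device settings.
micro_batch_size = 1               # per-device train batch size
gradient_accumulation_steps = 4
num_devices = 8

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
assert total_train_batch_size == 32

eval_batch_size = 1
total_eval_batch_size = eval_batch_size * num_devices
assert total_eval_batch_size == 8

training_steps = 1394
warmup_ratio = 0.05                # warmup_ratio from the config
warmup_steps = int(training_steps * warmup_ratio)
assert warmup_steps == 69
```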

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 3.8443        | 0.0014 | 1    | 4.3592          |
| 3.7407        | 0.0502 | 35   | 4.1027          |
| 2.9405        | 0.1004 | 70   | 2.8520          |
| 2.0344        | 0.1506 | 105  | 2.1397          |
| 1.8346        | 0.2008 | 140  | 1.9007          |
| 1.7566        | 0.2510 | 175  | 1.7788          |
| 1.6076        | 0.3012 | 210  | 1.6968          |
| 1.6816        | 0.3514 | 245  | 1.6405          |
| 1.5809        | 0.4016 | 280  | 1.5957          |
| 1.4863        | 0.4518 | 315  | 1.5635          |
| 1.4724        | 0.5020 | 350  | 1.5370          |
| 1.3741        | 0.5522 | 385  | 1.5092          |
| 1.5398        | 0.6024 | 420  | 1.4898          |
| 1.4832        | 0.6526 | 455  | 1.4698          |
| 1.3365        | 0.7028 | 490  | 1.4528          |
| 1.3616        | 0.7530 | 525  | 1.4382          |
| 1.594         | 0.8032 | 560  | 1.4259          |
| 1.4095        | 0.8534 | 595  | 1.4155          |
| 1.3814        | 0.9035 | 630  | 1.4050          |
| 1.4576        | 0.9537 | 665  | 1.3952          |
| 1.361         | 1.0029 | 700  | 1.3872          |
| 1.3922        | 1.0531 | 735  | 1.3791          |
| 1.4289        | 1.1033 | 770  | 1.3711          |
| 1.2603        | 1.1535 | 805  | 1.3643          |
| 1.3714        | 1.2037 | 840  | 1.3579          |
| 1.2902        | 1.2539 | 875  | 1.3517          |
| 1.3005        | 1.3041 | 910  | 1.3461          |
| 1.4258        | 1.3542 | 945  | 1.3411          |
| 1.3047        | 1.4044 | 980  | 1.3367          |
| 1.2701        | 1.4546 | 1015 | 1.3325          |
| 1.2929        | 1.5048 | 1050 | 1.3291          |
| 1.3233        | 1.5550 | 1085 | 1.3251          |
| 1.1852        | 1.6052 | 1120 | 1.3221          |
| 1.3357        | 1.6554 | 1155 | 1.3192          |
| 1.2157        | 1.7056 | 1190 | 1.3166          |
| 1.324         | 1.7558 | 1225 | 1.3139          |
| 1.2716        | 1.8060 | 1260 | 1.3117          |
| 1.2703        | 1.8562 | 1295 | 1.3099          |
| 1.2428        | 1.9064 | 1330 | 1.3084          |
| 1.2295        | 1.9566 | 1365 | 1.3071          |

Framework versions

  • Transformers 4.51.3
  • Pytorch 2.6.0+cu124
  • Datasets 3.5.1
  • Tokenizers 0.21.1