See axolotl config
axolotl version: `0.10.0.dev0`
```yaml
base_model: THUDM/GLM-4-32B-Base-0414
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
trust_remote_code:
# wandb configuration
wandb_project: 32b-glm4-dans-personality-engine
wandb_watch:
wandb_run_id: V1.3.0-1-4 # V{Version}-{Run Number}-{Attempt Number}
wandb_log_model:
# push checkpoints to hub
hub_model_id: Dans-DiscountModels/32b-glm4-dans-personality-engine-v1.3.0-TestArticle-1
# how to push checkpoints to hub
# https://huggingface.co/docs/transformers/v4.31.0/en/main_classes/trainer#transformers.TrainingArguments.hub_strategy
hub_strategy: "every_save"
# Whether to use hf `use_auth_token` for loading datasets. Useful for fetching private datasets
# Required to be true when used in combination with `push_dataset_to_hub`
hf_use_auth_token: true
# where to save the finished model to
output_dir: ./32b-glm4-dans-personality-engine
save_safetensors: true
datasets:
  - path: Dans-DiscountModels/pretokenization-test-4
    ds_type: parquet
    type:
plugins:
  - axolotl.integrations.liger.LigerPlugin
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
liger_rope: false
liger_rms_norm: true
liger_glu_activation: true
liger_fused_linear_cross_entropy: false
cut_cross_entropy: true
load_in_8bit: false
load_in_4bit: false
strict: false
dataset_prepared_path: ./32b-glm4-dans-personality-engine-data
val_set_size: 0.003
sequence_len: 32768
sample_packing: true
eval_sample_packing: true
pad_to_sequence_len: true
gradient_checkpointing: unsloth
gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 2
optimizer: ademamix_8bit
optim_args: "beta1=0.9,beta2=0.999,beta3=0.999,alpha=5"
lr_scheduler: rex
learning_rate: 0.000008
cosine_min_lr_ratio:
weight_decay: 0
max_grad_norm: 0.001
train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false
early_stopping_patience:
resume_from_checkpoint:
auto_resume_from_checkpoints: false
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_ratio: 0.1
evals_per_epoch: 24
eval_table_size:
eval_max_new_tokens:
saves_per_epoch: 8
save_total_limit: 1
debug: false
deepspeed: /alloc/pocketdoc/axolotl/deepspeed_configs/zero3_bf16.json
fsdp:
fsdp_config:
special_tokens:
```
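A minimal launch sketch for reproducing a run with this config, assuming it is saved locally as `glm4-dpe-v1.3.0.yaml` (hypothetical filename) and axolotl plus its deepspeed/accelerate dependencies are installed; axolotl is normally invoked from the shell, and this simply wraps the same `accelerate launch -m axolotl.cli.train` entry point:

```python
# Sketch only: shells out to the standard axolotl training entry point.
# The config filename is a placeholder for the YAML shown above.
import subprocess

subprocess.run(
    ["accelerate", "launch", "-m", "axolotl.cli.train", "glm4-dpe-v1.3.0.yaml"],
    check=True,  # raise if the training process exits non-zero
)
```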
32b-glm4-dans-personality-engine-v1.3.0-TestArticle-1
This model is a fine-tuned version of THUDM/GLM-4-32B-Base-0414 on the Dans-DiscountModels/pretokenization-test-4 dataset. It achieves the following results on the evaluation set:
- Loss: 1.6235
Model description
More information needed
Intended uses & limitations
More information needed
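The card does not document a usage snippet; as a baseline, a minimal text-completion sketch with the standard transformers API is shown below. The dtype, device mapping, prompt, and generation settings are assumptions for illustration, not settings taken from this card.

```python
# Minimal inference sketch (assumes the bf16 weights fit across available GPUs
# via device_map="auto"; prompt and generation settings are placeholders).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Dans-DiscountModels/32b-glm4-dans-personality-engine-v1.3.0-TestArticle-1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the run trained in bf16
    device_map="auto",
)

inputs = tokenizer("Hello, ", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```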
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 32 (derivation checked in the sketch after this list)
- total_eval_batch_size: 8
- optimizer: ademamix_8bit with args: beta1=0.9, beta2=0.999, beta3=0.999, alpha=5
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 332
- num_epochs: 2.0
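The batch-size figures above follow directly from the per-device settings in the config; a quick arithmetic check is sketched below. The step count used for the warmup estimate is read off the results table that follows, so it is approximate.

```python
# Arithmetic check of the derived hyperparameters listed above.
micro_batch_size = 1
gradient_accumulation_steps = 4
num_devices = 8

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
total_eval_batch_size = micro_batch_size * num_devices
print(total_train_batch_size, total_eval_batch_size)  # 32 8

# warmup_ratio was 0.1 in the config; ~3,320 optimizer steps over 2 epochs is an
# approximation read off the step/epoch columns in the results table below.
approx_total_steps = 3320
print(round(0.1 * approx_total_steps))  # ~332 warmup steps
```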
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
1.6456 | 0.0006 | 1 | 1.7604 |
1.6538 | 0.0421 | 70 | 1.7472 |
1.668 | 0.0842 | 140 | 1.7132 |
1.5877 | 0.1264 | 210 | 1.6934 |
1.7524 | 0.1685 | 280 | 1.6815 |
1.6687 | 0.2106 | 350 | 1.6738 |
1.7986 | 0.2527 | 420 | 1.6691 |
1.8379 | 0.2948 | 490 | 1.6659 |
1.6813 | 0.3369 | 560 | 1.6633 |
1.6749 | 0.3791 | 630 | 1.6607 |
1.5746 | 0.4212 | 700 | 1.6585 |
1.7503 | 0.4633 | 770 | 1.6565 |
1.6143 | 0.5054 | 840 | 1.6545 |
1.6 | 0.5475 | 910 | 1.6527 |
1.7525 | 0.5897 | 980 | 1.6510 |
1.5861 | 0.6318 | 1050 | 1.6493 |
1.7439 | 0.6739 | 1120 | 1.6477 |
1.6129 | 0.7160 | 1190 | 1.6464 |
1.4729 | 0.7581 | 1260 | 1.6454 |
1.6923 | 0.8002 | 1330 | 1.6451 |
1.6498 | 0.8424 | 1400 | 1.6441 |
1.5815 | 0.8845 | 1470 | 1.6429 |
1.6209 | 0.9266 | 1540 | 1.6418 |
1.6685 | 0.9687 | 1610 | 1.6408 |
1.7472 | 1.0108 | 1680 | 1.6397 |
1.5719 | 1.0529 | 1750 | 1.6386 |
1.7247 | 1.0951 | 1820 | 1.6377 |
1.7098 | 1.1372 | 1890 | 1.6367 |
1.6367 | 1.1793 | 1960 | 1.6358 |
1.7014 | 1.2214 | 2030 | 1.6349 |
1.6622 | 1.2635 | 2100 | 1.6340 |
1.5958 | 1.3057 | 2170 | 1.6331 |
1.59 | 1.3478 | 2240 | 1.6322 |
1.6959 | 1.3899 | 2310 | 1.6314 |
1.6595 | 1.4320 | 2380 | 1.6308 |
1.6163 | 1.4741 | 2450 | 1.6300 |
1.6593 | 1.5162 | 2520 | 1.6292 |
1.7528 | 1.5584 | 2590 | 1.6285 |
1.6423 | 1.6005 | 2660 | 1.6279 |
1.5997 | 1.6426 | 2730 | 1.6272 |
1.6696 | 1.6847 | 2800 | 1.6266 |
1.7232 | 1.7268 | 2870 | 1.6260 |
1.5094 | 1.7690 | 2940 | 1.6254 |
1.853 | 1.8111 | 3010 | 1.6249 |
1.756 | 1.8532 | 3080 | 1.6245 |
1.705 | 1.8953 | 3150 | 1.6240 |
1.6894 | 1.9374 | 3220 | 1.6237 |
1.5937 | 1.9795 | 3290 | 1.6235 |
Framework versions
- Transformers 4.51.3
- Pytorch 2.4.1+cu121
- Datasets 3.5.0
- Tokenizers 0.21.1