See axolotl config

axolotl version: 0.8.0.dev0

# 学習のベースモデルに関する設定
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

# 学習後のモデルのHFへのアップロードに関する設定
hub_model_id: kazuyamaa/DeepSeek-R1-Distill-Qwen-32B-axolotl-sft-v1.0
hub_strategy: "end"
push_dataset_to_hub:
hf_use_auth_token: true

# Liger Kernelの設定（学習の軽量・高速化）
plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_cross_entropy: false
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true

# 量子化に関する設定
load_in_8bit: false
load_in_4bit: true

# SFTに利用するchat templateの設定
chat_template: gemma

# 学習データセットの前処理に関する設定
datasets:
  - path: kanhatakeyama/ramdom-to-fixed-multiturn-Calm3
    split: 20240806filtered[0:10000]
    type: chat_template
    field_messages: messages
    message_field_role: role
    message_field_content: content
  - path: Aratako/Magpie-Tanuki-Qwen2.5-72B-Answered
    split: train[0:10000]
    type: chat_template
    field_messages: messages
    message_field_role: role
    message_field_content: content
  - path: Aratako/magpie-qwen2.5-32b-reasoning-100k-formatted
    split: train[0:10000]
    type: chat_template
    field_messages: conversations
    message_field_role: role
    message_field_content: content
  - path: Aratako/magpie-reasoning-llama-nemotron-70b-100k-filtered
    split: train[0:10000]
    type: chat_template
    field_messages: conversations
    message_field_role: role
    message_field_content: content
  - path: Aratako/Open-Platypus-Japanese-masked-formatted
    split: train[0:10000]
    type: chat_template
    field_messages: conversations
    message_field_role: role
    message_field_content: content
  - path: kanhatakeyama/wizardlm8x22b-logical-math-coding-sft_additional-ja
    split: train[0:10000]
    type: chat_template
    field_messages: messages
    message_field_role: role
    message_field_content: content
  - path: Aratako/magpie-ultra-v0.1-formatted
    split: train[0:10000]
    type: chat_template
    field_messages: conversations
    message_field_role: role
    message_field_content: content
  - path: Aratako/orca-agentinstruct-1M-v1-selected
    split: train[0:10000]
    type: chat_template
    field_messages: messages
    message_field_role: role
    message_field_content: content
  - path: Aratako/Synthetic-JP-EN-Coding-Dataset-801k-50k
    split: train[0:10000]
    type: chat_template
    field_messages: messages
    message_field_role: role
    message_field_content: content

# データセット、モデルの出力先に関する設定
shuffle_merged_datasets: true
dataset_prepared_path: /workspace/data/sft-data
output_dir: /content/output/DeepSeek-R1-Distill-Qwen-32B-axolotl-sft-v1.0

# valid datasetのサイズ
val_set_size: 0.05

# LoRAに関する設定（フルファインチューニングしたい場合は全て空欄にする）
adapter: qlora
lora_model_dir:
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:

# wandbに関する設定
#wandb_project: axolotl
#wandb_entity: kazukitakayamas051
#wandb_watch:
#wandb_name: sft-lora-1
#wandb_log_model:

# 学習に関する様々な設定
sequence_len: 4096
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

gradient_accumulation_steps: 16
micro_batch_size: 1
num_epochs: 1
optimizer: paged_adamw_8bit
lr_scheduler: cosine
cosine_min_lr_ratio: 0.1
learning_rate: 3e-4

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: false
early_stopping_patience:
auto_resume_from_checkpoints: true
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

save_strategy: steps
save_steps: 50
save_total_limit: 2

warmup_steps: 10
eval_steps: 50
eval_batch_size: 1
eval_table_size:
eval_max_new_tokens:
debug:
deepspeed: /workspace/axolotl/deepspeed_configs/zero3_bf16.json
weight_decay: 0.01
fsdp:
fsdp_config:
special_tokens:
  pad_token: <pad>

DeepSeek-R1-Distill-Qwen-32B-axolotl-sft-v1.0

This model is a fine-tuned version of deepseek-ai/DeepSeek-R1-Distill-Qwen-32B on the kanhatakeyama/ramdom-to-fixed-multiturn-Calm3, the Aratako/Magpie-Tanuki-Qwen2.5-72B-Answered, the Aratako/magpie-qwen2.5-32b-reasoning-100k-formatted, the Aratako/magpie-reasoning-llama-nemotron-70b-100k-filtered, the Aratako/Open-Platypus-Japanese-masked-formatted, the kanhatakeyama/wizardlm8x22b-logical-math-coding-sft_additional-ja, the Aratako/magpie-ultra-v0.1-formatted, the Aratako/orca-agentinstruct-1M-v1-selected and the Aratako/Synthetic-JP-EN-Coding-Dataset-801k-50k datasets. It achieves the following results on the evaluation set:

Loss: 0.6154

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0003
train_batch_size: 1
eval_batch_size: 1
seed: 42
distributed_type: multi-GPU
gradient_accumulation_steps: 16
total_train_batch_size: 16
optimizer: Use paged_adamw_8bit with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 10
num_epochs: 1.0

Training results

Training Loss	Epoch	Step	Validation Loss
1.0196	0.0008	1	0.9386
0.732	0.0381	50	0.7104
0.7803	0.0763	100	0.6853
0.6013	0.1144	150	0.6712
0.6767	0.1526	200	0.6628
0.701	0.1907	250	0.6565
0.6976	0.2289	300	0.6520
0.7022	0.2670	350	0.6487
0.6889	0.3051	400	0.6449
0.6673	0.3433	450	0.6411
0.6067	0.3814	500	0.6382
0.644	0.4196	550	0.6357
0.9572	0.4577	600	0.6336
0.6466	0.4959	650	0.6310
0.6781	0.5340	700	0.6291
0.6473	0.5721	750	0.6274
0.6235	0.6103	800	0.6255
0.6564	0.6484	850	0.6238
0.6009	0.6866	900	0.6221
0.5759	0.7247	950	0.6208
0.5817	0.7628	1000	0.6197
0.6438	0.8010	1050	0.6190
0.6102	0.8391	1100	0.6180
0.5997	0.8773	1150	0.6170
0.5896	0.9154	1200	0.6164
0.5713	0.9536	1250	0.6158
0.6164	0.9917	1300	0.6154

Framework versions

PEFT 0.14.0
Transformers 4.49.0
Pytorch 2.5.1+cu124
Datasets 3.2.0
Tokenizers 0.21.0

kazuyamaa
/

DeepSeek-R1-Distill-Qwen-32B-axolotl-sft-v1.0

DeepSeek-R1-Distill-Qwen-32B-axolotl-sft-v1.0

Model description

Intended uses & limitations

Training and evaluation data

Training procedure

Training hyperparameters

Training results

Framework versions

Model tree for kazuyamaa/DeepSeek-R1-Distill-Qwen-32B-axolotl-sft-v1.0

Datasets used to train kazuyamaa/DeepSeek-R1-Distill-Qwen-32B-axolotl-sft-v1.0

Evaluation results