[2025-05-08 13:42:23] Created output directory: train_results_ar/meta-llama_Llama-2-7b-hf_full_upsample1000
[2025-05-08 13:42:23] Chat mode disabled
[2025-05-08 13:42:23] Set MODEL_MAX_LENGTH to 4096 for Llama-2 model
[2025-05-08 13:42:23] Model size: 7B. Using full fine-tuning.
[2025-05-08 13:42:23] No QA format data will be used
[2025-05-08 13:42:23] =======================================
[2025-05-08 13:42:23] Starting training for model: meta-llama/Llama-2-7b-hf
[2025-05-08 13:42:23] =======================================
[2025-05-08 13:42:23] CUDA_VISIBLE_DEVICES: 0,1,2,3,4,5,6,7
[2025-05-08 13:42:23] WANDB_PROJECT: wikidyk-ar
[2025-05-08 13:42:23] DATA_PATH: data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json
[2025-05-08 13:42:23] Global Batch Size: 256
[2025-05-08 13:42:23] Data Size: -1
[2025-05-08 13:42:23] Executing command: torchrun --nproc_per_node "8" --master-port 29503 src/train.py --model_name_or_path "meta-llama/Llama-2-7b-hf" --data_path "data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json" --output_dir "train_results_ar/meta-llama_Llama-2-7b-hf_full_upsample1000" --num_upsample "1000" --per_device_train_batch_size "32" --gradient_accumulation_steps "1" --learning_rate "2e-5" --num_train_epochs "1" --model_max_length "4096" --report_to wandb --logging_steps 50 --save_strategy no --bf16 True --use_flash_attention_2 True --qa_data_ratio "-1" --predict_mask "false"
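The same launch command, broken across lines for readability; every flag and value is copied verbatim from the logged command above. Note that 8 processes x per_device_train_batch_size 32 x gradient_accumulation_steps 1 = 256, matching the logged "Global Batch Size: 256".

torchrun --nproc_per_node "8" --master-port 29503 src/train.py \
    --model_name_or_path "meta-llama/Llama-2-7b-hf" \
    --data_path "data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json" \
    --output_dir "train_results_ar/meta-llama_Llama-2-7b-hf_full_upsample1000" \
    --num_upsample "1000" \
    --per_device_train_batch_size "32" \
    --gradient_accumulation_steps "1" \
    --learning_rate "2e-5" \
    --num_train_epochs "1" \
    --model_max_length "4096" \
    --report_to wandb \
    --logging_steps 50 \
    --save_strategy no \
    --bf16 True \
    --use_flash_attention_2 True \
    --qa_data_ratio "-1" \
    --predict_mask "false"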
[2025-05-08 13:42:23] Training started at Thursday, May 08, 2025 13:42:23 CST
|
W0508 13:42:24.401000 3283386 site-packages/torch/distributed/run.py:792]
W0508 13:42:24.401000 3283386 site-packages/torch/distributed/run.py:792] *****************************************
W0508 13:42:24.401000 3283386 site-packages/torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0508 13:42:24.401000 3283386 site-packages/torch/distributed/run.py:792] *****************************************
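As the warning notes, torchrun leaves OMP_NUM_THREADS at 1 per worker unless the variable is already set in the environment. If CPU-side work (data loading, tokenization) becomes a bottleneck, it can be exported before launch; a minimal sketch, where the value 8 is only an illustrative assumption and not taken from this run:

export OMP_NUM_THREADS=8   # hypothetical value; tune, e.g. cpu_cores / nproc_per_node
# then launch torchrun with exactly the same arguments as in the logged command above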
|
WARNING:root:Output directory: train_results_ar/meta-llama_Llama-2-7b-hf_full_upsample1000
|
Fetching 2 files: 0%|          | 0/2 [00:00<?, ?it/s]