[2025-05-08 21:44:10] Created output directory: train_results_ar/Qwen_Qwen2.5-1.5B_full_upsample1000
[2025-05-08 21:44:10] Chat mode disabled
[2025-05-08 21:44:10] Model size is 3B or smaller (1.5B). Using full fine-tuning.
[2025-05-08 21:44:10] No QA format data will be used
[2025-05-08 21:44:10] =======================================
[2025-05-08 21:44:10] Starting training for model: Qwen/Qwen2.5-1.5B
[2025-05-08 21:44:10] =======================================
[2025-05-08 21:44:10] CUDA_VISIBLE_DEVICES: 0,1,2,3,4,5,6,7
[2025-05-08 21:44:10] WANDB_PROJECT: wikidyk-ar
[2025-05-08 21:44:10] DATA_PATH: data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json
[2025-05-08 21:44:10] Global Batch Size: 256
[2025-05-08 21:44:10] Data Size: -1
[2025-05-08 21:44:10] Executing command: torchrun --nproc_per_node "8" --master-port 29503 src/train.py --model_name_or_path "Qwen/Qwen2.5-1.5B" --data_path "data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json" --output_dir "train_results_ar/Qwen_Qwen2.5-1.5B_full_upsample1000" --num_upsample "1000" --per_device_train_batch_size "32" --gradient_accumulation_steps "1" --learning_rate "2e-4" --num_train_epochs "1" --model_max_length "4096" --report_to wandb --logging_steps 50 --save_strategy no --bf16 True --use_flash_attention_2 True --qa_data_ratio "-1" --predict_mask "false"
[2025-05-08 21:44:10] Training started at Thu, May 08 2025 21:44:10 CST
W0508 21:44:11.246000 3289183 site-packages/torch/distributed/run.py:792]
W0508 21:44:11.246000 3289183 site-packages/torch/distributed/run.py:792] *****************************************
W0508 21:44:11.246000 3289183 site-packages/torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0508 21:44:11.246000 3289183 site-packages/torch/distributed/run.py:792] *****************************************
WARNING:root:Output directory: train_results_ar/Qwen_Qwen2.5-1.5B_full_upsample1000
The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use `attn_implementation="flash_attention_2"` instead.
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
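The pair of warnings above comes from the legacy `use_flash_attention_2=True` flag that the launch command passes through to the model loader. A minimal sketch of the newer call, assuming the loading code in src/train.py looks roughly like this (variable names are assumptions; the "not initialized on GPU" warning is typically benign here, since the Trainer moves the model to the GPU itself):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-1.5B"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,                # matches --bf16 True
    attn_implementation="flash_attention_2",   # replaces use_flash_attention_2=True
)
tokenizer = AutoTokenizer.from_pretrained(model_name, model_max_length=4096)
```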
WARNING:root:Loading data...
WARNING:root:Dataset initialized with all QA data:
WARNING:root: - 0 QA examples
WARNING:root: - 12290 fact examples with upsampling factor 1000
WARNING:root: - Total examples: 12290000
/cq_1/share_1603164/user/wenhaowyu/WikiDYKEvalV2/src/train.py:119: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.
  trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)
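The dataset counts are consistent: 12,290 fact examples with an upsampling factor of 1,000 yields 12,290,000 total examples, and the QA stream is empty because the run passes --qa_data_ratio "-1". A hypothetical sketch of what such an upsampled dataset might look like; the actual class in src/train.py is not shown in this log, so the name and structure here are assumptions:

```python
from torch.utils.data import Dataset

class UpsampledFactDataset(Dataset):
    """Hypothetical sketch: presents each fact example num_upsample times."""

    def __init__(self, fact_examples, num_upsample=1000):
        self.fact_examples = fact_examples
        self.num_upsample = num_upsample

    def __len__(self):
        # 12,290 facts * 1,000 -> 12,290,000, matching the log above
        return len(self.fact_examples) * self.num_upsample

    def __getitem__(self, idx):
        # Index modulo the base size, so every fact is revisited 1,000 times
        return self.fact_examples[idx % len(self.fact_examples)]
```

The FutureWarning is a one-line fix: recent transformers versions accept the tokenizer via the `processing_class` argument instead of the deprecated `tokenizer` one. A sketch, assuming the same variables as in the logged call:

```python
# Deprecated in transformers, slated for removal in v5:
# trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)

# Current form: pass the tokenizer as processing_class
trainer = Trainer(
    model=model,
    processing_class=tokenizer,
    args=training_args,
    **data_module,
)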
WARNING:accelerate.utils.other:Detected kernel version 5.4.241, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
wandb: WARNING The `run_name` is currently set to the same value as `TrainingArguments.output_dir`. If this was not intended, please specify a different run name by setting the `TrainingArguments.run_name` parameter.
wandb: Currently logged in as: wenhaoyu97 to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.19.10
wandb: Run data is saved locally in /cq_1/share_1603164/user/wenhaowyu/WikiDYKEvalV2/wandb/run-20250508_214454-vimu775f
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run train_results_ar/Qwen_Qwen2.5-1.5B_full_upsample1000
wandb: ⭐️ View project at https://wandb.ai/wenhaoyu97/wikidyk-ar
wandb: 🚀 View run at https://wandb.ai/wenhaoyu97/wikidyk-ar/runs/vimu775f
  0%|          | 0/48008 [00:00
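The progress bar's total of 48,008 steps follows from the numbers logged earlier: the global batch is 8 GPUs × 32 per-device batch × 1 gradient-accumulation step = 256, and one epoch over the upsampled dataset takes ceil(12,290,000 / 256) = 48,008 optimizer steps. A quick sanity check in plain Python, with no project code assumed:

```python
import math

world_size, per_device_bs, grad_accum = 8, 32, 1
global_batch = world_size * per_device_bs * grad_accum   # 256, matches the log
total_examples = 12_290 * 1_000                          # 12,290,000 upsampled facts

steps_per_epoch = math.ceil(total_examples / global_batch)
print(global_batch, steps_per_epoch)                     # 256 48008
```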