[2025-05-08 13:54:20] Created output directory: train_results_ar/google_gemma-3-1b-pt_full_upsample1000
[2025-05-08 13:54:20] Chat mode disabled
[2025-05-08 13:54:20] Model size is 3B or smaller (1B). Using full fine-tuning.
[2025-05-08 13:54:20] No QA format data will be used
[2025-05-08 13:54:20] =======================================
[2025-05-08 13:54:20] Starting training for model: google/gemma-3-1b-pt
[2025-05-08 13:54:20] =======================================
[2025-05-08 13:54:20] CUDA_VISIBLE_DEVICES: 0,1,2,3,4,5,6,7
[2025-05-08 13:54:20] WANDB_PROJECT: wikidyk-ar
[2025-05-08 13:54:20] DATA_PATH: data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json
[2025-05-08 13:54:20] Global Batch Size: 256
[2025-05-08 13:54:20] Data Size: -1
[2025-05-08 13:54:20] Executing command: torchrun --nproc_per_node "8" --master-port 29503 src/train.py --model_name_or_path "google/gemma-3-1b-pt" --data_path "data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json" --output_dir "train_results_ar/google_gemma-3-1b-pt_full_upsample1000" --num_upsample "1000" --per_device_train_batch_size "32" --gradient_accumulation_steps "1" --learning_rate "2e-5" --num_train_epochs "1" --model_max_length "4096" --report_to wandb --logging_steps 50 --save_strategy no --bf16 True --use_flash_attention_2 True --qa_data_ratio "-1" --predict_mask "false"
[2025-05-08 13:54:20] Training started at Thu May 8 13:54:20 CST 2025
W0508 13:54:21.027000 3286116 site-packages/torch/distributed/run.py:792]
W0508 13:54:21.027000 3286116 site-packages/torch/distributed/run.py:792] *****************************************
W0508 13:54:21.027000 3286116 site-packages/torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0508 13:54:21.027000 3286116 site-packages/torch/distributed/run.py:792] *****************************************
WARNING:root:Output directory: train_results_ar/google_gemma-3-1b-pt_full_upsample1000
The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use `attn_implementation="flash_attention_2"` instead.
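The reported global batch size is consistent with the launch flags: 32 per device × 8 processes × 1 gradient-accumulation step = 256. The deprecation warning just above also points at the cleaner way to request FlashAttention-2 at load time: pass attn_implementation="flash_attention_2" to from_pretrained instead of the legacy use_flash_attention_2=True flag. A minimal sketch of such a call follows; only the model name, bf16, and the attention backend come from this log, the rest is illustrative.

# Sketch of the non-deprecated loading path suggested by the warning above.
# Only the model name, bf16, and the attention backend are taken from the log.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-1b-pt",
    torch_dtype=torch.bfloat16,               # matches --bf16 True
    attn_implementation="flash_attention_2",  # replaces use_flash_attention_2=True
)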
[rank0]: Traceback (most recent call last):
[rank0]:   File "/cq_1/share_1603164/user/wenhaowyu/WikiDYKEvalV2/src/train.py", line 134, in <module>
[rank0]:     train()
[rank0]:   File "/cq_1/share_1603164/user/wenhaowyu/WikiDYKEvalV2/src/train.py", line 81, in train
[rank0]:     model = load_model(
[rank0]:             ^^^^^^^^^^^
[rank0]:   File "/cq_1/share_1603164/user/wenhaowyu/WikiDYKEvalV2/src/utils/tools.py", line 119, in load_model
[rank0]:     return AutoModelForCausalLM.from_pretrained(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/root/miniconda3/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 571, in from_pretrained
[rank0]:     return model_class.from_pretrained(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/root/miniconda3/lib/python3.11/site-packages/transformers/modeling_utils.py", line 279, in _wrapper
[rank0]:     return func(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/root/miniconda3/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4336, in from_pretrained
[rank0]:     config = cls._autoset_attn_implementation(
[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/root/miniconda3/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2109, in _autoset_attn_implementation
[rank0]:     cls._check_and_enable_flash_attn_2(
[rank0]:   File "/root/miniconda3/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2252, in _check_and_enable_flash_attn_2
[rank0]:     raise ImportError(f"{preface} the package flash_attn seems to be not installed. {install_message}")
[rank0]: ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package flash_attn seems to be not installed. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2.
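The ImportError is the actual root cause: the launch flags request FlashAttention-2, but the flash_attn package is not importable in this environment. Installing it (typically pip install flash-attn --no-build-isolation, per the Transformers docs linked in the error) and re-running the same command should clear the error. Another option is to probe for the package and fall back to PyTorch's built-in SDPA attention; the sketch below assumes that fallback is acceptable for this run, which is not something src/train.py does today.

# Sketch: choose an attention backend based on whether flash_attn is importable.
# The "sdpa" fallback is an assumption on our part, not behaviour of the training script.
import importlib.util

def pick_attn_implementation() -> str:
    """Return "flash_attention_2" when flash_attn is installed, else "sdpa"."""
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "sdpa"  # PyTorch scaled-dot-product attention; needs no extra package

load_model() in src/utils/tools.py could pass the returned string as attn_implementation to from_pretrained; the function name and line number come from the traceback above, but its exact signature is not shown in this log.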
[Ranks 1-7 failed with the same ImportError; their identical tracebacks and the repeated deprecation warnings are omitted.]
[rank0]:[W508 14:02:58.638586573 ProcessGroupNCCL.cpp:1496] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
W0508 14:02:58.575000 3286116 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 3286182 closing signal SIGTERM
W0508 14:02:58.575000 3286116 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 3286184 closing signal SIGTERM
W0508 14:02:58.576000 3286116 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 3286185 closing signal SIGTERM
W0508 14:02:58.577000 3286116 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 3286186 closing signal SIGTERM
W0508 14:02:58.577000 3286116 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 3286187 closing signal SIGTERM
W0508 14:02:58.578000 3286116 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 3286188 closing signal SIGTERM
W0508 14:02:58.578000 3286116 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 3286189 closing signal SIGTERM
E0508 14:02:59.370000 3286116 site-packages/torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 1 (pid: 3286183) of binary: /root/miniconda3/bin/python
Traceback (most recent call last):
  File "/root/miniconda3/bin/torchrun", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/torch/distributed/run.py", line 918, in main
    run(args)
  File "/root/miniconda3/lib/python3.11/site-packages/torch/distributed/run.py", line 909, in run
    elastic_launch(
  File "/root/miniconda3/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 138, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
src/train.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2025-05-08_14:02:58
  host      : TENCENT64.site
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 3286183)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
[2025-05-08 14:02:59] ERROR: Training failed for google/gemma-3-1b-pt with exit code 1
[2025-05-08 14:02:59] Check error log for details: train_results_ar/google_gemma-3-1b-pt_full_upsample1000/20250508_134354.log
[2025-05-08 14:02:59] Resource usage after training google/gemma-3-1b-pt:
[2025-05-08 14:02:59] GPU memory usage (used, total per GPU):
0 MiB, 97871 MiB
0 MiB, 97871 MiB
0 MiB, 97871 MiB
0 MiB, 97871 MiB
0 MiB, 97871 MiB
0 MiB, 97871 MiB
0 MiB, 97871 MiB
0 MiB, 97871 MiB
[2025-05-08 14:02:59] Disk space usage for model outputs: 24K train_results_ar/google_gemma-3-1b-pt_full_upsample1000
[2025-05-08 14:02:59]
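Because every rank died during model loading, the run left only a 24K output directory and the wrapper simply reported exit code 1. A small preflight script run before torchrun can surface this kind of missing dependency immediately; the sketch below is a hypothetical helper, not part of the WikiDYKEvalV2 repository, and the package-to-flag mapping is inferred from the command logged above.

# preflight_check.py -- hypothetical helper, not part of the repository.
# Verifies that packages implied by the launch flags are importable before
# spending time spinning up 8 ranks.
import importlib.util
import sys

REQUIRED = {
    "flash_attn": "--use_flash_attention_2 True needs the flash-attn package",
    "wandb": "--report_to wandb needs the wandb package",
}

missing = [f"{pkg} ({reason})" for pkg, reason in REQUIRED.items()
           if importlib.util.find_spec(pkg) is None]

if missing:
    print("Preflight failed, missing packages: " + "; ".join(missing), file=sys.stderr)
    sys.exit(1)
print("Preflight OK")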