s300_shuff100 / evalonlyhindi_indicwav2vec_MUCS_warmup500_s300shuff100_2142409.out
wandb: Currently logged in as: priyanshi-pal (priyanshipal). Use `wandb login --relogin` to force relogin
wandb: wandb version 0.17.7 is available! To upgrade, please run:
wandb: $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.17.6
wandb: Run data is saved locally in /scratch/elec/t405-puhe/p/palp3/MUCS/wandb/run-20240822_151437-2b363w6i
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run eval_pd2000_s300_shuff100_hindi
wandb: ⭐️ View project at https://wandb.ai/priyanshipal/huggingface
wandb: πŸš€ View run at https://wandb.ai/priyanshipal/huggingface/runs/2b363w6i
/scratch/work/palp3/myenv/lib/python3.11/site-packages/transformers/training_args.py:1525: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of πŸ€— Transformers. Use `eval_strategy` instead
warnings.warn(
/scratch/work/palp3/myenv/lib/python3.11/site-packages/transformers/models/auto/configuration_auto.py:957: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers. Please use `token` instead.
warnings.warn(
/scratch/work/palp3/myenv/lib/python3.11/site-packages/transformers/models/auto/feature_extraction_auto.py:329: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers. Please use `token` instead.
warnings.warn(
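The two FutureWarnings above name their drop-in replacements. A minimal sketch of the updated arguments, assuming placeholder paths and values that are not taken from this run:

    from transformers import TrainingArguments, AutoConfig, AutoFeatureExtractor

    training_args = TrainingArguments(
        output_dir="out",
        eval_strategy="steps",  # replaces the deprecated `evaluation_strategy`
    )
    # `token` replaces the deprecated `use_auth_token`
    config = AutoConfig.from_pretrained("some/model", token=True)
    feature_extractor = AutoFeatureExtractor.from_pretrained("some/model", token=True)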
Wav2Vec2CTCTokenizer(name_or_path='', vocab_size=149, model_max_length=1000000000000000019884624838656, is_fast=False, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '[UNK]', 'pad_token': '[PAD]'}, clean_up_tokenization_spaces=True), added_tokens_decoder={
147: AddedToken("[UNK]", rstrip=True, lstrip=True, single_word=False, normalized=False, special=False),
148: AddedToken("[PAD]", rstrip=True, lstrip=True, single_word=False, normalized=False, special=False),
149: AddedToken("<s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
150: AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}
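A tokenizer with the configuration printed above is typically built from a vocabulary file; a minimal sketch, assuming a hypothetical vocab path (the special tokens match the repr, everything else is illustrative):

    from transformers import Wav2Vec2CTCTokenizer

    tokenizer = Wav2Vec2CTCTokenizer(
        "vocab.json",        # placeholder; the run's vocab file holds 149 base entries
        unk_token="[UNK]",
        pad_token="[PAD]",
        bos_token="<s>",
        eos_token="</s>",
    )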
CHECK MODEL PARAMS Wav2Vec2ForCTC(
(wav2vec2): Wav2Vec2Model(
(feature_extractor): Wav2Vec2FeatureEncoder(
(conv_layers): ModuleList(
(0): Wav2Vec2LayerNormConvLayer(
(conv): Conv1d(1, 512, kernel_size=(10,), stride=(5,))
(layer_norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
(activation): GELUActivation()
)
(1-4): 4 x Wav2Vec2LayerNormConvLayer(
(conv): Conv1d(512, 512, kernel_size=(3,), stride=(2,))
(layer_norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
(activation): GELUActivation()
)
(5-6): 2 x Wav2Vec2LayerNormConvLayer(
(conv): Conv1d(512, 512, kernel_size=(2,), stride=(2,))
(layer_norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
(activation): GELUActivation()
)
)
)
(feature_projection): Wav2Vec2FeatureProjection(
(layer_norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
(projection): Linear(in_features=512, out_features=1024, bias=True)
(dropout): Dropout(p=0.0, inplace=False)
)
(encoder): Wav2Vec2EncoderStableLayerNorm(
(pos_conv_embed): Wav2Vec2PositionalConvEmbedding(
(conv): ParametrizedConv1d(
1024, 1024, kernel_size=(128,), stride=(1,), padding=(64,), groups=16
(parametrizations): ModuleDict(
(weight): ParametrizationList(
(0): _WeightNorm()
)
)
)
(padding): Wav2Vec2SamePadLayer()
(activation): GELUActivation()
)
(layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.0, inplace=False)
(layers): ModuleList(
(0-23): 24 x Wav2Vec2EncoderLayerStableLayerNorm(
(attention): Wav2Vec2SdpaAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(dropout): Dropout(p=0.0, inplace=False)
(layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(feed_forward): Wav2Vec2FeedForward(
(intermediate_dropout): Dropout(p=0.0, inplace=False)
(intermediate_dense): Linear(in_features=1024, out_features=4096, bias=True)
(intermediate_act_fn): GELUActivation()
(output_dense): Linear(in_features=4096, out_features=1024, bias=True)
(output_dropout): Dropout(p=0.0, inplace=False)
)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
)
)
)
(dropout): Dropout(p=0.0, inplace=False)
(lm_head): Linear(in_features=1024, out_features=151, bias=True)
)
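The CTC head above has out_features=151, i.e. the 149-entry base vocabulary plus the added <s> and </s> tokens. A sketch of how such a model is commonly instantiated so the head size tracks the tokenizer (checkpoint path is a placeholder and `tokenizer` refers to the object printed earlier; neither is read from this log):

    from transformers import Wav2Vec2ForCTC

    model = Wav2Vec2ForCTC.from_pretrained(
        "path/to/pretrained-checkpoint",   # placeholder; the actual checkpoint is not shown here
        ctc_loss_reduction="mean",
        pad_token_id=tokenizer.pad_token_id,
        vocab_size=len(tokenizer),         # 151: 149 base entries + <s> and </s>
    )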
preprocess datasets: 100%|██████████| 572/572 [00:13<00:00, 42.21 examples/s]
Traceback (most recent call last):
File "/scratch/elec/puhe/p/palp3/MUCS/eval_script_indicwav2vec.py", line 790, in <module>
main()
File "/scratch/elec/puhe/p/palp3/MUCS/eval_script_indicwav2vec.py", line 637, in main
print("check the eval set length", len(vectorized_datasets["eval"]["audio_id"]))
~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
File "/scratch/work/palp3/myenv/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 2866, in __getitem__
return self._getitem(key)
^^^^^^^^^^^^^^^^^^
File "/scratch/work/palp3/myenv/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 2850, in _getitem
pa_subtable = query_table(self._data, key, indices=self._indices)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/scratch/work/palp3/myenv/lib/python3.11/site-packages/datasets/formatting/formatting.py", line 584, in query_table
_check_valid_column_key(key, table.column_names)
File "/scratch/work/palp3/myenv/lib/python3.11/site-packages/datasets/formatting/formatting.py", line 521, in _check_valid_column_key
raise KeyError(f"Column {key} not in the dataset. Current columns in the dataset: {columns}")
KeyError: "Column audio_id not in the dataset. Current columns in the dataset: ['input_values', 'input_length', 'labels']"
wandb: 0.011 MB of 0.028 MB uploaded
wandb: πŸš€ View run eval_pd2000_s300_shuff100_hindi at: https://wandb.ai/priyanshipal/huggingface/runs/2b363w6i
wandb: ⭐️ View project at: https://wandb.ai/priyanshipal/huggingface
wandb: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20240822_151437-2b363w6i/logs
wandb: WARNING The new W&B backend becomes opt-out in version 0.18.0; try it out with `wandb.require("core")`! See https://wandb.me/wandb-core for more information.