wandb: Currently logged in as: priyanshi-pal (priyanshipal). Use `wandb login --relogin` to force relogin
wandb: wandb version 0.17.7 is available! To upgrade, please run:
wandb: $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.17.6
wandb: Run data is saved locally in /scratch/elec/t405-puhe/p/palp3/MUCS/wandb/run-20240822_151437-2b363w6i
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run eval_pd2000_s300_shuff100_hindi
wandb: ⭐️ View project at https://wandb.ai/priyanshipal/huggingface
wandb: 🚀 View run at https://wandb.ai/priyanshipal/huggingface/runs/2b363w6i
/scratch/work/palp3/myenv/lib/python3.11/site-packages/transformers/training_args.py:1525: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use `eval_strategy` instead
  warnings.warn(
/scratch/work/palp3/myenv/lib/python3.11/site-packages/transformers/models/auto/configuration_auto.py:957: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers. Please use `token` instead.
  warnings.warn(
/scratch/work/palp3/myenv/lib/python3.11/site-packages/transformers/models/auto/feature_extraction_auto.py:329: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers. Please use `token` instead.
  warnings.warn(
Wav2Vec2CTCTokenizer(name_or_path='', vocab_size=149, model_max_length=1000000000000000019884624838656, is_fast=False, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '[UNK]', 'pad_token': '[PAD]'}, clean_up_tokenization_spaces=True), added_tokens_decoder={
	147: AddedToken("[UNK]", rstrip=True, lstrip=True, single_word=False, normalized=False, special=False),
	148: AddedToken("[PAD]", rstrip=True, lstrip=True, single_word=False, normalized=False, special=False),
	149: AddedToken("<s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	150: AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}
CHECK MODEL PARAMS
Wav2Vec2ForCTC(
  (wav2vec2): Wav2Vec2Model(
    (feature_extractor): Wav2Vec2FeatureEncoder(
      (conv_layers): ModuleList(
        (0): Wav2Vec2LayerNormConvLayer(
          (conv): Conv1d(1, 512, kernel_size=(10,), stride=(5,))
          (layer_norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
          (activation): GELUActivation()
        )
        (1-4): 4 x Wav2Vec2LayerNormConvLayer(
          (conv): Conv1d(512, 512, kernel_size=(3,), stride=(2,))
          (layer_norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
          (activation): GELUActivation()
        )
        (5-6): 2 x Wav2Vec2LayerNormConvLayer(
          (conv): Conv1d(512, 512, kernel_size=(2,), stride=(2,))
          (layer_norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
          (activation): GELUActivation()
        )
      )
    )
    (feature_projection): Wav2Vec2FeatureProjection(
      (layer_norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
      (projection): Linear(in_features=512, out_features=1024, bias=True)
      (dropout): Dropout(p=0.0, inplace=False)
    )
    (encoder): Wav2Vec2EncoderStableLayerNorm(
      (pos_conv_embed): Wav2Vec2PositionalConvEmbedding(
        (conv): ParametrizedConv1d(
          1024, 1024, kernel_size=(128,), stride=(1,), padding=(64,), groups=16
          (parametrizations): ModuleDict(
            (weight): ParametrizationList(
              (0): _WeightNorm()
            )
          )
        )
        (padding): Wav2Vec2SamePadLayer()
        (activation): GELUActivation()
      )
      (layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      (dropout): Dropout(p=0.0, inplace=False)
      (layers): ModuleList(
        (0-23): 24 x Wav2Vec2EncoderLayerStableLayerNorm(
          (attention): Wav2Vec2SdpaAttention(
            (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
            (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
            (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
            (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
          )
          (dropout): Dropout(p=0.0, inplace=False)
          (layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
          (feed_forward): Wav2Vec2FeedForward(
            (intermediate_dropout): Dropout(p=0.0, inplace=False)
            (intermediate_dense): Linear(in_features=1024, out_features=4096, bias=True)
            (intermediate_act_fn): GELUActivation()
            (output_dense): Linear(in_features=4096, out_features=1024, bias=True)
            (output_dropout): Dropout(p=0.0, inplace=False)
          )
          (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        )
      )
    )
  )
  (dropout): Dropout(p=0.0, inplace=False)
  (lm_head): Linear(in_features=1024, out_features=151, bias=True)
)
preprocess datasets:   0%|          | 0/572 [00:00<?, ? examples/s]
Traceback (most recent call last):
  File "/scratch/elec/puhe/p/palp3/MUCS/eval_script_indicwav2vec.py", line ..., in <module>
    main()
  File "/scratch/elec/puhe/p/palp3/MUCS/eval_script_indicwav2vec.py", line 637, in main
    print("check the eval set length", len(vectorized_datasets["eval"]["audio_id"]))
                                           ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
  File "/scratch/work/palp3/myenv/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 2866, in __getitem__
    return self._getitem(key)
           ^^^^^^^^^^^^^^^^^^
  File "/scratch/work/palp3/myenv/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 2850, in _getitem
    pa_subtable = query_table(self._data, key, indices=self._indices)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/work/palp3/myenv/lib/python3.11/site-packages/datasets/formatting/formatting.py", line 584, in query_table
    _check_valid_column_key(key, table.column_names)
  File "/scratch/work/palp3/myenv/lib/python3.11/site-packages/datasets/formatting/formatting.py", line 521, in _check_valid_column_key
    raise KeyError(f"Column {key} not in the dataset. Current columns in the dataset: {columns}")
KeyError: "Column audio_id not in the dataset. Current columns in the dataset: ['input_values', 'input_length', 'labels']"
wandb: - 0.011 MB of 0.011 MB uploaded
wandb: \ 0.011 MB of 0.028 MB uploaded
wandb: 🚀 View run eval_pd2000_s300_shuff100_hindi at: https://wandb.ai/priyanshipal/huggingface/runs/2b363w6i
wandb: ⭐️ View project at: https://wandb.ai/priyanshipal/huggingface
wandb: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20240822_151437-2b363w6i/logs
wandb: WARNING The new W&B backend becomes opt-out in version 0.18.0; try it out with `wandb.require("core")`! See https://wandb.me/wandb-core for more information.
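
The crash itself is the KeyError at the end of the traceback: `vectorized_datasets["eval"]` only contains the mapped columns `input_values`, `input_length` and `labels`, so `audio_id` was evidently dropped by the `remove_columns` argument of `Dataset.map()` during the "preprocess datasets" step. Below is a minimal sketch of two workarounds; the tiny in-memory dataset and the `prepare_dataset` body are stand-ins, not the actual code from `eval_script_indicwav2vec.py`.

    # Hedged sketch only: the toy data and prepare_dataset are assumptions,
    # not the original preprocessing from eval_script_indicwav2vec.py.
    from datasets import Dataset

    raw_eval = Dataset.from_dict({
        "audio_id": ["utt_0001", "utt_0002"],
        "audio": [[0.0, 0.1, 0.2], [0.3, 0.4, 0.5]],  # placeholder waveforms
    })

    def prepare_dataset(example):
        # The real script would run the feature extractor and tokenizer here.
        example["input_values"] = example["audio"]
        example["input_length"] = len(example["audio"])
        example["labels"] = [0]
        return example

    # Removing *all* original columns reproduces the crash:
    vectorized = raw_eval.map(prepare_dataset, remove_columns=raw_eval.column_names)
    # vectorized["audio_id"]  # -> KeyError: the column was removed by map()

    # Workaround 1: keep audio_id when removing columns.
    vectorized_keep = raw_eval.map(
        prepare_dataset,
        remove_columns=[c for c in raw_eval.column_names if c != "audio_id"],
    )
    print("check the eval set length", len(vectorized_keep["audio_id"]))

    # Workaround 2: take the length from the dataset itself; no column needed.
    print("check the eval set length", len(vectorized))

Of the two, `len(dataset)` is the cheaper check, since indexing `dataset["audio_id"]` materialises the whole column just to count it.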
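
The two FutureWarnings near the top of the log are unrelated to the crash but will become hard errors in later releases: `evaluation_strategy` is renamed to `eval_strategy` in `TrainingArguments`, and `use_auth_token` is replaced by `token` in the `from_pretrained()` calls. A sketch of the renamed arguments follows; the checkpoint name, strategy and output directory are placeholders, not values taken from the log.

    # Placeholder values; only the argument names matter here.
    from transformers import TrainingArguments, AutoFeatureExtractor

    training_args = TrainingArguments(
        output_dir="out",
        eval_strategy="steps",  # replaces the deprecated evaluation_strategy
    )

    feature_extractor = AutoFeatureExtractor.from_pretrained(
        "ai4bharat/indicwav2vec-hindi",  # placeholder checkpoint
        token=True,                      # replaces the deprecated use_auth_token
    )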