Mistral_MidiTok_Transformer_Single_Instrument_Small

This model was trained from scratch on tokenized MIDI music. I trained a MidiTok tokeniser (REMI), and the training data was prepared by splitting multi-track MIDI files into single-instrument tracks.
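Here is a simplified, pure-Python sketch of the two preprocessing steps. It first splits a multi-track note list into per-track lists. It then encodes one track as REMI-style events (Bar, Position, Pitch, Velocity, Duration). This is not MidiTok's implementation. The `Note` class, the 16-step bar grid and the token spellings are illustrative assumptions, and the sketch leaves out the vocabulary training a MidiTok tokeniser performs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    # Hypothetical note record; times are in grid steps, not MIDI ticks.
    track: int
    start: int
    duration: int
    pitch: int
    velocity: int


def split_by_track(notes):
    """Group a multi-track note list into one note list per track."""
    tracks = {}
    for note in notes:
        tracks.setdefault(note.track, []).append(note)
    return tracks


def remi_tokens(notes, steps_per_bar=16):
    """Encode a single track as simplified REMI-style tokens."""
    tokens = []
    current_bar = -1
    last_position = None
    for note in sorted(notes, key=lambda n: (n.start, n.pitch)):
        bar, position = divmod(note.start, steps_per_bar)
        # One Bar token per bar boundary crossed, so empty bars still appear.
        while current_bar < bar:
            tokens.append("Bar")
            current_bar += 1
            last_position = None
        # A Position token is only needed when the onset moves within the bar.
        if position != last_position:
            tokens.append(f"Position_{position}")
            last_position = position
        tokens += [f"Pitch_{note.pitch}", f"Velocity_{note.velocity}",
                   f"Duration_{note.duration}"]
    return tokens


song = [
    Note(track=0, start=0, duration=4, pitch=60, velocity=100),
    Note(track=0, start=0, duration=4, pitch=64, velocity=100),
    Note(track=1, start=0, duration=8, pitch=36, velocity=90),
    Note(track=0, start=20, duration=2, pitch=67, velocity=80),
]
per_track = {t: remi_tokens(ns) for t, ns in split_by_track(song).items()}
print(per_track[1])  # ['Bar', 'Position_0', 'Pitch_36', 'Velocity_90', 'Duration_8']
```

In this reading, each per-track sequence becomes its own training example. That is what lets a single-instrument model learn from multi-track files.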

We then trained it on a small dataset. The model uses the Mistral architecture, scaled down considerably.
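The card does not list the reduced model's dimensions. To show roughly how those choices set model size, the sketch below counts the weights of a Mistral-style decoder. It assumes grouped-query attention, a gated MLP with three projections, RMSNorm, no bias terms and an untied output head. All dimension values are hypothetical placeholders, not this model's actual configuration.

```python
from dataclasses import dataclass


@dataclass
class SmallMistralConfig:
    # Hypothetical sizes; field names follow transformers.MistralConfig.
    vocab_size: int = 5000
    hidden_size: int = 256
    intermediate_size: int = 1024
    num_hidden_layers: int = 4
    num_attention_heads: int = 8
    num_key_value_heads: int = 2
    tie_word_embeddings: bool = False


def count_parameters(cfg: SmallMistralConfig) -> int:
    head_dim = cfg.hidden_size // cfg.num_attention_heads
    kv_dim = cfg.num_key_value_heads * head_dim
    # q_proj and o_proj are hidden x hidden; k_proj and v_proj are hidden x kv_dim.
    attention = 2 * cfg.hidden_size * cfg.hidden_size + 2 * cfg.hidden_size * kv_dim
    # gate_proj, up_proj and down_proj each hold hidden x intermediate weights.
    mlp = 3 * cfg.hidden_size * cfg.intermediate_size
    # Two RMSNorm weight vectors per layer (before attention and before the MLP).
    norms = 2 * cfg.hidden_size
    embeddings = cfg.vocab_size * cfg.hidden_size
    lm_head = 0 if cfg.tie_word_embeddings else embeddings
    final_norm = cfg.hidden_size
    per_layer = attention + mlp + norms
    return embeddings + cfg.num_hidden_layers * per_layer + final_norm + lm_head


print(count_parameters(SmallMistralConfig()))  # 6363392
```

You can pass the same field names to `transformers.MistralConfig` to build a real model. At these small sizes, the embedding and output matrices account for a large share of the total, so vocabulary size matters about as much as depth.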

What else needs to be done

Update model training to use a smaller positional embedding size for the model: 1024 plus a small padding amount, such as 8.
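One reading of this to-do is assumed here; the card does not state it. Train on 1024-token windows and set the position budget (`max_position_embeddings` in `transformers.MistralConfig`) to 1024 + 8 = 1032. That leaves room for special tokens added around each window. The BOS/EOS ids and the chunking helper below are hypothetical.

```python
CONTEXT_LENGTH = 1024    # music tokens per training window
POSITION_PADDING = 8     # headroom for special tokens
MAX_POSITION_EMBEDDINGS = CONTEXT_LENGTH + POSITION_PADDING  # 1032

BOS_ID, EOS_ID = 1, 2    # hypothetical special-token ids


def make_training_chunks(token_ids, context_length=CONTEXT_LENGTH):
    """Cut one long token sequence into BOS/EOS-wrapped training windows."""
    chunks = []
    for start in range(0, len(token_ids), context_length):
        chunk = [BOS_ID] + token_ids[start:start + context_length] + [EOS_ID]
        # Each wrapped window must still fit inside the position budget.
        if len(chunk) > MAX_POSITION_EMBEDDINGS:
            raise ValueError(
                f"chunk of {len(chunk)} tokens exceeds {MAX_POSITION_EMBEDDINGS}"
            )
        chunks.append(chunk)
    return chunks


chunks = make_training_chunks(list(range(10, 2510)))
print([len(c) for c in chunks])  # [1026, 1026, 454]
```

Mistral uses rotary position embeddings, so lowering this value mainly caps the sequence length the model is configured for. It does not shrink a learned position table.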

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 30
eval_batch_size: 30
seed: 444
gradient_accumulation_steps: 3
total_train_batch_size: 90
optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08 and no additional optimizer arguments
lr_scheduler_type: cosine_with_restarts
lr_scheduler_warmup_ratio: 0.3
training_steps: 20000
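The hyperparameters above map onto `transformers.TrainingArguments` fields. The sketch records that mapping as a plain dict. The field names are the library's, but how the run was actually configured is an assumption. It also checks the derived numbers: an effective batch of 30 × 3 = 90 on one device, and 0.3 × 20000 = 6000 warmup steps. `lr_at` re-implements, for illustration, the formula behind `transformers.get_cosine_with_hard_restarts_schedule_with_warmup`; it is not imported from the library.

```python
import math

# Hyperparameters from this card, spelled as TrainingArguments field names.
training_kwargs = dict(
    learning_rate=1e-4,
    per_device_train_batch_size=30,
    per_device_eval_batch_size=30,
    seed=444,
    gradient_accumulation_steps=3,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine_with_restarts",
    warmup_ratio=0.3,
    max_steps=20000,
)


def total_train_batch_size(kw, num_devices=1):
    return kw["per_device_train_batch_size"] * kw["gradient_accumulation_steps"] * num_devices


def warmup_steps(kw):
    # Warmup length derived from warmup_ratio, rounded up.
    return math.ceil(kw["max_steps"] * kw["warmup_ratio"])


def lr_at(step, kw, num_cycles=1):
    """Linear warmup, then cosine decay with hard restarts every 1/num_cycles."""
    warm, total = warmup_steps(kw), kw["max_steps"]
    if step < warm:
        factor = step / max(1, warm)
    else:
        progress = (step - warm) / max(1, total - warm)
        if progress >= 1.0:
            factor = 0.0
        else:
            factor = max(0.0, 0.5 * (1.0 + math.cos(math.pi * ((num_cycles * progress) % 1.0))))
    return kw["learning_rate"] * factor


print(total_train_batch_size(training_kwargs), warmup_steps(training_kwargs))  # 90 6000
```

With the library default `num_cycles=1`, the schedule decays once from 1e-4 to 0 after warmup and never restarts. A restart would need `num_cycles` in `lr_scheduler_kwargs`, which the card does not mention.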

Framework versions

Transformers 4.46.2
Pytorch 2.1.0+cu121
Datasets 3.1.0
Tokenizers 0.20.3
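To reproduce this environment, you can pin the versions above or check them at runtime. The sketch uses only the standard library's `importlib.metadata`. It assumes the pip distribution names `transformers`, `torch`, `datasets` and `tokenizers`. Any difference, including a different CUDA build suffix on `torch`, counts as a mismatch.

```python
from importlib import metadata

# Versions listed on this card, keyed by pip distribution name.
EXPECTED_VERSIONS = {
    "transformers": "4.46.2",
    "torch": "2.1.0+cu121",
    "datasets": "3.1.0",
    "tokenizers": "0.20.3",
}


def version_mismatches(expected=EXPECTED_VERSIONS, lookup=metadata.version):
    """Return {package: installed version or None} for every package that differs."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = lookup(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[package] = installed
    return mismatches


if __name__ == "__main__":
    print(version_mismatches() or "environment matches the card")
```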

adricl/midi_single_instrument_mistral_transformer
