Orpheus Music Transformer Training Logs
Training hyperparameters for both models (base and bridge); a minimal PyTorch sketch follows the list
- GPU: Single Nvidia GH200 96GB
- learning_rate: 1e-04
- train_batch_size: 9
- gradients_accumulation_value: 8
- gradients_clip_value: 1.0
- eval_batch_size: 9
- optimizer: PyTorch Adam with default parameters
- lr_scheduler_type: constant
- precision: bfloat16
- embeddings: RoPE
- attention: flash
- num_epochs: 4
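For concreteness, here is a minimal sketch of how these settings map onto a standard PyTorch training step. The model, data, and loss below are hypothetical placeholders; only the hyperparameter values come from the list above, and the actual Orpheus training code may differ.

```python
import torch

# Placeholders: the real Orpheus transformer, dataset, and loss differ.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(512, 512).to(device)  # stands in for the transformer
batches = [torch.randn(9, 512, device=device) for _ in range(16)]  # train_batch_size: 9

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # PyTorch Adam, default params
ACCUM = 8   # gradients_accumulation_value: effective batch = 9 * 8 = 72 sequences
CLIP = 1.0  # gradients_clip_value

model.train()
for step, batch in enumerate(batches):
    # bfloat16 mixed precision for the forward/backward pass
    with torch.autocast(device_type=device, dtype=torch.bfloat16):
        loss = model(batch).pow(2).mean()  # placeholder; the real objective is cross-entropy
    (loss / ACCUM).backward()  # scale so the accumulated gradients average correctly

    if (step + 1) % ACCUM == 0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), CLIP)
        optimizer.step()  # lr_scheduler_type "constant": no scheduler step needed
        optimizer.zero_grad(set_to_none=True)
```

Note that a train_batch_size of 9 with 8 accumulation steps yields an effective batch of 72 sequences per optimizer update, and the constant scheduler keeps the learning rate at 1e-04 for all 4 epochs.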
Base model train/eval metrics
Cross-entropy losses/accuracies (averaged over the last 100 logged values; see the averaging sketch after the metrics)
- train_loss: 0.6657808522880078
- train_acc: 0.8036309605836869
- val_loss: 0.685554896891117
- val_acc: 0.7972136563062668
- epochs: 4
- num_steps: 128497
- training_time: 256 hours / 10.66 days
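The "averaged last 100 values" reduction above is presumably just a tail mean over the logged per-step values; a trivial sketch of that reduction (the function name is illustrative, not from the training code):

```python
def tail_mean(values: list[float], n: int = 100) -> float:
    """Mean of the last n logged values."""
    tail = values[-n:]
    return sum(tail) / len(tail)
```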