# Orpheus Music Transformer Training Logs
***
## Training hyperparameters for both models (base and bridge)
- GPU: Single Nvidia GH200 96GB
- learning_rate: 1e-04
- train_batch_size: 9
- gradients_accumulation_value: 8
- gradients_clip_value: 1.0
- eval_batch_size: 9
- optimizer: PyTorch Adam with default parameters
- lr_scheduler_type: constant
- precision: bfloat16
- embeddings: RoPE
- attention: flash
- num_epochs: 4
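
With a micro-batch of 9 and 8 gradient-accumulation steps, each optimizer update sees an effective batch of 9 × 8 = 72 sequences. The sketch below shows how these hyperparameters fit together in a plain PyTorch loop; it is a minimal illustration, not the actual training script. `TinyCausalLM` and `train_loader` are hypothetical stand-ins, and the real model's RoPE embeddings and flash attention (which live inside the transformer itself) are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stand-in for the Orpheus decoder; the real model (with RoPE
# embeddings and flash attention inside) is not part of the published logs.
class TinyCausalLM(nn.Module):
    def __init__(self, vocab_size=512, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, input_ids, labels):
        logits = self.head(self.emb(input_ids))
        return F.cross_entropy(logits.view(-1, logits.size(-1)), labels.view(-1))

device = "cuda" if torch.cuda.is_available() else "cpu"
model = TinyCausalLM().to(device)

# Hypothetical dataloader: batches of 9 random token sequences, mimicking
# train_batch_size without the real MIDI-token dataset.
train_loader = [
    (torch.randint(0, 512, (9, 128)), torch.randint(0, 512, (9, 128)))
    for _ in range(32)
]

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # default betas/eps
GRAD_ACCUM = 8   # gradients_accumulation_value
GRAD_CLIP = 1.0  # gradients_clip_value
NUM_EPOCHS = 4

for epoch in range(NUM_EPOCHS):
    optimizer.zero_grad(set_to_none=True)
    for step, (input_ids, labels) in enumerate(train_loader):
        # bfloat16 autocast; unlike fp16, bf16 needs no GradScaler
        with torch.autocast(device_type=device, dtype=torch.bfloat16):
            loss = model(input_ids.to(device), labels.to(device))
        (loss / GRAD_ACCUM).backward()  # accumulate over 8 micro-batches
        if (step + 1) % GRAD_ACCUM == 0:
            torch.nn.utils.clip_grad_norm_(model.parameters(), GRAD_CLIP)
            optimizer.step()  # constant LR schedule: no scheduler.step()
            optimizer.zero_grad(set_to_none=True)
```

One design note: bfloat16 has the same exponent range as fp32, so loss scaling is unnecessary, and a constant LR schedule means the loop needs no scheduler at all.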
***
## [Base model train/eval metrics](https://huggingface.co/asigalov61/Orpheus-Music-Transformer/tree/main/logs/base)
### Cross-entropy losses/accuracies (mean of the last 100 logged values)
- train_loss: 0.6658
- train_acc: 0.8036
- val_loss: 0.6856
- val_acc: 0.7972
- epochs: 4
- num_steps: 128497
- training_time: 256 hours (~10.7 days)
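
The figures above are plain means over the final 100 logged values of each metric. A minimal sketch of that reduction, assuming `train_losses` holds the raw per-step values parsed from the linked log files:

```python
# Hypothetical raw log series; in practice these would be parsed from the
# log files in the repository linked above.
train_losses = [0.71, 0.69, 0.67, 0.66]  # ... per-step values

def mean_last_100(values):
    """Average of the final (up to) 100 logged values."""
    tail = values[-100:]
    return sum(tail) / len(tail)

print(f"train_loss (avg of last 100): {mean_last_100(train_losses):.4f}")
```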
***
### Project Los Angeles
### Tegridy Code 2025