# Orpheus Music Transformer Training Logs

***

## Training hyperparameters for both models (base and bridge)

- GPU: Single Nvidia GH200 96GB
- learning_rate: 1e-04
- train_batch_size: 9
- gradients_accumulation_value: 8
- gradients_clip_value: 1.0
- eval_batch_size: 9
- optimizer: PyTorch Adam with default parameters
- lr_scheduler_type: constant
- precision: bfloat16
- embeddings: RoPE
- attention: flash
- num_epochs: 3

A minimal training-loop sketch using these settings is shown after the metrics section below.

***

## [Base model train/eval metrics](https://huggingface.co/asigalov61/Orpheus-Music-Transformer/tree/main/logs/base)

### Cross-entropy losses/accuracies (averaged over the last 100 values)

- train_loss: 0.6559235458076
- train_acc: 0.8074536085128784
- eval_loss: 0.6947438363730908
- eval_acc: 0.7946965831518173
- epochs: 3
- num_steps: 96332
- training_time: 194 hours (~8 days)

***
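## Training loop sketch

For reference, here is a minimal PyTorch sketch of a training loop that follows the hyperparameters above: Adam with default parameters at a constant 1e-4 learning rate, batch size 9, gradient accumulation of 8, bfloat16 autocast, and gradient clipping at 1.0. The tiny stand-in model and random-token loader are hypothetical placeholders rather than the actual Orpheus training code, and the sketch omits model internals such as RoPE embeddings and flash attention.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins (not the Orpheus code): a tiny "causal LM" and
# random token batches, included only so the loop below actually runs.
VOCAB = 512
model = nn.Sequential(nn.Embedding(VOCAB, 64), nn.Linear(64, VOCAB)).cuda()

def train_loader():
    for _ in range(32):  # a few dummy batches
        yield torch.randint(0, VOCAB, (9, 128), device="cuda")  # train_batch_size: 9

# Settings from the hyperparameters list above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # default betas/eps
ACCUM_STEPS = 8    # gradients_accumulation_value
CLIP_VALUE = 1.0   # gradients_clip_value
NUM_EPOCHS = 3     # num_epochs
loss_fn = nn.CrossEntropyLoss()

model.train()
for epoch in range(NUM_EPOCHS):
    for step, tokens in enumerate(train_loader()):
        # bfloat16 mixed precision; bf16 needs no gradient scaler
        with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
            logits = model(tokens[:, :-1])  # next-token prediction
            loss = loss_fn(logits.reshape(-1, VOCAB), tokens[:, 1:].reshape(-1))
        # scale the loss so gradients average over the accumulation window
        (loss / ACCUM_STEPS).backward()
        if (step + 1) % ACCUM_STEPS == 0:
            torch.nn.utils.clip_grad_norm_(model.parameters(), CLIP_VALUE)
            optimizer.step()
            optimizer.zero_grad()
# lr_scheduler_type is "constant", so no scheduler step is required
```

Note that with a per-device batch of 9 and accumulation of 8, each optimizer step sees an effective batch of 72 sequences.

***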
### Project Los Angeles

### Tegridy Code 2025