
Orpheus Music Transformer Training Logs


Training hyperparameters for both models (base and bridge)

  • GPU: Single NVIDIA GH200 96GB
  • learning_rate: 1e-04
  • train_batch_size: 9
  • gradients_accumulation_value: 8
  • gradients_clip_value: 1.0
  • eval_batch_size: 9
  • optimizer: PyTorch Adam with default parameters
  • lr_scheduler_type: constant
  • precision: bfloat16
  • embeddings: RoPE
  • attention: flash
  • num_epochs: 4
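
With gradient accumulation, the number of sequences contributing to each optimizer update is the per-step batch size multiplied by the accumulation value. The sketch below only restates the values listed above and derives that product; the `TrainConfig` class and its `effective_batch_size` property are hypothetical names introduced for illustration and are not taken from the Orpheus training code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container mirroring the hyperparameters listed above."""
    learning_rate: float = 1e-4
    train_batch_size: int = 9
    gradients_accumulation_value: int = 8
    gradients_clip_value: float = 1.0
    eval_batch_size: int = 9
    num_epochs: int = 4

    @property
    def effective_batch_size(self) -> int:
        # Sequences seen per optimizer update when gradients are
        # accumulated over several forward/backward passes.
        return self.train_batch_size * self.gradients_accumulation_value


config = TrainConfig()
print(config.effective_batch_size)  # 9 * 8 = 72
```

Under these settings each optimizer update would see 72 sequences. Whether the reported step count refers to individual batches or to optimizer updates is not stated in this log.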

Base model train/eval metrics

Cross-entropy losses and accuracies (averaged over the last 100 values)

  • train_loss: 0.6657808522880078

  • train_acc: 0.8036309605836869

  • val_loss: 0.685554896891117

  • val_acc: 0.7972136563062668

  • epochs: 4

  • num_steps: 128497

  • training_time: 256 hours / 10.67 days
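
A few derived quantities can make these numbers easier to compare with other runs. The sketch below converts the losses to perplexity and computes average wall-clock time per step. It assumes the cross-entropy values are in nats (natural logarithm, as with PyTorch's cross-entropy loss), which this log does not state explicitly; the variable names are illustrative.

```python
import math

# Values copied from the metrics above.
train_loss = 0.6657808522880078
val_loss = 0.685554896891117
num_steps = 128497
training_hours = 256

# Perplexity is exp(loss) when the loss is measured in nats (assumption).
train_perplexity = math.exp(train_loss)
val_perplexity = math.exp(val_loss)

# Wall-clock conversions.
training_days = training_hours / 24
seconds_per_step = training_hours * 3600 / num_steps

print(f"train perplexity: {train_perplexity:.3f}")
print(f"val perplexity:   {val_perplexity:.3f}")
print(f"training days:    {training_days:.2f}")
print(f"seconds per step: {seconds_per_step:.2f}")
```

Both perplexities come out just under 2, and the run averages roughly 7 seconds per step.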


Project Los Angeles

Tegridy Code 2025