license: apache-2.0
datasets:
- projectlosangeles/Godzilla-MIDI-Dataset
language:
- en
tags:
- orpheus
- MIDI
- music-ai
- music-transformer
- SOTA
- multi-instrumental
- music
Orpheus Music Transformer
SOTA 8k multi-instrumental music transformer trained on 2.31M+ high-quality MIDIs
Abstract
Project Los Angeles is proud to present Orpheus Music Transformer, an efficient, SOTA transformer model for long-form, multi-instrumental music generation. At its core is a 479M-parameter autoregressive transformer with Rotary Positional Embeddings (RoPE) and Flash Attention, supporting sequence lengths of up to 8k tokens, enough to capture extended musical structures. Trained for three epochs on 2.31 million high-quality MIDI tracks from the Godzilla dataset, the model uses a compact encoding of 3 tokens per note and 7 tokens per tri-chord, plus a novel duration-and-velocity-last token ordering to enhance expressivity. Inference on CUDA is accelerated with PyTorch's bfloat16 precision and memory-efficient scaled dot-product attention, and sampling uses a top-p filter with adjustable temperature.
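To make the sampling step concrete, here is a minimal, dependency-free sketch of top-p (nucleus) filtering with temperature. It is an illustration of the general technique, not the model's actual sampling code: the function names `nucleus_filter` and `sample_token`, the example logits, and the parameter values are all assumptions chosen for the example.

```python
import math
import random


def nucleus_filter(logits, temperature=1.0, top_p=0.9):
    """Return (token_ids, probs) for the smallest set of tokens whose
    cumulative probability reaches top_p, after temperature scaling.

    Illustrative sketch only; names and defaults are hypothetical.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in (0, 1]")

    # Temperature scaling: lower values sharpen the distribution.
    scaled = [x / temperature for x in logits]

    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Walk tokens from most to least likely until top_p mass is covered.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for token_id in order:
        kept.append(token_id)
        cumulative += probs[token_id]
        if cumulative >= top_p:
            break

    # Renormalize the surviving probabilities so they sum to 1.
    kept_mass = sum(probs[i] for i in kept)
    return kept, [probs[i] / kept_mass for i in kept]


def sample_token(logits, temperature=1.0, top_p=0.9, rng=random):
    """Draw one token id from the filtered distribution."""
    token_ids, probs = nucleus_filter(logits, temperature, top_p)
    return rng.choices(token_ids, weights=probs, k=1)[0]


if __name__ == "__main__":
    example_logits = [2.0, 1.0, 0.0, -1.0]  # made-up values
    print(nucleus_filter(example_logits, temperature=1.0, top_p=0.9))
    print(sample_token(example_logits, temperature=0.9, top_p=0.9))
```

Raising the temperature flattens the distribution before filtering, so more tokens survive the top-p cut; lowering either value makes generation more conservative.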
The Gradio interface empowers users to upload seed MIDI files or generate from scratch, tune prime/generation token counts, control randomness (temperature, top-p), and optionally append drums or natural “outro” tokens. Generated outputs appear in ten parallel batches with synchronized audio previews and piano-roll plots. Users can iteratively add or remove entire batches to sculpt a final composition, which is rendered back into MIDI and audio via an integrated SoundFont pipeline. Our release demonstrates a seamless blend of state-of-the-art model performance, efficient MIDI tokenization, and user-centric design, fostering rapid exploration of algorithmic composition.
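The add/remove workflow described above can be pictured as a stack of token blocks. The sketch below is a hypothetical data structure, not the app's actual code: the `Composition` class, its method names, and the token values are assumptions, as is the choice to remove only the most recently added batch.

```python
class Composition:
    """Hypothetical container for a piece built up one batch at a time."""

    def __init__(self, seed_tokens=None):
        # Each block is one accepted batch of tokens; the seed (if any) is first.
        self.blocks = [list(seed_tokens)] if seed_tokens else []

    def add_batch(self, tokens):
        """Append one generated batch to the end of the composition."""
        self.blocks.append(list(tokens))

    def remove_last_batch(self):
        """Drop the most recently added batch; return it, or None if empty."""
        return self.blocks.pop() if self.blocks else None

    def tokens(self):
        """Flatten all blocks into the token sequence that would be rendered."""
        return [token for block in self.blocks for token in block]


if __name__ == "__main__":
    piece = Composition(seed_tokens=[10, 11, 12])  # made-up token ids
    candidates = [[20, 21], [30, 31]]              # stand-ins for generated batches
    piece.add_batch(candidates[1])                 # keep the preferred candidate
    print(piece.tokens())
    piece.remove_last_batch()                      # change of mind: undo it
    print(piece.tokens())
```

Keeping batches as separate blocks rather than one flat list is what makes removing an entire batch a single cheap operation.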
