# Aria

Aria is a pretrained autoregressive generative model for symbolic music, based on the LLaMA 3.2 (1B) architecture. It was trained on ~60k hours of MIDI transcriptions of expressive solo-piano recordings, and has been finetuned to produce realistic continuations of solo-piano compositions as well as general-purpose contrastive MIDI embeddings.
This HuggingFace page contains weights and usage instructions for the embedding model. For the pretrained base model, see `aria-medium-base`, and for the generative model, see `aria-medium-gen`.
- Read our release blog post and paper
- Check out the real-time demo in the official GitHub repository
- Get access to our training dataset, Aria-MIDI, to train your own models
## Usage Guidelines
Our embedding model was trained to capture composition- and performance-level attributes by learning to embed different random slices of a transcribed solo-piano performance into similar regions of latent space. Because the model was trained to produce global embeddings under data augmentation (e.g., pitch and tempo shifts), it may not be appropriate for every use case. For more information, see our paper.
## Quickstart
All of our models were trained using the MIDI tooling and tokenizer available in the aria-utils repository. Install the aria-utils package with pip:
```bash
pip install git+https://github.com/EleutherAI/aria-utils.git
```
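If you want to inspect the tokenization itself, aria-utils provides a `MidiDict` representation of MIDI files together with the tokenizer classes used across the Aria project. The following is a minimal sketch based on the examples in the aria-utils README; the exact API may evolve, so consult the repository if these names have changed:

```python
from ariautils.midi import MidiDict
from ariautils.tokenizer import AbsTokenizer

# Load a MIDI file into aria-utils' intermediate dictionary representation
# (the path is a placeholder; substitute your own file)
midi_dict = MidiDict.from_midi("mydir/prompt.midi")

# Tokenize with the absolute-onset tokenizer used by the Aria models
tokenizer = AbsTokenizer()
tokens = tokenizer.tokenize(midi_dict)
print(tokens[:10])
```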
You can then generate an embedding for a (piano) MIDI file using the transformers library:
```bash
pip install transformers torch
```
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

PROMPT_MIDI_LOAD_PATH = "mydir/prompt.midi"
MAX_SEQ_LEN = 2048

model = AutoModelForCausalLM.from_pretrained(
    "loubb/aria-medium-embedding",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "loubb/aria-medium-embedding",
    trust_remote_code=True,
)

# Tokenize the MIDI file
prompt = tokenizer.encode_from_file(PROMPT_MIDI_LOAD_PATH, return_tensors="pt")

# Only sequences of up to MAX_SEQ_LEN (2048) tokens are supported, and the
# embedding is extracted from the end-of-sequence token, which must therefore
# be the final token in the sequence
assert prompt.input_ids.shape[1] <= MAX_SEQ_LEN
assert prompt.input_ids[0, -1] == tokenizer._convert_token_to_id(tokenizer.eos_token)

# Alternatively, if the sequence is too long, truncate it and re-append the
# end-of-sequence token:
# prompt.input_ids = prompt.input_ids[:, :MAX_SEQ_LEN]
# prompt.input_ids[:, -1] = tokenizer._convert_token_to_id(tokenizer.eos_token)

# Forward pass and extract the embedding from the final (end-of-sequence) position
with torch.no_grad():
    outputs = model(input_ids=prompt.input_ids)
embedding = outputs[0].squeeze(0)
```
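Once you can produce embeddings, a natural use is similarity comparison between pieces. The following is a minimal sketch, not part of the released API: the `embed_file` helper and the file paths are illustrative, and the code reuses `model`, `tokenizer`, and `MAX_SEQ_LEN` from the snippet above:

```python
import torch
import torch.nn.functional as F

def embed_file(path: str) -> torch.Tensor:
    # Illustrative helper: tokenize, truncate to MAX_SEQ_LEN, ensure the final
    # token is end-of-sequence, and return the global embedding
    ids = tokenizer.encode_from_file(path, return_tensors="pt").input_ids
    ids = ids[:, :MAX_SEQ_LEN]
    ids[:, -1] = tokenizer._convert_token_to_id(tokenizer.eos_token)
    with torch.no_grad():
        return model(input_ids=ids)[0].squeeze(0)

# Placeholder paths; substitute your own (piano) MIDI files
emb_a = embed_file("mydir/performance_a.midi")
emb_b = embed_file("mydir/performance_b.midi")

# Higher cosine similarity suggests closer composition/performance attributes
similarity = F.cosine_similarity(emb_a, emb_b, dim=0).item()
print(f"Cosine similarity: {similarity:.3f}")
```

Note that because the model was trained with pitch and tempo augmentation (see Usage Guidelines above), transposed or tempo-shifted versions of the same piece should receive similar embeddings.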
## License and Attribution
The Aria project has been kindly supported by EleutherAI and Stability AI, as well as by a compute grant from the Ministry of Science and ICT of Korea. Our models and MIDI tooling are released under the Apache-2.0 license. If you use the models or tooling in follow-up work, please cite the paper in which they were introduced:
```bibtex
@inproceedings{bradshawscaling,
  title={Scaling Self-Supervised Representation Learning for Symbolic Piano Performance},
  author={Bradshaw, Louis and Fan, Honglu and Spangher, Alex and Biderman, Stella and Colton, Simon},
  booktitle={arXiv preprint},
  year={2025},
  url={https://arxiv.org/abs/2504.15071}
}
```