Model

Aria is a pretrained autoregressive generative model for symbolic music based on the LLaMA 3.2 (1B) architecture. It was trained on ~60k hours of MIDI transcriptions of expressive solo-piano recordings. It has been finetuned to produce realistic continuations of solo-piano compositions as well as to produce general-purpose contrastive MIDI embeddings.

This HuggingFace page contains weights and usage instructions for the embedding model. For the pretrained base model, see aria-medium-base, and for the generative model, see aria-medium-gen.

📖 Read our release blog post and paper
🚀 Check out the real-time demo in the official GitHub repository
📊 Get access to our training dataset Aria-MIDI to train your own models

Usage Guidelines

Our embedding model was trained to capture composition-level and performance-level attributes by learning to embed different random slices of the same solo-piano transcription into similar regions of latent space. Because the model was trained to produce global embeddings under data augmentation (e.g., of pitch and tempo), it might not be appropriate for every use case. For more information, see our paper.
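
To make the idea of pulling slices of the same performance together more concrete, here is a minimal sketch of a generic InfoNCE-style contrastive loss. It is an illustration only: the exact objective, temperature, and batching used to train Aria are described in the paper and may differ, and the function name, the temperature value, and the toy similarity values below are all assumptions.

```python
import math

def info_nce(similarities, temperature=0.1):
    """Mean InfoNCE loss over a square similarity matrix.

    similarities[i][j] is the similarity between slice i from one view and
    slice j from the other view; entry [i][i] is the matching (positive) pair.
    """
    losses = []
    for i, row in enumerate(similarities):
        logits = [s / temperature for s in row]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)

# Toy cosine similarities (made-up values) between slices of two performances.
aligned = [[0.99, 0.11], [0.11, 0.99]]   # matching slices are close together
shuffled = [[0.11, 0.99], [0.99, 0.11]]  # matching slices are far apart
aligned_loss = info_nce(aligned)
shuffled_loss = info_nce(shuffled)
```

In this toy setup the loss is small when slices of the same performance are the most similar pair and large otherwise, which is the behaviour a contrastive objective rewards.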

Quickstart

All of our models were trained using the MIDI tooling and tokenizer available in the aria-utils repository. Install the aria-utils package with pip:

pip install git+https://github.com/EleutherAI/aria-utils.git

You can then generate an embedding for a (piano) MIDI file using the transformers library. First install transformers and torch:

pip install transformers
pip install torch

Then load the model and tokenizer and embed the file:

from transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

PROMPT_MIDI_LOAD_PATH = "mydir/prompt.midi"
MAX_SEQ_LEN = 2048

model = AutoModelForCausalLM.from_pretrained(
    "loubb/aria-medium-embedding",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "loubb/aria-medium-embedding",
    trust_remote_code=True,
)

prompt = tokenizer.encode_from_file(PROMPT_MIDI_LOAD_PATH, return_tensors="pt")

# Only sequences of up to 2048 tokens are supported, and the embedding is
# extracted from the end-of-sequence (EOS) token.
eos_id = tokenizer._convert_token_to_id(tokenizer.eos_token)

# If the sequence is too long, truncate it and overwrite the final token with EOS.
if prompt.input_ids.shape[1] > MAX_SEQ_LEN:
    prompt.input_ids = prompt.input_ids[:, :MAX_SEQ_LEN]
    prompt.input_ids[:, -1] = eos_id

assert prompt.input_ids.shape[1] <= MAX_SEQ_LEN
assert prompt.input_ids[0, -1] == eos_id

# Generate and extract embedding
outputs = model(input_ids=prompt.input_ids)
embedding = outputs[0].squeeze(0)
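
Once you have embeddings for several files, one common way to compare them is cosine similarity. The sketch below is a minimal, dependency-free illustration and not part of the Aria tooling: the function names and the 3-dimensional vectors are made up, and in practice you would pass real embeddings (for example, converted with embedding.tolist()).

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_by_similarity(query, candidates):
    """Return (name, score) pairs sorted from most to least similar to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Made-up 3-D vectors standing in for real embeddings.
query = [0.2, 0.9, 0.1]
library = {
    "piece_a": [0.1, 1.0, 0.0],
    "piece_b": [1.0, 0.0, 0.2],
}
ranking = rank_by_similarity(query, library)
```

Here ranking lists the library entries in descending order of similarity to the query, so the first entry is the closest match.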

License and Attribution

The Aria project has been kindly supported by EleutherAI, Stability AI, as well as by a compute grant from the Ministry of Science and ICT of Korea. Our models and MIDI tooling are released under the Apache-2.0 license. If you use the models or tooling for follow-up work, please cite the paper in which they were introduced:

@inproceedings{bradshawscaling,
  title={Scaling Self-Supervised Representation Learning for Symbolic Piano Performance},
  author={Bradshaw, Louis and Fan, Honglu and Spangher, Alex and Biderman, Stella and Colton, Simon},
  booktitle={arXiv preprint},
  year={2025},
  url={https://arxiv.org/abs/2504.15071}
}