---
license: apache-2.0
datasets:
- loubb/aria-midi
language:
- en
tags:
- music
- MIDI
- piano
---
# Model
`Aria` is a pretrained autoregressive generative model for symbolic music based on the LLaMA 3.2 (1B) architecture. It was trained on ~60k hours of MIDI transcriptions of expressive solo-piano recordings. It has been finetuned to produce realistic continuations of solo-piano compositions as well as to produce general-purpose contrastive MIDI embeddings.
This HuggingFace page contains weights and usage instructions for the pretrained base model. For the generative model, see [aria-medium-gen](https://huggingface.co/loubb/aria-medium-base/resolve/main/model-gen.safetensors?download=true), and for the embedding model, see [aria-medium-embedding](https://huggingface.co/loubb/aria-medium-embedding).
- Read our [paper](https://arxiv.org/abs/2506.23869)
- Check out the real-time demo in the official [GitHub repository](https://github.com/EleutherAI/aria)
- Get access to our training dataset [Aria-MIDI](https://huggingface.co/datasets/loubb/aria-midi) to train your own models
## Usage Guidelines
### Intended Use
The model is most naturally suited for generating continuations of existing MIDI files rather than generating music from scratch (unconditioned). While the tokenizer and model checkpoint technically support multi-track tokenization and generation, multi-track music comprises a minority of the training data, so we highly recommend performing inference with single-track piano MIDI files for optimal results.
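If you want to sanity-check a prompt before running inference, the sketch below counts how many tracks in a MIDI file actually contain notes. It is not part of the Aria tooling: it relies on the third-party `mido` package, and the prompt path is a placeholder.

```python
# Hedged sketch (assumes the third-party `mido` package; path is a placeholder).
import mido

def count_note_tracks(path: str) -> int:
    """Return the number of tracks that contain at least one sounding note."""
    midi = mido.MidiFile(path)
    return sum(
        any(msg.type == "note_on" and msg.velocity > 0 for msg in track)
        for track in midi.tracks
    )

if count_note_tracks("mydir/prompt.midi") > 1:
    print("Warning: multi-track prompt; single-track piano MIDI is recommended.")
```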
### Data Memorization Considerations
Due to overrepresentation of performances of popular compositions (e.g., those from well-known classical composers such as Chopin) and difficulties in completely deduplicating the training data, some of these compositions have been compositionally memorized by the model. We suggest performing inference with lesser-known compositions or your own music for more original results.
### Input Quality Considerations
Since the model has not been post-trained with any instruction tuning or RLHF (similar to pre-instruct GPT models), it is very sensitive to input quality and performs best when prompted with well-played music. To get sample MIDI files, see the `example-prompts/` directory or explore the [Aria-MIDI](https://huggingface.co/datasets/loubb/aria-midi) dataset.
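If you don't have a suitable MIDI file on hand, one option is to fetch a prompt from this repository's `example-prompts/` directory with `huggingface_hub`. The snippet below is a hedged sketch: it lists the available files at runtime rather than assuming any particular filename.

```python
# Hedged sketch: list and download example prompts from the model repository.
from huggingface_hub import hf_hub_download, list_repo_files

# Find files under example-prompts/ in this repo.
files = [
    f for f in list_repo_files("loubb/aria-medium-base")
    if f.startswith("example-prompts/")
]
print(files)  # inspect the available prompts

# Download one of them to the local HF cache and get its path.
prompt_path = hf_hub_download(
    repo_id="loubb/aria-medium-base",
    filename=files[0],
)
```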
## Quickstart
All of our models were trained using the MIDI tooling and tokenizer available in the [aria-utils](https://github.com/EleutherAI/aria-utils) repository. Install the aria-utils package with pip:
```bash
pip install git+https://github.com/EleutherAI/aria-utils.git
```
You can then generate a continuation of a truncated (piano) MIDI file using the transformers library:
```bash
pip install transformers
pip install torch
```
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

PROMPT_MIDI_LOAD_PATH = "mydir/prompt.midi"
CONTINUATION_MIDI_SAVE_PATH = "mydir/continuation.midi"

# Both the model and tokenizer ship custom code in this repository,
# so trust_remote_code=True is required.
model = AutoModelForCausalLM.from_pretrained(
    "loubb/aria-medium-base",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "loubb/aria-medium-base",
    trust_remote_code=True,
)

# Tokenize the prompt MIDI file directly from disk.
prompt = tokenizer.encode_from_file(
    PROMPT_MIDI_LOAD_PATH, return_tensors="pt"
)

# Condition on the first 512 prompt tokens and sample a continuation.
continuation = model.generate(
    prompt.input_ids[..., :512],
    max_length=2048,
    do_sample=True,
    temperature=0.97,
    top_p=0.95,
    use_cache=True,
)

# Detokenize back to a MIDI representation and save it.
midi_dict = tokenizer.decode(continuation[0].tolist())
midi_dict.to_midi().save(CONTINUATION_MIDI_SAVE_PATH)
```
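As an optional follow-up, the hedged sketch below moves the model to a GPU (when available) and draws several continuations from the same prompt using the standard `num_return_sequences` generation argument. It assumes the remote-code model and tokenizer behave exactly as in the example above; the output paths are placeholders.

```python
# Hedged sketch: GPU inference and multiple sampled continuations.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

continuations = model.generate(
    prompt.input_ids[..., :512].to(device),
    max_length=2048,
    do_sample=True,
    temperature=0.97,
    top_p=0.95,
    num_return_sequences=4,  # sample four independent continuations
    use_cache=True,
)

# Decode and save each sampled continuation separately.
for i, seq in enumerate(continuations):
    tokenizer.decode(seq.tolist()).to_midi().save(f"mydir/continuation_{i}.midi")
```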
## License and Attribution
The Aria project has been kindly supported by EleutherAI and Stability AI, as well as by a compute grant from the Ministry of Science and ICT of Korea. Our models and MIDI tooling are released under the Apache-2.0 license. If you use the models or tooling in follow-up work, please cite the paper in which they were introduced:
```bibtex
@inproceedings{bradshawscaling,
title={Scaling Self-Supervised Representation Learning for Symbolic Piano Performance},
author={Bradshaw, Louis and Fan, Honglu and Spangher, Alexander and Biderman, Stella and Colton, Simon},
booktitle={arXiv preprint},
year={2025},
url={https://arxiv.org/abs/2506.23869}
}
```