Model Card for MORTM (Metric-Oriented Rhythmic Transformer for Melodic generation)
MORTM is a Transformer-based model designed for melody generation, with a strong emphasis on metric (rhythmic) structure. It represents music as sequences of pitch, duration, and relative beat positions within a measure (normalized to 96 ticks), making it suitable for time-robust, rhythm-aware music generation tasks.
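The exact vocabulary is defined by MORTM's own tokenizer, but the relative-tick scheme can be illustrated with a small, purely hypothetical sketch (the NoteToken dataclass and the 4/4 example values below are illustrative and not part of the released code):

from dataclasses import dataclass

TICKS_PER_MEASURE = 96  # per the normalization described above; 24 ticks per beat in 4/4

@dataclass
class NoteToken:
    position: int  # onset within the current measure, 0-95 ticks (relative, not absolute time)
    duration: int  # note length in ticks
    pitch: int     # MIDI pitch number

# One illustrative measure: two eighth notes (C4, E4) followed by a quarter note (G4).
measure = [
    NoteToken(position=0,  duration=12, pitch=60),
    NoteToken(position=12, duration=12, pitch=64),
    NoteToken(position=24, duration=24, pitch=67),
]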
Model Details
Model Description
MORTM (Metric-Oriented Rhythmic Transformer for Melodic generation) is a decoder-only Transformer architecture optimized for music generation with rhythmic awareness. It generates melodies measure-by-measure in an autoregressive fashion. The model supports chord-conditional generation and is equipped with the following features:
- Mixture of Experts (MoE) in the feedforward layers for increased capacity and compute efficiency.
- ALiBi (Attention with Linear Biases) for relative positional biasing (see the sketch after this list).
- FlashAttention-2 for fast, memory-efficient attention.
- Relative tick-based tokenization (e.g., [Position, Duration, Pitch]) for metric robustness.
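The card does not reproduce MORTM's ALiBi code; the function below is a generic sketch of the standard ALiBi bias (head count and sequence length are illustrative parameters), not the project's implementation:

import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    # Head-specific slopes form a geometric sequence: 2^(-8/n), 2^(-16/n), ..., 2^(-8).
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)])
    # Signed distance (j - i) between key position j and query position i.
    pos = torch.arange(seq_len)
    distance = pos[None, :] - pos[:, None]  # (seq_len, seq_len)
    # Keep only the non-positive (causal) part so farther-back keys get a larger penalty.
    bias = slopes[:, None, None] * torch.minimum(distance, torch.zeros_like(distance))
    return bias  # shape (n_heads, seq_len, seq_len); added to attention logits before softmax

ALiBi adds these biases in place of learned absolute position embeddings, which is commonly credited with better extrapolation to sequence lengths beyond those seen in training.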
- Developed by: Koue Okazaki & Takaki Nagoshi
- Funded by: Nihon University, Graduate School of Integrated Basic Sciences
- Shared by: ProjectMORTM
- Model type: Transformer (decoder-only with MoE and ALiBi)
- Language(s) (NLP): N/A (music domain)
- License: MIT
- Finetuned from model: Custom-built from scratch (not fine-tuned from a pretrained LM)
Model Sources
- Repository: https://github.com/Ayato964/MORTM
- Paper: In submission
- Demo: Coming soon
Uses
Direct Use
MORTM can generate melodies from scratch or conditioned on a chord progression. It is well suited for:
- Melody composition in pop, jazz, and improvisational styles.
- Real-time melodic suggestion systems for human-AI co-creation.
- Music education and melody completion tools.
Downstream Use
- Style transfer with different chord inputs.
- Harmonization and rhythm-based accompaniment systems.
Out-of-Scope Use
- Audio-to-audio tasks (e.g., vocal separation).
- Raw audio synthesis (the model outputs symbolic music; rendering audio requires an additional synthesizer or vocoder).
- Genre classification or music recommendation.
Bias, Risks, and Limitations
As the training dataset is primarily composed of Western tonal music, the model may underperform on:
- Non-tonal, microtonal, or traditional music styles.
- Polyrhythmic or tempo-variable music.
- Genres not sufficiently represented in training data (e.g., Indian classical).
Recommendations
Generated melodies should be manually reviewed in professional music contexts. Users are encouraged to retrain or fine-tune the model on representative datasets when applying it to culturally specific music.
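As a rough, non-authoritative illustration of that recommendation, the sketch below fine-tunes the checkpoint with the Hugging Face Trainer. It assumes the repository loads as a standard causal LM via the Auto classes; the MelodyDataset class, the placeholder token sequences, and all hyperparameters are hypothetical and should be replaced with the project's own data pipeline.

import torch
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("nagoshidayo/mortm")

# Placeholder token-ID sequences; replace with melodies tokenized from the target style.
# All sequences share one length here so the default collator can batch them without padding.
train_tokens = [[1, 2, 3, 4, 5, 6, 7, 8] for _ in range(64)]

class MelodyDataset(torch.utils.data.Dataset):
    def __init__(self, sequences):
        self.sequences = sequences
    def __len__(self):
        return len(self.sequences)
    def __getitem__(self, idx):
        ids = torch.tensor(self.sequences[idx], dtype=torch.long)
        # Standard causal-LM objective: labels are the inputs themselves.
        return {"input_ids": ids, "labels": ids.clone()}

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="mortm-finetuned",
        num_train_epochs=3,
        per_device_train_batch_size=8,
        learning_rate=1e-4,
    ),
    train_dataset=MelodyDataset(train_tokens),
)
trainer.train()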
How to Get Started with the Model
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("nagoshidayo/mortm")
tokenizer = AutoTokenizer.from_pretrained("nagoshidayo/mortm")
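The snippet above assumes the checkpoint integrates with the Auto classes; a custom MoE/ALiBi architecture may additionally need trust_remote_code=True when loading. The generation sketch below is likewise an assumption-laden illustration: the prompt string and chord token are hypothetical placeholders, so check the repository for the actual conditioning vocabulary.

# Hypothetical conditioning prompt; the real chord/measure tokens are defined by the MORTM tokenizer.
prompt = "<measure> <chord:Cmaj7>"
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(
    **inputs,
    max_new_tokens=256,   # roughly a few measures of (position, duration, pitch) tokens
    do_sample=True,       # sampling usually yields more varied melodies than greedy decoding
    temperature=0.9,
    top_p=0.95,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=False))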