AEROMamba: Efficient Audio Super-Resolution

AI-Generated README - Original: GitHub | Demo


Model Overview

Architecture: Hybrid GAN + Mamba SSM
Task: 11.025 kHz → 44.1 kHz audio upsampling
Key Improvements:

  • 14x faster inference vs AERO
  • 5x less GPU memory usage
  • 66.47 subjective score (vs AERO's 60.03)

Checkpoint: MUSDB18-HQ Model


Quick Start

# Installation
pip install torch==1.12.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
pip install causal-conv1d==1.1.2 mamba-ssm==1.1.3

# Inference
import torch
import torchaudio
from src.models.aeromamba import AEROMamba

model = AEROMamba.load_from_checkpoint("checkpoint.th")
model.eval()  # disable dropout/batch-norm updates for inference
lr_audio, sr = torchaudio.load("low_res.wav")  # 11.025 kHz input
with torch.no_grad():
    hr_audio = model(lr_audio)  # 44.1 kHz output

Performance (MUSDB18)

Metric        Low-Res   AERO    AEROMamba
ViSQOL ↑      1.82      2.90    2.93
LSD ↓         3.98      1.34    1.23
Subjective ↑  38.22     60.03   66.47

Hardware: 14x faster than AERO on an RTX 3090 (0.087 s vs 1.246 s per inference)
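Of the objective metrics above, LSD (log-spectral distance) is the more transparent one: it averages, over spectrogram frames, the root-mean-square difference between the log power spectra of the reference and reconstructed audio. A minimal sketch (this helper and its `eps` guard are illustrative, not the repo's evaluation code):

```python
import math

def lsd(hr_spec, sr_spec, eps=1e-9):
    """Log-spectral distance between two magnitude spectrograms,
    given as lists of frames (each frame a list of magnitudes).
    Lower is better; identical inputs give 0."""
    frame_dists = []
    for hr_frame, sr_frame in zip(hr_spec, sr_spec):
        sq = [
            (math.log10(h * h + eps) - math.log10(s * s + eps)) ** 2
            for h, s in zip(hr_frame, sr_frame)
        ]
        frame_dists.append(math.sqrt(sum(sq) / len(sq)))
    return sum(frame_dists) / len(frame_dists)
```

In practice the spectrograms come from an STFT of the reference and super-resolved waveforms; the exact FFT size and hop used for the numbers in the table are defined in the original repo.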


Training Data

MUSDB18-HQ:

  • 150 full-track music recordings
  • 44.1 kHz originals → 11.025 kHz downsampled pairs
  • 87.5/12.5 train-val split
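Since 44100 / 11025 = 4 exactly, each training pair relates the high-resolution track to a 4x-decimated copy. The sketch below (a hypothetical helper, not the repo's data pipeline) shows the pairing with naive decimation; a real pipeline must low-pass filter before decimating to avoid aliasing, e.g. with `torchaudio.functional.resample`:

```python
# Naive 4x decimation: 44.1 kHz -> 11.025 kHz (44100 / 11025 == 4).
# Illustrative only: real LR/HR pair creation applies an anti-aliasing
# low-pass filter before dropping samples.

def make_lr_hr_pair(hr_samples, factor=4):
    """Return (lr, hr) where lr keeps every `factor`-th sample of hr."""
    usable = len(hr_samples) - len(hr_samples) % factor  # trim ragged tail
    hr = hr_samples[:usable]
    lr = hr[::factor]
    return lr, hr
```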

Citation

@inproceedings{Abreu2024lamir,
  author    = {Wallace Abreu and Luiz Wagner Pereira Biscainho},
  title     = {AEROMamba: Efficient Audio SR with GANs and SSMs},
  booktitle = {Proc. Latin American Music IR Workshop},
  year      = {2024}
}

This README was AI-generated based on original project materials. For training code and OLA inference scripts, visit the GitHub repo.
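The OLA (overlap-add) scripts mentioned above process long tracks in overlapping chunks so that memory stays bounded and chunk boundaries are crossfaded rather than hard-cut. A minimal pure-Python sketch of the idea, with a placeholder `process` callable standing in for the model (here chunk length is preserved for simplicity, whereas the actual model maps each chunk to its 4x-upsampled version):

```python
def overlap_add(signal, process, chunk=8, hop=4):
    """Run `process` on overlapping chunks and recombine by overlap-add.

    `process` maps a chunk to a chunk of the same length (stand-in for
    the model). A triangular window crossfades overlapping chunks, and
    dividing by the accumulated window undoes the overlap weighting.
    """
    n = len(signal)
    out = [0.0] * n
    norm = [0.0] * n
    # Triangular window with nonzero endpoints, e.g. [1,2,3,4,4,3,2,1].
    win = [float(min(i + 1, chunk - i)) for i in range(chunk)]
    for start in range(0, max(n - chunk, 0) + 1, hop):
        piece = process(signal[start:start + chunk])
        for i, v in enumerate(piece):
            out[start + i] += win[i] * v
            norm[start + i] += win[i]
    return [o / w if w > 0 else 0.0 for o, w in zip(out, norm)]
```

With an identity `process`, the reconstruction matches the input wherever chunks cover the signal, which is a convenient sanity check before swapping in the real model.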
