Music Source Separation with Band-Split RoPE Transformer
Paper: arXiv 2309.02612
A model for the music source separation task. The implementation is adapted from the existing BS-RoFormer model code.
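Model parameters:
17.9M parameters in total. On the validation split of MUSDB18HQ the model reaches an average SDR of 9.0. Per-stem SDR:
The transformers version used is 4.55.4. To run the model you also need to install soundfile, einops, and librosa.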
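Before loading the model it can help to confirm that the packages listed above are installed. The following is a minimal sketch using only the Python standard library; the helper name `check_dependencies` is made up for illustration and is not part of this repository.

```python
from importlib import metadata

# Packages named in this README. The README reports transformers 4.55.4;
# other versions are untested here.
REQUIRED = ["transformers", "soundfile", "einops", "librosa"]


def check_dependencies(packages=REQUIRED):
    """Return a dict mapping each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, version in check_dependencies().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as missing can be installed with pip before running the examples.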
GPU inference:
from transformers import AutoModel
import soundfile
import torch
import librosa

model_name = "HiDolen/Mini-BS-RoFormer-18M"
model = AutoModel.from_pretrained(
    model_name,
    trust_remote_code=True,
)
model.to("cuda")

# Load the audio at 44.1 kHz, keeping all channels
file = "./Bruno Mars - Runaway Baby.mp3"
waveform, sr = librosa.load(file, sr=44100, mono=False)
waveform = torch.tensor(waveform).float()
waveform = waveform.to("cuda")

# Run inference
result = model.separate(
    waveform,
    chunk_size=44100 * 6,
    overlap_size=44100 * 3,
    gap_size=0,
    batch_size=2,
    verbose=True,
)

# Save each separated stem
for i in range(result.shape[0]):
    soundfile.write(f"separated_stem_{i}.wav", result[i].cpu().numpy().T, 44100)
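The `chunk_size` and `overlap_size` arguments suggest that long audio is processed in overlapping chunks. The sketch below shows the general overlap-add idea in NumPy. It is not the code behind `model.separate`: the function `chunked_apply`, the `process` callable, and the Hann fade window are assumptions made for illustration, and `gap_size` is not modelled.

```python
import numpy as np


def chunked_apply(waveform, process, chunk_size, overlap_size):
    """Apply `process` to overlapping chunks and blend the results by overlap-add.

    waveform: array of shape (channels, samples)
    process:  hypothetical callable mapping a (channels, chunk_size) array
              to an array of the same shape
    """
    if not 0 <= overlap_size < chunk_size:
        raise ValueError("overlap_size must be in [0, chunk_size)")
    hop = chunk_size - overlap_size
    channels, total = waveform.shape
    out = np.zeros((channels, total), dtype=np.float64)
    weight = np.zeros(total, dtype=np.float64)
    # Hann fade window; the small offset keeps the weights non-zero at chunk edges.
    window = np.hanning(chunk_size) + 1e-3

    for start in range(0, total, hop):
        end = min(start + chunk_size, total)
        n = end - start
        # Zero-pad the last chunk so every call sees a full-size input.
        chunk = np.zeros((channels, chunk_size), dtype=np.float64)
        chunk[:, :n] = waveform[:, start:end]
        processed = process(chunk)
        out[:, start:end] += processed[:, :n] * window[:n]
        weight[start:end] += window[:n]
        if end == total:
            break
    # Normalise by the accumulated window weight so overlaps average out.
    return out / weight


if __name__ == "__main__":
    x = np.random.default_rng(0).standard_normal((2, 44100))
    y = chunked_apply(x, lambda c: c, chunk_size=4096, overlap_size=2048)
    print(np.allclose(x, y))
```

With an identity `process` the output equals the input, which is a quick way to check that the blending weights are normalised correctly.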
To separate only 2 stems (instrumental and vocals) instead of 4:
from transformers import AutoModel
import soundfile
import torch
import librosa

model_name = "HiDolen/Mini-BS-RoFormer-18M"
model = AutoModel.from_pretrained(
    model_name,
    trust_remote_code=True,
)
model.to("cuda")

# Load the audio at 44.1 kHz, keeping all channels
file = "./Bruno Mars - Runaway Baby.mp3"
waveform, sr = librosa.load(file, sr=44100, mono=False)
waveform = torch.tensor(waveform).float()
waveform = waveform.to("cuda")

# Run inference
result = model.separate(
    waveform,
    chunk_size=44100 * 6,
    overlap_size=44100 * 3,
    gap_size=0,
    batch_size=2,
    verbose=True,
)

# Sum the first three stems into the instrumental; the fourth stem is the vocals
instrumental = result[0] + result[1] + result[2]
vocals = result[3]
result = torch.stack([instrumental, vocals], dim=0)

# Save the two stems
for i in range(result.shape[0]):
    soundfile.write(f"separated_stem_{i}.wav", result[i].cpu().numpy().T, 44100)
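Trained on the MUSDB18HQ dataset.
The Multi-STFT loss term described in the original paper is not used, in order to speed up training.
Learning rate 5e-4, trained for 200k steps with batch_size=6.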
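For context, here is a rough idea of what a multi-resolution STFT loss term looks like, i.e. the kind of term this model leaves out. This is a NumPy sketch for illustration only: the function names, the Hann window, the window sizes, and the hop size are assumptions, not values taken from this repository, and a real training loss would be written with differentiable tensor operations.

```python
import numpy as np


def stft(signal, n_fft, hop):
    """Complex STFT of a 1-D signal with a Hann window, shape (frames, bins)."""
    if len(signal) < n_fft:
        signal = np.pad(signal, (0, n_fft - len(signal)))
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([signal[i * hop : i * hop + n_fft] for i in range(n_frames)])
    return np.fft.rfft(frames * window, axis=-1)


def multi_resolution_stft_l1(pred, target,
                             window_sizes=(4096, 2048, 1024, 512, 256),
                             hop=147):
    """Sum over window sizes of the mean absolute complex-spectrogram error.

    The default window sizes and hop are illustrative assumptions.
    """
    total = 0.0
    for n_fft in window_sizes:
        diff = stft(pred, n_fft, hop) - stft(target, n_fft, hop)
        total += float(np.mean(np.abs(diff)))
    return total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.standard_normal(8000)
    pred = target + 0.1 * rng.standard_normal(8000)
    print(multi_resolution_stft_l1(pred, target))
```

Each window size needs its own STFT of both prediction and target at every step, which is why dropping this term can reduce training cost.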
Some tricks found during training: