mlx-community/sam-audio-large-fp16
This model was converted to MLX format from facebook/sam-audio-large using mlx-audio version 0.2.10.
Refer to the original model card for more details on the model.
Use with mlx
pip install -U mlx-audio
Voice Isolation:
from mlx_audio.sts import SAMAudio, SAMAudioProcessor, save_audio
import mlx.core as mx
# Load model and processor
processor = SAMAudioProcessor.from_pretrained("mlx-community/sam-audio-large-fp16")
model = SAMAudio.from_pretrained("mlx-community/sam-audio-large-fp16")
# Process inputs
batch = processor(
    descriptions=["speech"],
    audios=["path/to/audio.mp3"],
    # anchors=[[("+", 0.2, 0.5)]],  # Optional: temporal anchors
)
# Separate audio
result = model.separate(
    audios=batch.audios,
    descriptions=batch.descriptions,
    sizes=batch.sizes,
    anchor_ids=batch.anchor_ids,
    anchor_alignment=batch.anchor_alignment,
    ode_decode_chunk_size=50,  # Chunked decoding for memory efficiency
)
# For long audio files, use separate_long().
# Note: this is slower than separate() but more memory efficient.
# result = model.separate_long(
#     audios=batch.audios,
#     descriptions=batch.descriptions,
#     chunk_seconds=10.0,
#     overlap_seconds=3.0,
#     anchor_ids=batch.anchor_ids,
#     anchor_alignment=batch.anchor_alignment,
#     ode_decode_chunk_size=50,  # Chunked decoding for memory efficiency
# )
# Save output
## Isolated speech
save_audio(result.target[0], "separated.wav", sample_rate=model.sample_rate)
## Residual audio (background music/noise/other sounds)
save_audio(result.residual[0], "residual.wav", sample_rate=model.sample_rate)
# Check memory usage
print(f"Peak memory: {result.peak_memory:.2f} GB")
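To get a feel for how `chunk_seconds` and `overlap_seconds` interact, the sketch below computes the overlapping time windows that a long file would be cut into. It is only an illustration of the general chunk-with-overlap idea: `chunk_windows` is a hypothetical helper that is not part of mlx-audio, and the way `separate_long()` actually splits and blends chunks may differ.

```python
def chunk_windows(total_seconds, chunk_seconds=10.0, overlap_seconds=3.0):
    """Return (start, end) times of overlapping windows covering the audio.

    Hypothetical helper for illustration only; not mlx-audio's implementation.
    """
    if chunk_seconds <= overlap_seconds:
        raise ValueError("chunk_seconds must be larger than overlap_seconds")

    hop = chunk_seconds - overlap_seconds  # each window advances by this much
    windows = []
    start = 0.0
    while True:
        end = min(start + chunk_seconds, total_seconds)
        windows.append((start, end))
        if end >= total_seconds:  # the last window reaches the end of the file
            break
        start += hop
    return windows


# A 25-second file with the values used in the commented-out call above
for start, end in chunk_windows(25.0, chunk_seconds=10.0, overlap_seconds=3.0):
    print(f"{start:5.1f}s -> {end:5.1f}s")
```

With these values each window advances by 7 seconds, so only about 10 seconds of audio would need to be processed at a time regardless of file length. A larger overlap gives more shared context between neighbouring windows at the cost of more windows to process.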