mlx-community/sam-audio-large-fp16

This model was converted to MLX format from facebook/sam-audio-large using mlx-audio version 0.2.10. Refer to the original model card for more details on the model.

Use with mlx

pip install -U mlx-audio
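The card says this checkpoint was converted with mlx-audio 0.2.10. Treating that release as a minimum is an assumption, since older releases may not ship `SAMAudio`. The minimal standard-library sketch below checks the installed version before you run the example. The helper names `parse_version` and `installed_version` are hypothetical.

```python
import re
from importlib import metadata

MIN_VERSION = "0.2.10"  # assumption: the mlx-audio version this checkpoint was converted with


def parse_version(text):
    """Turn '0.2.10' (or '0.2.10.dev0') into a comparable tuple of ints, e.g. (0, 2, 10)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:  # stop at non-numeric segments such as 'dev0'
            break
        parts.append(int(match.group()))
    return tuple(parts)


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


found = installed_version("mlx-audio")
if found is None:
    print("mlx-audio is not installed; run: pip install -U mlx-audio")
elif parse_version(found) < parse_version(MIN_VERSION):
    print(f"mlx-audio {found} is older than {MIN_VERSION}; run: pip install -U mlx-audio")
else:
    print(f"mlx-audio {found} OK")
```

This simplified parser treats a pre-release such as `0.2.10.dev0` as equal to `0.2.10`. If you need full PEP 440 ordering, use `packaging.version.Version` instead.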

Voice Isolation:

from mlx_audio.sts import SAMAudio, SAMAudioProcessor, save_audio
import mlx.core as mx

# Load model and processor
processor = SAMAudioProcessor.from_pretrained("mlx-community/sam-audio-large-fp16")
model = SAMAudio.from_pretrained("mlx-community/sam-audio-large-fp16")

# Process inputs
batch = processor(
    descriptions=["speech"],
    audios=["path/to/audio.mp3"],
    # anchors=[[("+", 0.2, 0.5)]],  # Optional: temporal anchors
)

# Separate audio
result = model.separate(
    audios=batch.audios,
    descriptions=batch.descriptions,
    sizes=batch.sizes,
    anchor_ids=batch.anchor_ids,
    anchor_alignment=batch.anchor_alignment,
    ode_decode_chunk_size=50,  # Chunked decoding for memory efficiency
)

# For long audio files, use separate_long(), which splits the input into overlapping
# chunks (chunk_seconds, overlap_seconds). It is slower than separate() but uses less memory.
# result = model.separate_long(
#     audios=batch.audios,
#     descriptions=batch.descriptions,
#     chunk_seconds=10.0,
#     overlap_seconds=3.0,
#     anchor_ids=batch.anchor_ids,
#     anchor_alignment=batch.anchor_alignment,
#     ode_decode_chunk_size=50,  # Chunked decoding for memory efficiency
# )

# Save output
## Isolated speech
save_audio(result.target[0], "separated.wav", sample_rate=model.sample_rate)

## Residual audio (background music/noise/other sounds)
save_audio(result.residual[0], "residual.wav", sample_rate=model.sample_rate)

# Check memory usage
print(f"Peak memory: {result.peak_memory:.2f} GB")
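The card describes `separate_long()` only as slower but more memory efficient than `separate()`. The sketch below shows the general overlapping-window idea that parameters like `chunk_seconds=10.0` and `overlap_seconds=3.0` suggest. It splits the waveform into fixed-length windows that advance by chunk minus overlap, processes each window independently, and then crossfades the overlaps back together.

This is plain NumPy and is not mlx-audio's implementation. The helpers `chunk_windows` and `overlap_add`, the linear crossfade, and the 16 kHz rate in the usage lines are all assumptions. The per-chunk "model output" is an identity stand-in.

```python
import numpy as np


def chunk_windows(num_samples, sample_rate, chunk_seconds=10.0, overlap_seconds=3.0):
    """Return (start, end) sample indices of overlapping windows covering the signal."""
    chunk = int(round(chunk_seconds * sample_rate))
    overlap = int(round(overlap_seconds * sample_rate))
    if chunk <= 0 or not 0 <= overlap < chunk:
        raise ValueError("need chunk > 0 and 0 <= overlap < chunk")
    hop = chunk - overlap  # each window starts this many samples after the previous one
    windows, start = [], 0
    while True:
        end = min(start + chunk, num_samples)
        windows.append((start, end))
        if end >= num_samples:
            return windows
        start += hop


def overlap_add(pieces, windows, num_samples, overlap):
    """Recombine per-window outputs, crossfading linearly across each overlap."""
    out = np.zeros(num_samples, dtype=np.float64)
    total_weight = np.zeros(num_samples, dtype=np.float64)
    for piece, (start, end) in zip(pieces, windows):
        n = end - start
        weight = np.ones(n)
        ramp = min(overlap, n)
        # Strictly positive ramp in (0, 1) so every sample keeps a non-zero total weight.
        fade = np.linspace(0.0, 1.0, ramp + 2)[1:-1]
        if start > 0:  # fade in where this window overlaps the previous one
            weight[:ramp] = fade
        if end < num_samples:  # fade out where the next window takes over
            weight[n - ramp:] = fade[::-1]
        out[start:end] += np.asarray(piece[:n], dtype=np.float64) * weight
        total_weight[start:end] += weight
    return out / total_weight  # normalise so overlapping weights sum to 1


sr = 16_000  # assumed sample rate for illustration
rng = np.random.default_rng(0)
mixture = rng.standard_normal(25 * sr)  # 25 s of noise as a stand-in recording
windows = chunk_windows(len(mixture), sr)  # 10 s windows with 3 s overlap
# Identity stand-in for per-chunk separation: each "output" is just the input slice.
pieces = [mixture[s:e] for s, e in windows]
merged = overlap_add(pieces, windows, len(mixture), overlap=int(3.0 * sr))
print(len(windows), np.allclose(merged, mixture))  # 4 True
```

Only one chunk's activations need to be resident at a time, which is why this style of processing trades speed for lower peak memory. Normalising by the summed weights means identical chunk outputs reconstruct the input exactly, while real, slightly differing outputs are blended smoothly across each seam.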


Weights format: Safetensors
Model size: 3B params
Tensor type: F16
Base model: facebook/sam-audio-large