arxiv:2505.04621

Score Distillation Sampling for Audio: Source Separation, Synthesis, and Beyond

Published on May 7

Abstract

We introduce Audio-SDS, a generalization of Score Distillation Sampling (SDS) to text-conditioned audio diffusion models. While SDS was initially designed for text-to-3D generation using image diffusion, its core idea of distilling a powerful generative prior into a separate parametric representation extends to the audio domain. Leveraging a single pretrained model, Audio-SDS enables a broad range of tasks without requiring specialized datasets. In particular, we demonstrate how Audio-SDS can guide physically informed impact sound simulations, calibrate FM-synthesis parameters, and perform prompt-specified source separation. Our findings illustrate the versatility of distillation-based methods across modalities and establish a robust foundation for future work using generative priors in audio tasks.
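The abstract describes SDS only at a high level. The sketch below illustrates one update of the original image-domain SDS rule from text-to-3D work. It renders a signal x from parameters θ, noises it at a random timestep t, asks a diffusion model to predict that noise, and moves θ along w(t)(ε̂ − ε)·∂x/∂θ, skipping the denoiser's own Jacobian. The following pieces are assumptions for illustration only and are not the paper's method: the linear toy "renderer", the oracle noise predictor `oracle_noise` that stands in for a pretrained text-conditioned audio diffusion model, the noise schedule `ALPHAS`, the weighting w(t) = 1 − ᾱ_t, and all helper names. Classifier-free guidance is omitted. Audio-SDS's actual formulation may differ.

```python
import math
import random


def sds_step(theta, render, render_jacobian, predict_noise, alphas_cumprod, rng, lr=0.1):
    """One Score Distillation Sampling update (sketch of the generic SDS rule)."""
    x = render(theta)                                  # rendered signal x = g(theta)
    t = rng.randrange(1, len(alphas_cumprod))          # random diffusion timestep
    a_bar = alphas_cumprod[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x]             # injected Gaussian noise
    x_t = [math.sqrt(a_bar) * xi + math.sqrt(1.0 - a_bar) * ei for xi, ei in zip(x, eps)]
    eps_hat = predict_noise(x_t, t)                    # frozen model's noise estimate
    w = 1.0 - a_bar                                    # assumed weighting w(t)
    grad_x = [w * (eh - e) for eh, e in zip(eps_hat, eps)]  # no denoiser Jacobian
    jac = render_jacobian(theta)                       # jac[i][j] = d x_i / d theta_j
    grad_theta = [sum(jac[i][j] * grad_x[i] for i in range(len(x))) for j in range(len(theta))]
    return [th - lr * g for th, g in zip(theta, grad_theta)]


# --- Hypothetical toy usage: linear renderer and an oracle noise predictor ---
TARGET = [0.5, -0.3, 0.5, -0.3]                        # signal the "prior" favors
ALPHAS = [1.0 - (k + 1) / 101 for k in range(100)]     # a_bar from ~0.99 down to ~0.01


def render(theta):
    # Toy "synthesizer": repeats its two parameters as a 4-sample signal.
    return [theta[0], theta[1], theta[0], theta[1]]


def render_jacobian(theta):
    return [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]


def oracle_noise(x_t, t):
    # Exact noise prediction if the prior were a point mass at TARGET.
    a_bar = ALPHAS[t]
    return [(xt - math.sqrt(a_bar) * xs) / math.sqrt(1.0 - a_bar) for xt, xs in zip(x_t, TARGET)]


rng = random.Random(0)
theta = [0.0, 0.0]
for _ in range(300):
    theta = sds_step(theta, render, render_jacobian, oracle_noise, ALPHAS, rng)
```

With this oracle predictor, the injected noise cancels. Each step then reduces to gradient descent on the squared distance between the rendered signal and the target, so θ approaches (0.5, −0.3). With a real model, ε̂ would instead come from a network conditioned on the text prompt, and the gradient would stay noisy.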
