DAC Audio Codec – Fine-Tuned on LibriSpeech

This is a fine-tuned version of the Descript Audio Codec (DAC). Fine-tuning was done using a fork of the original code on the 960-hour LibriSpeech dataset.

Model Details

  • Architecture: Descript Audio Codec
  • Framework: PyTorch
  • Trained on: LibriSpeech (960 hours)
  • Input: 16 kHz mono audio
  • Output: Reconstructed audio waveform
  • Temporal compression rate: $320$, i.e. $1$ second of audio results in $50$ frames.
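As a quick sanity check on the compression rate (a standalone arithmetic sketch, not part of the model code):

```python
# Temporal compression: the encoder downsamples the waveform by a factor of 320,
# so one second of 16 kHz audio becomes 16000 / 320 = 50 latent frames.
SAMPLE_RATE = 16_000
COMPRESSION_RATE = 320

frames_per_second = SAMPLE_RATE // COMPRESSION_RATE
print(frames_per_second)  # 50
```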

Three versions are provided:

  • 16x16_128: DAC with encoded representation of size 256 with 16 codebooks of size 256, resulting in $2^{8\cdot16}=2^{128}$ possible representations and $6.4$ kbps.
  • 16x16_130: DAC with encoded representation of size 256 with 13 codebooks of size 1024, resulting in $2^{10\cdot13}=2^{130}$ possible representations and $6.5$ kbps.
  • 24x24_128: DAC with encoded representation of size 576 with 16 codebooks of size 256, resulting in $2^{8\cdot16}=2^{128}$ possible representations and $6.4$ kbps.
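The bitrates above follow directly from the codebook configuration: bits per frame = number of codebooks × log2(codebook size), and kbps = bits per frame × frame rate / 1000. A small sketch of the arithmetic (version names and numbers taken from the list above):

```python
import math

FRAME_RATE = 50  # latent frames per second (16 kHz / 320)

# (number of codebooks, codebook size) for each version
versions = {
    "16x16_128": (16, 256),
    "16x16_130": (13, 1024),
    "24x24_128": (16, 256),
}

for name, (n_codebooks, codebook_size) in versions.items():
    bits_per_frame = n_codebooks * math.log2(codebook_size)
    kbps = bits_per_frame * FRAME_RATE / 1000
    print(f"{name}: 2^{int(bits_per_frame)} codes per frame, {kbps} kbps")
# 16x16_128: 2^128 codes per frame, 6.4 kbps
# 16x16_130: 2^130 codes per frame, 6.5 kbps
# 24x24_128: 2^128 codes per frame, 6.4 kbps
```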

Files Included

Each version contains "latest" (250k steps) and "best" (best validation loss) checkpoints.

File           Description
weights.pth    Model weights (PyTorch state_dict)
metadata.pth   Logs (optional)
README.md      This file

Example Usage

import dac

# Load model ("weights.pth" from the desired version/checkpoint).
# DAC.load expects a path to a saved checkpoint, not a state_dict,
# so a separate torch.load call is not needed here.
model = dac.DAC.load("weights.pth")
model.eval()

Dataset used to train Blinorot/dac_finetuned_librispeech: LibriSpeech