DAC Audio Codec – Fine-Tuned on LibriSpeech

This is a fine-tuned version of the Descript Audio Codec (DAC). Fine-tuning was done using a fork of the original code on the 960-hour LibriSpeech dataset.

Model Details

  • Architecture: Descript Audio Codec
  • Framework: PyTorch
  • Trained on: LibriSpeech (960 hours)
  • Input: 16 kHz mono audio
  • Output: Reconstructed audio waveform
  • Temporal compression rate: $320$, i.e. $1$ second of audio results in $50$ frames.
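As a quick sanity check on the compression rate (a standalone arithmetic sketch, not part of the model code):

```python
# Temporal compression: the encoder downsamples the waveform by a factor of 320,
# so one second of 16 kHz audio becomes 16000 / 320 = 50 latent frames.
SAMPLE_RATE = 16_000
COMPRESSION_RATE = 320

frames_per_second = SAMPLE_RATE // COMPRESSION_RATE
print(frames_per_second)  # 50
```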

Three versions are provided:

  • 16x16_128: DAC with encoded representation of size 256 with 16 codebooks of size 256, resulting in $2^{8\cdot16}=2^{128}$ possible representations and $6.4$ kbps.
  • 16x16_130: DAC with encoded representation of size 256 with 13 codebooks of size 1024, resulting in $2^{10\cdot13}=2^{130}$ possible representations and $6.5$ kbps.
  • 24x24_128: DAC with encoded representation of size 576 with 16 codebooks of size 256, resulting in $2^{8\cdot16}=2^{128}$ possible representations and $6.4$ kbps.
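The bitrates above follow directly from the codebook configuration: bits per frame = number of codebooks × log2(codebook size), and kbps = bits per frame × frame rate / 1000. A small sketch of the arithmetic (version names and numbers taken from the list above):

```python
import math

FRAME_RATE = 50  # latent frames per second (16 kHz / 320)

# (number of codebooks, codebook size) for each version
versions = {
    "16x16_128": (16, 256),
    "16x16_130": (13, 1024),
    "24x24_128": (16, 256),
}

for name, (n_codebooks, codebook_size) in versions.items():
    bits_per_frame = n_codebooks * math.log2(codebook_size)
    kbps = bits_per_frame * FRAME_RATE / 1000
    print(f"{name}: 2^{int(bits_per_frame)} codes per frame, {kbps} kbps")
# 16x16_128: 2^128 codes per frame, 6.4 kbps
# 16x16_130: 2^130 codes per frame, 6.5 kbps
# 24x24_128: 2^128 codes per frame, 6.4 kbps
```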

Files Included

Each version contains "latest" (250k steps) and "best" (best validation loss) checkpoints.

File           Description
weights.pth    Model weights (PyTorch state_dict)
metadata.pth   Logs (optional)
README.md      This file

Example Usage

import dac

# Load model ("weights.pth" from the desired version/checkpoint).
# DAC.load expects a path to a saved checkpoint, not a state_dict,
# so a separate torch.load call is not needed here.
model = dac.DAC.load("weights.pth")
model.eval()

Dataset used to train Blinorot/dac_finetuned_librispeech: LibriSpeech