DAC Audio Codec: Fine-Tuned on LibriSpeech
This is a fine-tuned version of the Descript Audio Codec (DAC). Fine-tuning was done with a fork of the original code on the 960-hour LibriSpeech dataset.
Model Details
- Architecture: Descript Audio Codec
- Framework: PyTorch
- Trained on: LibriSpeech (960 hours)
- Input: 16 kHz mono audio
- Output: Reconstructed audio waveform
- Temporal compression rate: $320$, i.e. $1$ second of audio results in $50$ frames.
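The frame count follows directly from the sample rate and the compression rate listed above. The sketch below is only an illustration of that arithmetic: the function names are hypothetical, and rounding up for inputs that are not a multiple of 320 samples assumes the codec pads the waveform, which this card does not state.

```python
import math

SAMPLE_RATE_HZ = 16_000  # input sample rate from the list above
HOP_LENGTH = 320         # temporal compression rate from the list above


def frames_per_second(sample_rate_hz: int = SAMPLE_RATE_HZ,
                      hop_length: int = HOP_LENGTH) -> float:
    """Number of encoded frames produced per second of audio."""
    return sample_rate_hz / hop_length


def num_frames(num_samples: int, hop_length: int = HOP_LENGTH) -> int:
    """Frames for a waveform of `num_samples` samples.

    Assumption: the input is padded up to a multiple of the hop length,
    so a partial final hop still yields one frame.
    """
    return math.ceil(num_samples / hop_length)


if __name__ == "__main__":
    print(frames_per_second())      # 50.0
    print(num_frames(16_000))       # 50 (one second of audio)
    print(num_frames(10 * 16_000))  # 500 (ten seconds of audio)
```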
Three versions are provided:
- 16x16_128: DAC with encoded representation of size 256 with 16 codebooks of size 256, resulting in $2^{8\cdot16}=2^{128}$ possible representations and $6.4$ kbps.
- 16x16_130: DAC with encoded representation of size 256 with 13 codebooks of size 1024, resulting in $2^{10\cdot13}=2^{130}$ possible representations and $6.5$ kbps.
- 24x24_128: DAC with encoded representation of size 576 with 16 codebooks of size 256, resulting in $2^{8\cdot16}=2^{128}$ possible representations and $6.4$ kbps.
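The bitrates quoted for the three versions can be reproduced from the codebook counts and sizes together with the 50 frames per second stated under Model Details. The following is a minimal sketch of that calculation; the helper names and the `VERSIONS` dictionary are illustrative and not part of the released code, and it assumes codebook sizes are powers of two.

```python
FRAMES_PER_SECOND = 50  # 16 kHz input / compression rate of 320

# (number of codebooks, codebook size) for each provided version
VERSIONS = {
    "16x16_128": (16, 256),
    "16x16_130": (13, 1024),
    "24x24_128": (16, 256),
}


def bits_per_frame(num_codebooks: int, codebook_size: int) -> int:
    """Bits needed to index one entry in each codebook for a single frame."""
    # For a power-of-two size, (size - 1).bit_length() equals log2(size).
    bits_per_code = (codebook_size - 1).bit_length()
    return num_codebooks * bits_per_code


def bitrate_kbps(num_codebooks: int, codebook_size: int,
                 fps: int = FRAMES_PER_SECOND) -> float:
    """Bitrate in kilobits per second at the given frame rate."""
    return bits_per_frame(num_codebooks, codebook_size) * fps / 1000


if __name__ == "__main__":
    for name, (n_codebooks, size) in VERSIONS.items():
        print(name, bits_per_frame(n_codebooks, size),
              bitrate_kbps(n_codebooks, size))
```

For example, 13 codebooks of size 1024 give $10\cdot13=130$ bits per frame, and $130\cdot50=6500$ bits per second, i.e. $6.5$ kbps.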
Files Included
Each version contains "latest" (250k steps) and "best" (best validation loss) checkpoints.
File | Description |
---|---|
weights.pth | Model weights (PyTorch state_dict) |
metadata.pth | Logs (optional) |
README.md | This file |
Example Usage
import torch
import dac
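# Load the checkpoint weights (a PyTorch state_dict) onto the CPU
weights = torch.load("weights.pth", map_location="cpu")
# Restore the model from the loaded weights
model = dac.DAC.load(weights)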