ChunkFormer Model
Usage
Install the package:
pip install chunkformer
Then load the model and transcribe audio from Python:
from chunkformer import ChunkFormerModel
# Load the model
model = ChunkFormerModel.from_pretrained("khanhld/chunkformer-ctc-small-libri-100h")
# For long-form audio transcription
transcription = model.endless_decode(
    audio_path="path/to/your/audio.wav",
    chunk_size=64,
    left_context_size=128,
    right_context_size=128,
    return_timestamps=True,
)
print(transcription)
# For batch processing
audio_files = ["audio1.wav", "audio2.wav", "audio3.wav"]
transcriptions = model.batch_decode(
    audio_paths=audio_files,
    chunk_size=64,
    left_context_size=128,
    right_context_size=128,
)
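To build intuition for how chunk_size, left_context_size, and right_context_size relate to each other, here is a minimal sketch that lists which frame ranges each chunk would see. It is a simplified illustration, not the library's implementation: the helper chunk_windows is hypothetical, it assumes the three sizes are counted in encoder frames, and it ignores how context accumulates across layers in the actual model.

```python
from typing import List, Tuple


def chunk_windows(
    num_frames: int,
    chunk_size: int,
    left_context: int,
    right_context: int,
) -> List[Tuple[int, int, int, int]]:
    """Hypothetical helper: return (ctx_start, chunk_start, chunk_end, ctx_end)
    for each chunk. End indices are exclusive; all sizes are assumed to be
    in frames."""
    windows = []
    for chunk_start in range(0, num_frames, chunk_size):
        # The chunk itself, clipped at the end of the sequence.
        chunk_end = min(chunk_start + chunk_size, num_frames)
        # Context the chunk may look at, clipped at the sequence boundaries.
        ctx_start = max(0, chunk_start - left_context)
        ctx_end = min(num_frames, chunk_end + right_context)
        windows.append((ctx_start, chunk_start, chunk_end, ctx_end))
    return windows


# Same sizes as in the usage example above, on a 300-frame input.
for window in chunk_windows(300, chunk_size=64, left_context=128, right_context=128):
    print(window)
```

With these settings each chunk covers at most 64 frames and can look at up to 128 frames on either side, so no window spans more than 320 frames regardless of the total audio length.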
Training
This model was trained with the ChunkFormer framework. For training details and the source code, see https://github.com/khanld/chunkformer
Paper: https://arxiv.org/abs/2502.14673
Citation
If you use this work in your research, please cite:
@INPROCEEDINGS{10888640,
author={Le, Khanh and Ho, Tuan Vu and Tran, Dung and Chau, Duc Thanh},
booktitle={ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
title={ChunkFormer: Masked Chunking Conformer For Long-Form Speech Transcription},
year={2025},
volume={},
number={},
pages={1-5},
keywords={Scalability;Memory management;Graphics processing units;Signal processing;Performance gain;Hardware;Resource management;Speech processing;Standards;Context modeling;chunkformer;masked batch;long-form transcription},
doi={10.1109/ICASSP49660.2025.10888640}}