khanhld committed
Commit 003957f · 1 Parent(s): b223f8d

update usage

Files changed (1)
  1. README.md +46 -16
README.md CHANGED
@@ -91,39 +91,69 @@ We evaluate the models using **Word Error Rate (WER)**. To ensure a fair compari
  ## Quick Usage
  To use the ChunkFormer model for English Automatic Speech Recognition, follow these steps:

- 1. **Download the ChunkFormer Repository**
  ```bash
- git clone https://github.com/khanld/chunkformer.git
- cd chunkformer
- pip install -r requirements.txt
  ```
- 2. **Download the Model Checkpoint from Hugging Face**
  ```bash
- pip install huggingface_hub
- huggingface-cli download khanhld/chunkformer-large-en-libri-960h --local-dir "./chunkformer-large-en-libri-960h"
  ```
- or
- ```bash
- git lfs install
- git clone https://huggingface.co/khanhld/chunkformer-large-en-libri-960h
  ```
- This will download the model checkpoint to the checkpoints folder inside your chunkformer directory.

- 3. **Run the model**
  ```bash
- python decode.py \
- --model_checkpoint path/to/local/chunkformer-large-en-libri-960h \
  --long_form_audio path/to/audio.wav \
- --total_batch_duration 14400 \ #in second, default is 1800
  --chunk_size 64 \
  --left_context_size 128 \
  --right_context_size 128
  ```
  Example Output:
  ```
  [00:00:01.200] - [00:00:02.400]: this is a transcription example
  [00:00:02.500] - [00:00:03.700]: testing the long-form audio
  ```
  **Advanced Usage** can be found [HERE](https://github.com/khanld/chunkformer/tree/main?tab=readme-ov-file#usage)
 
  ## Quick Usage
  To use the ChunkFormer model for English Automatic Speech Recognition, follow these steps:

+ ### Option 1: Install from PyPI (Recommended)
  ```bash
+ pip install chunkformer
  ```
+
+ ### Option 2: Install from source
  ```bash
+ git clone https://github.com/khanld/chunkformer.git
+ cd chunkformer
+ pip install -e .
  ```
+
+ ### Python API Usage
+ ```python
+ from chunkformer import ChunkFormerModel
+
+ # Load the English model from Hugging Face
+ model = ChunkFormerModel.from_pretrained("khanhld/chunkformer-large-en-libri-960h")
+
+ # For single long-form audio transcription
+ transcription = model.endless_decode(
+     audio_path="path/to/long_audio.wav",
+     chunk_size=64,
+     left_context_size=128,
+     right_context_size=128,
+     total_batch_duration=14400,  # in seconds
+     return_timestamps=True
+ )
+ print(transcription)
+
+ # For batch processing of multiple audio files
+ audio_files = ["audio1.wav", "audio2.wav", "audio3.wav"]
+ transcriptions = model.batch_decode(
+     audio_paths=audio_files,
+     chunk_size=64,
+     left_context_size=128,
+     right_context_size=128,
+     total_batch_duration=1800  # Total batch duration in seconds
+ )
+
+ for i, transcription in enumerate(transcriptions):
+     print(f"Audio {i+1}: {transcription}")
  ```
 
+ ### Command Line Usage
+ After installation, you can use the command line interface:
+
  ```bash
+ chunkformer-decode \
+ --model_checkpoint khanhld/chunkformer-large-en-libri-960h \
  --long_form_audio path/to/audio.wav \
+ --total_batch_duration 14400 \
  --chunk_size 64 \
  --left_context_size 128 \
  --right_context_size 128
  ```
+
  Example Output:
  ```
  [00:00:01.200] - [00:00:02.400]: this is a transcription example
  [00:00:02.500] - [00:00:03.700]: testing the long-form audio
  ```
+
  **Advanced Usage** can be found [HERE](https://github.com/khanld/chunkformer/tree/main?tab=readme-ov-file#usage)