update usage
README.md CHANGED

@@ -91,39 +91,69 @@ We evaluate the models using **Word Error Rate (WER)**. To ensure a fair compari
 ## Quick Usage
 To use the ChunkFormer model for English Automatic Speech Recognition, follow these steps:
 
-1
+### Option 1: Install from PyPI (Recommended)
 ```bash
-
-cd chunkformer
-pip install -r requirements.txt
+pip install chunkformer
 ```
-
+
+### Option 2: Install from source
 ```bash
-
-
+git clone https://github.com/khanld/chunkformer.git
+cd chunkformer
+pip install -e .
 ```
-
-
-
-
+
+### Python API Usage
+```python
+from chunkformer import ChunkFormerModel
+
+# Load the English model from Hugging Face
+model = ChunkFormerModel.from_pretrained("khanhld/chunkformer-large-en-libri-960h")
+
+# For single long-form audio transcription
+transcription = model.endless_decode(
+    audio_path="path/to/long_audio.wav",
+    chunk_size=64,
+    left_context_size=128,
+    right_context_size=128,
+    total_batch_duration=14400,  # in seconds
+    return_timestamps=True
+)
+print(transcription)
+
+# For batch processing of multiple audio files
+audio_files = ["audio1.wav", "audio2.wav", "audio3.wav"]
+transcriptions = model.batch_decode(
+    audio_paths=audio_files,
+    chunk_size=64,
+    left_context_size=128,
+    right_context_size=128,
+    total_batch_duration=1800  # Total batch duration in seconds
+)
+
+for i, transcription in enumerate(transcriptions):
+    print(f"Audio {i+1}: {transcription}")
 ```
-This will download the model checkpoint to the checkpoints folder inside your chunkformer directory.
 
-
+### Command Line Usage
+After installation, you can use the command line interface:
+
 ```bash
-
-    --model_checkpoint
+chunkformer-decode \
+    --model_checkpoint khanhld/chunkformer-large-en-libri-960h \
     --long_form_audio path/to/audio.wav \
-    --total_batch_duration 14400 \
+    --total_batch_duration 14400 \
     --chunk_size 64 \
     --left_context_size 128 \
     --right_context_size 128
 ```
+
 Example Output:
 ```
 [00:00:01.200] - [00:00:02.400]: this is a transcription example
 [00:00:02.500] - [00:00:03.700]: testing the long-form audio
 ```
+
 **Advanced Usage** can be found [HERE](https://github.com/khanld/chunkformer/tree/main?tab=readme-ov-file#usage)
 
 
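
The example output in the README prints one segment per line in the form `[start] - [end]: text`. For downstream use (subtitles, alignment checks), it can help to turn those lines into numeric offsets. The following is a minimal sketch, assuming the output keeps exactly the `[HH:MM:SS.mmm] - [HH:MM:SS.mmm]: text` shape shown in the example. The names `Segment`, `to_seconds`, and `parse_transcript` are hypothetical helpers written for illustration; they are not part of the `chunkformer` package.

```python
import re
from typing import List, NamedTuple


class Segment(NamedTuple):
    """One transcribed span (hypothetical helper type, not a chunkformer class)."""
    start: float  # offset from the start of the audio, in seconds
    end: float    # offset from the start of the audio, in seconds
    text: str


# Matches lines shaped like "[00:00:01.200] - [00:00:02.400]: some text".
LINE_RE = re.compile(
    r"^\[(\d+):(\d{2}):(\d{2}(?:\.\d+)?)\] - "
    r"\[(\d+):(\d{2}):(\d{2}(?:\.\d+)?)\]: (.*)$"
)


def to_seconds(hours: str, minutes: str, seconds: str) -> float:
    """Convert the HH, MM, SS.mmm fields of a timestamp to seconds."""
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def parse_transcript(output: str) -> List[Segment]:
    """Parse timestamped lines, skipping any line in another format."""
    segments = []
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # blank line or unrecognized format
        h1, m1, s1, h2, m2, s2, text = match.groups()
        segments.append(
            Segment(to_seconds(h1, m1, s1), to_seconds(h2, m2, s2), text)
        )
    return segments


# The two lines from the README's "Example Output" block.
example = """\
[00:00:01.200] - [00:00:02.400]: this is a transcription example
[00:00:02.500] - [00:00:03.700]: testing the long-form audio
"""

segments = parse_transcript(example)
for seg in segments:
    print(f"{seg.start:.1f}s -> {seg.end:.1f}s: {seg.text}")
```

Lines that do not match the pattern are skipped rather than raising, so the parser tolerates log lines or blank lines mixed into captured output. If the real output format differs from the example, the regular expression is the only part that should need adjusting.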
