Update README.md
README.md
CHANGED
@@ -1,3 +1,67 @@
|
|
1 |
-
---
|
2 |
-
license: apache-2.0
|
3 |
-
---
|
1 |
+
---
|
2 |
+
license: apache-2.0
|
3 |
+
---
|
4 |
+
# Introduction
|
5 |
+
|
6 |
+
This repository hosts the MossFormer2_SR_48K model weights for 48 kHz speech super-resolution, used by the [ClearerVoice-Studio](https://github.com/modelscope/ClearerVoice-Studio/tree/main) repo.
|
7 |
+
|
8 |
+
This model is trained on large-scale datasets, including open-source and private data.
|
9 |
+
|
10 |
+
The purpose is to enhance the quality of speech signals by increasing their temporal and spectral resolution, typically by converting low-resolution (low sampling rate)
|
11 |
+
audio to high-resolution (high sampling rate) audio. This involves reconstructing the high-frequency components that are often missing in low-resolution signals.
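
To illustrate why the high-frequency components are "missing", the following minimal sketch uses a toy two-tone signal (an assumption for illustration, not real speech and not the model itself). It simulates a 16 kHz recording of a 48 kHz signal and shows that plain band-limited resampling back to 48 kHz restores the sampling rate but not the content above 8 kHz. Recovering that content is the part a super-resolution model has to predict. The helper name `band_energy_above` is hypothetical.

```python
import numpy as np

def band_energy_above(signal, sample_rate, cutoff_hz):
    """Fraction of spectral energy located strictly above cutoff_hz."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    total = power.sum()
    return float(power[freqs > cutoff_hz].sum() / total) if total > 0 else 0.0

# One second of a toy wideband signal at 48 kHz: a 1 kHz tone plus a 12 kHz tone.
sr_high, sr_low = 48_000, 16_000
t = np.arange(sr_high) / sr_high
wideband = np.sin(2 * np.pi * 1_000 * t) + 0.5 * np.sin(2 * np.pi * 12_000 * t)

# Simulate a 16 kHz recording: ideal low-pass at 8 kHz, then keep every third sample.
spectrum = np.fft.rfft(wideband)
freqs = np.fft.rfftfreq(len(wideband), d=1.0 / sr_high)
spectrum[freqs >= sr_low / 2] = 0
narrowband = np.fft.irfft(spectrum, n=len(wideband))[::3]

# Band-limited upsampling back to 48 kHz: zero-pad the spectrum, no new content is created.
low_spectrum = np.fft.rfft(narrowband)
padded = np.zeros(sr_high // 2 + 1, dtype=complex)
padded[: len(low_spectrum)] = low_spectrum
upsampled = np.fft.irfft(padded, n=sr_high) * (sr_high / sr_low)

original_fraction = band_energy_above(wideband, sr_high, 8_000)
upsampled_fraction = band_energy_above(upsampled, sr_high, 8_000)
print(f"energy above 8 kHz, original:  {original_fraction:.3f}")
print(f"energy above 8 kHz, upsampled: {upsampled_fraction:.3f}")
```

The upsampled signal has 48,000 samples per second, yet the 12 kHz tone is gone; only the 1 kHz tone survives the round trip.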
|
12 |
+
|
13 |
+
# Install
|
14 |
+
|
15 |
+
**Clone the Repository**
|
16 |
+
|
17 |
+
``` sh
|
18 |
+
git clone https://github.com/modelscope/ClearerVoice-Studio.git
|
19 |
+
```
|
20 |
+
|
21 |
+
**Create Conda Environment**
|
22 |
+
|
23 |
+
``` sh
|
24 |
+
cd ClearerVoice-Studio
|
25 |
+
conda create -n clearvoice python=3.12.1
|
26 |
+
conda activate clearvoice
|
27 |
+
pip install -r requirements.txt
|
28 |
+
```
|
29 |
+
|
30 |
+
**Run Script**
|
31 |
+
|
32 |
+
Go to `clearvoice/` and run the following examples. The MossFormer2_SR_48K model will be downloaded from Hugging Face automatically.
|
33 |
+
|
34 |
+
Sample example 1: use the `MossFormer2_SR_48K` model to process a single wave file, `samples/input.wav`, and save the output wave file to `samples/output_MossFormer2_SR_48K.wav`
|
35 |
+
|
36 |
+
```python
|
37 |
+
from clearvoice import ClearVoice
|
38 |
+
|
39 |
+
myClearVoice = ClearVoice(task='speech_super_resolution', model_names=['MossFormer2_SR_48K'])
|
40 |
+
|
41 |
+
output_wav = myClearVoice(input_path='samples/input.wav', online_write=False)
|
42 |
+
|
43 |
+
myClearVoice.write(output_wav, output_path='samples/output_MossFormer2_SR_48K.wav')
|
44 |
+
```
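
As a quick sanity check after running the example above, the output file's header can be inspected with Python's standard `wave` module. This is a minimal sketch that assumes the output is an integer PCM WAV file (the `wave` module does not read floating-point WAV data); the helper name `wav_info` is hypothetical and not part of ClearerVoice-Studio.

```python
import wave

def wav_info(path):
    """Return (sample_rate_hz, channels, duration_seconds) read from a PCM WAV header."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        frames = wf.getnframes()
        duration = frames / rate if rate else 0.0
        return rate, wf.getnchannels(), duration

# Example usage (uncomment after the output file exists):
# rate, channels, seconds = wav_info("samples/output_MossFormer2_SR_48K.wav")
# print(f"{rate} Hz, {channels} channel(s), {seconds:.2f} s")
```

For a 48 kHz super-resolution output, the reported sample rate would be expected to be 48000.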
|
45 |
+
|
46 |
+
Sample example 2: use the speech super-resolution model `MossFormer2_SR_48K` to process all input wave files in `samples/path_to_input_wavs/` and save all output files to `samples/path_to_output_wavs`
|
47 |
+
|
48 |
+
```python
|
49 |
+
from clearvoice import ClearVoice
|
50 |
+
|
51 |
+
myClearVoice = ClearVoice(task='speech_super_resolution', model_names=['MossFormer2_SR_48K'])
|
52 |
+
|
53 |
+
myClearVoice(input_path='samples/path_to_input_wavs', online_write=True, output_path='samples/path_to_output_wavs')
|
54 |
+
```
|
55 |
+
|
56 |
+
Sample example 3: use the speech super-resolution model `MossFormer2_SR_48K` to process the wave files listed in the `samples/scp/audio_samples.scp` file, and save all output files to `samples/path_to_output_wavs_scp/`
|
57 |
+
|
58 |
+
```python
|
59 |
+
from clearvoice import ClearVoice
|
60 |
+
|
61 |
+
myClearVoice = ClearVoice(task='speech_super_resolution', model_names=['MossFormer2_SR_48K'])
|
62 |
+
|
63 |
+
myClearVoice(input_path='samples/scp/audio_samples.scp', online_write=True, output_path='samples/path_to_output_wavs_scp')
|
64 |
+
```
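
If you need to build your own `.scp` list, the sketch below writes one from a directory of wave files. It assumes the list format is plain text with one wave file path per line; the source does not specify the format, so check the sample `.scp` file in the repository before relying on it. The helper name `write_scp` is hypothetical.

```python
from pathlib import Path

def write_scp(wav_dir, scp_path):
    """Write every .wav path under wav_dir to scp_path, one path per line (assumed format).

    Returns the number of paths written.
    """
    wav_paths = sorted(Path(wav_dir).rglob("*.wav"))
    scp_file = Path(scp_path)
    scp_file.parent.mkdir(parents=True, exist_ok=True)
    scp_file.write_text("".join(f"{p}\n" for p in wav_paths), encoding="utf-8")
    return len(wav_paths)

# Example usage (paths are illustrative):
# count = write_scp("samples/path_to_input_wavs", "samples/scp/my_samples.scp")
```

The resulting file can then be passed as `input_path` in the same way as in example 3.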
|
65 |
+
|
66 |
+
Model Limitations: The current speech super-resolution model is trained on a clean speech dataset and is designed to work with clean speech inputs. For speech super-resolution on noisy speech audio,
|
67 |
+
we recommend using our `MossFormer2_SE_48K` model for speech enhancement first, followed by `MossFormer2_SR_48K` for speech super-resolution.
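
The two-stage recommendation above could be sketched as follows. This is an illustration under assumptions, not a documented recipe: it reuses the call pattern from sample example 1, assumes the enhancement task is named `speech_enhancement` in ClearerVoice-Studio, and passes audio between stages through an intermediate file. The function name `enhance_then_super_resolve` and the `clearvoice_cls` parameter are hypothetical.

```python
def enhance_then_super_resolve(input_wav, enhanced_wav, output_wav, clearvoice_cls=None):
    """Denoise input_wav first, then run super-resolution on the denoised file."""
    if clearvoice_cls is None:
        # Imported lazily so the function can be defined without the package installed.
        from clearvoice import ClearVoice as clearvoice_cls

    # Stage 1: speech enhancement (task name is an assumption, see the lead-in).
    enhancer = clearvoice_cls(task='speech_enhancement', model_names=['MossFormer2_SE_48K'])
    enhanced = enhancer(input_path=input_wav, online_write=False)
    enhancer.write(enhanced, output_path=enhanced_wav)

    # Stage 2: speech super-resolution on the enhanced file.
    upscaler = clearvoice_cls(task='speech_super_resolution', model_names=['MossFormer2_SR_48K'])
    restored = upscaler(input_path=enhanced_wav, online_write=False)
    upscaler.write(restored, output_path=output_wav)
    return output_wav

# Example usage (paths are illustrative):
# enhance_then_super_resolve('samples/noisy.wav', 'samples/enhanced.wav', 'samples/restored.wav')
```

Writing the intermediate file keeps each stage on the file-based interface shown in the examples above, at the cost of one extra disk write.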
|