alibabasglab committed on
Commit
b4906d4
·
verified ·
1 Parent(s): a58ea18

Update README.md

  1. README.md +67 -3
README.md CHANGED
@@ -1,3 +1,67 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ license: apache-2.0
3
+ ---
4
+ # Introduction
5
+
6
+ These are the MossFormer2_SR_48K model weights for 48 kHz speech super-resolution, used in the [ClearerVoice-Studio](https://github.com/modelscope/ClearerVoice-Studio/tree/main) repo.
7
+
8
+ This model is trained on large-scale datasets including open-source and private data.
9
+
10
+ The purpose is to enhance the quality of speech signals by increasing their temporal and spectral resolution, typically by converting low-resolution (low sampling rate)
11
+ audio to high-resolution (high sampling rate) audio. This involves reconstructing the high-frequency components that are often missing in low-resolution signals.
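
To make "low-resolution" concrete: a signal sampled at a given rate can only represent frequencies up to half that rate (the Nyquist frequency), so a 16 kHz recording carries no content above 8 kHz, while 48 kHz audio can carry content up to 24 kHz. The helper below is a minimal sketch using only the Python standard library; `describe_wav` is a hypothetical name and is not part of ClearerVoice-Studio. It assumes an uncompressed PCM WAV file.

```python
import wave


def describe_wav(path):
    """Return (sample_rate_hz, nyquist_hz) for an uncompressed PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        sample_rate = wav_file.getframerate()
    # The highest frequency a sampled signal can represent is half its sample rate.
    return sample_rate, sample_rate / 2
```

Calling `describe_wav` on a file before and after processing is one way to confirm which sampling rate each file actually has.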
12
+
13
+ # Install
14
+
15
+ **Clone the Repository**
16
+
17
+ ``` sh
18
+ git clone https://github.com/modelscope/ClearerVoice-Studio.git
19
+ ```
20
+
21
+ **Create Conda Environment**
22
+
23
+ ``` sh
24
+ cd ClearerVoice-Studio
25
+ conda create -n clearvoice python=3.12.1
26
+ conda activate clearvoice
27
+ pip install -r requirements.txt
28
+ ```
29
+
30
+ **Run Script**
31
+
32
+ Go to `clearvoice/` and use the following examples. The MossFormer2_SR_48K model will be downloaded from Hugging Face automatically.
33
+
34
+ Sample example 1: use the model `MossFormer2_SR_48K` to process a single wave file, `samples/input.wav`, and save the output wave file to `samples/output_MossFormer2_SR_48K.wav`
35
+
36
+ ```python
37
+ from clearvoice import ClearVoice
38
+
39
+ myClearVoice = ClearVoice(task='speech_super_resolution', model_names=['MossFormer2_SR_48K'])
40
+
41
+ output_wav = myClearVoice(input_path='samples/input.wav', online_write=False)
42
+
43
+ myClearVoice.write(output_wav, output_path='samples/output_MossFormer2_SR_48K.wav')
44
+ ```
45
+
46
+ Sample example 2: use the speech super-resolution model `MossFormer2_SR_48K` to process all input wave files in `samples/path_to_input_wavs/` and save all output files to `samples/path_to_output_wavs`
47
+
48
+ ```python
49
+ from clearvoice import ClearVoice
50
+
51
+ myClearVoice = ClearVoice(task='speech_super_resolution', model_names=['MossFormer2_SR_48K'])
52
+
53
+ myClearVoice(input_path='samples/path_to_input_wavs', online_write=True, output_path='samples/path_to_output_wavs')
54
+ ```
55
+
56
+ Sample example 3: use the speech super-resolution model `MossFormer2_SR_48K` to process the wave files listed in the `samples/scp/audio_samples.scp` file, and save all output files to `samples/path_to_output_wavs_scp/`
57
+
58
+ ```python
59
+ from clearvoice import ClearVoice
60
+
61
+ myClearVoice = ClearVoice(task='speech_super_resolution', model_names=['MossFormer2_SR_48K'])
62
+
63
+ myClearVoice(input_path='samples/scp/audio_samples.scp', online_write=True, output_path='samples/path_to_output_wavs_scp')
64
+ ```
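
If you need to build a list file for your own recordings, something like the following may help. It is a sketch under a stated assumption: that the `.scp` file is a plain text file with one wave-file path per line. Check `samples/scp/audio_samples.scp` in the repo to confirm the expected format. `write_scp` is a hypothetical helper, not part of the `clearvoice` package.

```python
from pathlib import Path


def write_scp(wav_dir, scp_path):
    """Write the path of every .wav file directly under wav_dir to scp_path.

    Assumption: the list file holds one path per line.
    """
    wav_paths = sorted(str(p) for p in Path(wav_dir).glob("*.wav"))
    Path(scp_path).write_text(
        "".join(f"{p}\n" for p in wav_paths), encoding="utf-8"
    )
    return wav_paths
```

The resulting file path can then be passed as `input_path` in the same way as in the example above.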
65
+
66
+ Model Limitations: The current speech super-resolution model is trained on a clean speech dataset and is designed to work with clean speech inputs. For speech super-resolution on noisy speech audio,
67
+ we recommend using our `MossFormer2_SE_48K` model for speech enhancement first, followed by `MossFormer2_SR_48K` for speech super-resolution.
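
The two-stage recommendation above could be sketched as follows. This is an illustration, not an official recipe. It reuses the call pattern from sample example 1 and assumes the enhancement task is named `speech_enhancement` in `ClearVoice`. The function name `enhance_then_super_resolve`, the intermediate file, and the `model_factory` parameter (which lets you swap in a stand-in class for testing) are all hypothetical.

```python
def enhance_then_super_resolve(input_path, enhanced_path, output_path,
                               model_factory=None):
    """Denoise a wave file first, then run super-resolution on the result."""
    if model_factory is None:
        # Imported lazily so the function can be exercised without the package.
        from clearvoice import ClearVoice
        model_factory = ClearVoice

    # Stage 1: speech enhancement (task name is an assumption, see lead-in).
    se_model = model_factory(task='speech_enhancement',
                             model_names=['MossFormer2_SE_48K'])
    enhanced_wav = se_model(input_path=input_path, online_write=False)
    se_model.write(enhanced_wav, output_path=enhanced_path)

    # Stage 2: speech super-resolution on the enhanced intermediate file.
    sr_model = model_factory(task='speech_super_resolution',
                             model_names=['MossFormer2_SR_48K'])
    output_wav = sr_model(input_path=enhanced_path, online_write=False)
    sr_model.write(output_wav, output_path=output_path)
    return output_path
```

Writing the enhanced audio to an intermediate file keeps each stage identical to the single-file example, at the cost of one extra file on disk.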