PhoenixStormJr committed
Commit 04e7e27
· verified · 1 Parent(s): dd385b0

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+docs/小白简易教程.doc filter=lfs diff=lfs merge=lfs -text
docs/README.en.md ADDED
@@ -0,0 +1,108 @@
<div align="center">

<h1>Retrieval-based-Voice-Conversion-WebUI</h1>
An easy-to-use voice conversion framework based on VITS.<br><br>

[![madewithlove](https://forthebadge.com/images/badges/built-with-love.svg)](https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI)

<img src="https://counter.seku.su/cmoe?name=rvc&theme=r34" /><br>

[![Open In Colab](https://img.shields.io/badge/Colab-F9AB00?style=for-the-badge&logo=googlecolab&color=525252)](https://colab.research.google.com/github/liujing04/Retrieval-based-Voice-Conversion-WebUI/blob/main/Retrieval_based_Voice_Conversion_WebUI.ipynb)
[![Licence](https://img.shields.io/github/license/liujing04/Retrieval-based-Voice-Conversion-WebUI?style=for-the-badge)](https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI/blob/main/%E4%BD%BF%E7%94%A8%E9%9C%80%E9%81%B5%E5%AE%88%E7%9A%84%E5%8D%8F%E8%AE%AE-LICENSE.txt)
[![Huggingface](https://img.shields.io/badge/🤗%20-Spaces-yellow.svg?style=for-the-badge)](https://huggingface.co/lj1995/VoiceConversionWebUI/tree/main/)

[![Discord](https://img.shields.io/badge/RVC%20Developers-Discord-7289DA?style=for-the-badge&logo=discord&logoColor=white)](https://discord.gg/HcsmBBGyVk)

</div>

------
[**Changelog**](https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI/blob/main/Changelog_CN.md) | [**FAQ (Frequently Asked Questions)**](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/wiki/FAQ-(Frequently-Asked-Questions))

[**English**](./README.en.md) | [**中文简体**](../README.md) | [**日本語**](./README.ja.md) | [**한국어**](./README.ko.md) ([**韓國語**](./README.ko.han.md))

:fire: An online demo using RVC that converts vocals into acoustic guitar audio :fire: : https://huggingface.co/spaces/lj1995/vocal2guitar

:fire: Vocal2Guitar demo video :fire: : https://www.bilibili.com/video/BV19W4y1D7tT/

> Check our [Demo Video](https://www.bilibili.com/video/BV1pm4y1z7Gm/) here!

> Realtime voice conversion software using RVC: [w-okada/voice-changer](https://github.com/w-okada/voice-changer)

> The pre-trained model is trained on nearly 50 hours of the high-quality open-source VCTK dataset.

> High-quality licensed song datasets will be added to the training set one after another for your use, without worrying about copyright infringement.
## Summary
This repository has the following features:
+ Reduces tone leakage by replacing source features with training-set features using top-1 retrieval;
+ Easy and fast training, even on relatively poor graphics cards;
+ Relatively good results even with small amounts of training data (>=10 min of low-noise speech recommended);
+ Model fusion to change timbres (using the ckpt processing tab -> ckpt merge);
+ An easy-to-use WebUI;
+ The UVR5 model to quickly separate vocals and instruments.
## Preparing the environment
We recommend you install the dependencies through Poetry.

The following commands need to be executed in an environment with Python 3.8 or higher:
```bash
# Install PyTorch-related core dependencies; skip if already installed
# Reference: https://pytorch.org/get-started/locally/
pip install torch torchvision torchaudio

# For Windows + Nvidia Ampere architecture (RTX30xx), you need to specify the CUDA version corresponding to PyTorch according to the experience of https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI/issues/21
#pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117

# Install the Poetry dependency management tool; skip if already installed
# Reference: https://python-poetry.org/docs/#installation
curl -sSL https://install.python-poetry.org | python3 -

# Install the project dependencies
poetry install
```
You can also use pip to install the dependencies:

```bash
pip install -r requirements.txt
```

## Preparing other pre-models
RVC requires other pre-models for inference and training.

You need to download them from our [Huggingface space](https://huggingface.co/lj1995/VoiceConversionWebUI/tree/main/).

Here's a list of pre-models and other files that RVC needs:
```bash
hubert_base.pt

./pretrained

./uvr5_weights

# If you want to test the v2 model (which changes the input from the 256-dimensional features of 9-layer HuBERT+final_proj to the 768-dimensional features of 12-layer HuBERT, and adds 3 period discriminators), you will need to download these additionally:

./pretrained_v2

# If you are using Windows, you may also need this file; skip if FFmpeg is installed
ffmpeg.exe
```
Then use this command to start the WebUI:
```bash
python infer-web.py
```
If you are using Windows, you can download and extract `RVC-beta.7z` to use RVC directly and use `go-web.bat` to start the WebUI.

There's also a tutorial on RVC in Chinese, and you can check it out if needed.

## Credits
+ [ContentVec](https://github.com/auspicious3000/contentvec/)
+ [VITS](https://github.com/jaywalnut310/vits)
+ [HIFIGAN](https://github.com/jik876/hifi-gan)
+ [Gradio](https://github.com/gradio-app/gradio)
+ [FFmpeg](https://github.com/FFmpeg/FFmpeg)
+ [Ultimate Vocal Remover](https://github.com/Anjok07/ultimatevocalremovergui)
+ [audio-slicer](https://github.com/openvpi/audio-slicer)
## Thanks to all contributors for their efforts

<a href="https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI/graphs/contributors" target="_blank">
  <img src="https://contrib.rocks/image?repo=liujing04/Retrieval-based-Voice-Conversion-WebUI" />
</a>
docs/README.ja.md ADDED
@@ -0,0 +1,104 @@
<div align="center">

<h1>Retrieval-based-Voice-Conversion-WebUI</h1>
An easy-to-use voice conversion (voice changer) framework based on VITS.<br><br>

[![madewithlove](https://forthebadge.com/images/badges/built-with-love.svg)](https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI)

<img src="https://counter.seku.su/cmoe?name=rvc&theme=r34" /><br>

[![Open In Colab](https://img.shields.io/badge/Colab-F9AB00?style=for-the-badge&logo=googlecolab&color=525252)](https://colab.research.google.com/github/liujing04/Retrieval-based-Voice-Conversion-WebUI/blob/main/Retrieval_based_Voice_Conversion_WebUI.ipynb)
[![Licence](https://img.shields.io/github/license/liujing04/Retrieval-based-Voice-Conversion-WebUI?style=for-the-badge)](https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI/blob/main/%E4%BD%BF%E7%94%A8%E9%9C%80%E9%81%B5%E5%AE%88%E7%9A%84%E5%8D%8F%E8%AE%AE-LICENSE.txt)
[![Huggingface](https://img.shields.io/badge/🤗%20-Spaces-yellow.svg?style=for-the-badge)](https://huggingface.co/lj1995/VoiceConversionWebUI/tree/main/)

[![Discord](https://img.shields.io/badge/RVC%20Developers-Discord-7289DA?style=for-the-badge&logo=discord&logoColor=white)](https://discord.gg/HcsmBBGyVk)

</div>

------

[**Changelog**](https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI/blob/main/Changelog_CN.md)

[**English**](./README.en.md) | [**中文简体**](../README.md) | [**日本語**](./README.ja.md) | [**한국어**](./README.ko.md) ([**韓國語**](./README.ko.han.md))

> See the [demo video](https://www.bilibili.com/video/BV1pm4y1z7Gm/) here!

> Realtime voice conversion with RVC: [w-okada/voice-changer](https://github.com/w-okada/voice-changer)

> So that it can be used without copyright concerns, the base model is trained on about 50 hours of high-quality open-source data.

> We plan to keep adding licensed high-quality singing data and training further base models.

## Introduction
This repository has the following features:

+ Reduces tone leakage by replacing raw features with training-set features via top-1 retrieval.
+ Fast and easy training, even on relatively weak GPUs.
+ Relatively good results even from small datasets (at least 10 minutes of low-noise speech recommended).
+ Model fusion for blending voices (using ckpt merge in the ckpt processing tab).
+ An easy-to-use WebUI.
+ Bundles the UVR5 model for quickly separating vocals and background music.

## Preparing the environment
We recommend installing the dependencies with Poetry.

The following commands must be run under Python 3.8 or higher:
```bash
# Install PyTorch-related core dependencies; skip if already installed.
# Reference: https://pytorch.org/get-started/locally/
pip install torch torchvision torchaudio

# For Windows + Nvidia Ampere architecture (RTX30xx), specify the CUDA version matching PyTorch, following #21.
#pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117

# Install the Poetry dependency management tool; skip if already installed.
# Reference: https://python-poetry.org/docs/#installation
curl -sSL https://install.python-poetry.org | python3 -

# Install the dependencies via Poetry
poetry install
```

The dependencies can also be installed with pip:

```bash
pip install -r requirements.txt
```

## Preparing the base models
RVC requires various pre-trained base models for inference and training.

The models can be downloaded from our [Hugging Face space](https://huggingface.co/lj1995/VoiceConversionWebUI/tree/main/).

The following is a list of the base models and other files RVC needs:
```bash
hubert_base.pt

./pretrained

./uvr5_weights

# Skip if ffmpeg is already installed
./ffmpeg
```
Then start the WebUI with the following command:
```bash
python infer-web.py
```
If you are using Windows, you can download and extract `RVC-beta.7z` and click `go-web.bat` to start the WebUI. (7zip is required.)

The repository also contains [小白简易教程.doc](./小白简易教程.doc), a beginner's tutorial you can consult (Chinese only).

## Reference projects
+ [ContentVec](https://github.com/auspicious3000/contentvec/)
+ [VITS](https://github.com/jaywalnut310/vits)
+ [HIFIGAN](https://github.com/jik876/hifi-gan)
+ [Gradio](https://github.com/gradio-app/gradio)
+ [FFmpeg](https://github.com/FFmpeg/FFmpeg)
+ [Ultimate Vocal Remover](https://github.com/Anjok07/ultimatevocalremovergui)
+ [audio-slicer](https://github.com/openvpi/audio-slicer)

## Thanks to all contributors for their efforts
<a href="https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI/graphs/contributors" target="_blank">
  <img src="https://contrib.rocks/image?repo=liujing04/Retrieval-based-Voice-Conversion-WebUI" />
</a>
docs/README.ko.han.md ADDED
@@ -0,0 +1,100 @@
<div align="center">

<h1>Retrieval-based-Voice-Conversion-WebUI</h1>
A simple, easy-to-use voice conversion framework based on VITS<br><br>

[![madewithlove](https://forthebadge.com/images/badges/built-with-love.svg)](https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI)

<img src="https://counter.seku.su/cmoe?name=rvc&theme=r34" /><br>

[![Open In Colab](https://img.shields.io/badge/Colab-F9AB00?style=for-the-badge&logo=googlecolab&color=525252)](https://colab.research.google.com/github/liujing04/Retrieval-based-Voice-Conversion-WebUI/blob/main/Retrieval_based_Voice_Conversion_WebUI.ipynb)
[![Licence](https://img.shields.io/github/license/liujing04/Retrieval-based-Voice-Conversion-WebUI?style=for-the-badge)](https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI/blob/main/%E4%BD%BF%E7%94%A8%E9%9C%80%E9%81%B5%E5%AE%88%E7%9A%84%E5%8D%8F%E8%AE%AE-LICENSE.txt)
[![Huggingface](https://img.shields.io/badge/🤗%20-Spaces-yellow.svg?style=for-the-badge)](https://huggingface.co/lj1995/VoiceConversionWebUI/tree/main/)

[![Discord](https://img.shields.io/badge/RVC%20Developers-Discord-7289DA?style=for-the-badge&logo=discord&logoColor=white)](https://discord.gg/HcsmBBGyVk)

</div>

------
[**Changelog**](https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI/blob/main/Changelog_CN.md)

[**English**](./README.en.md) | [**中文简体**](../README.md) | [**日本語**](./README.ja.md) | [**한국어**](./README.ko.md) ([**韓國語**](./README.ko.han.md))

> Check out the [demo video](https://www.bilibili.com/video/BV1pm4y1z7Gm/)!

> Realtime voice conversion with RVC: [w-okada/voice-changer](https://github.com/w-okada/voice-changer)

> The base model was trained on roughly 50 hours of the high-quality open-source VCTK dataset, so there are no copyright concerns; feel free to use it.

> We will continue to train base models on licensed high-quality songs in the future.

## Introduction
This repo has the following features:
+ Prevents tone leakage by replacing input-source features with training-set features using top-1 retrieval;
+ Fast training, even on relatively low-end GPUs;
+ Good results even from small amounts of training data (at least 10 minutes of low-noise speech recommended);
+ Timbre modification via model fusion (using ckpt merge in the ckpt processing tab);
+ An easy-to-use WebUI;
+ Fast separation of vocals and background music using the UVR5 model;

## Preparing the environment
We recommend installing the dependencies through Poetry.

The following commands must be executed in a Python 3.8 or higher environment:
```bash
# Install PyTorch-related core dependencies; skip if already installed
# Reference: https://pytorch.org/get-started/locally/
pip install torch torchvision torchaudio

# For Windows + Nvidia Ampere architecture (RTX30xx), specify the CUDA version matching PyTorch as noted in #21.
#pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117

# Install Poetry; skip if already installed
# Reference: https://python-poetry.org/docs/#installation
curl -sSL https://install.python-poetry.org | python3 -

# Install the dependencies
poetry install
```
You can also install the dependencies with pip.

```bash
pip install -r requirements.txt
```

## Preparing other pre-trained models
RVC needs other pre-trained models for inference and training.

They can be downloaded from our [Huggingface space](https://huggingface.co/lj1995/VoiceConversionWebUI/tree/main/).

Here is a list of the pre-trained models and other files RVC needs:
```bash
hubert_base.pt

./pretrained

./uvr5_weights

# If you are using Windows, you may also need this file; skip if FFmpeg is installed
ffmpeg.exe
```
Afterwards, start the WebUI with the following command:
```bash
python infer-web.py
```
If you are using Windows, you can download and extract `RVC-beta.7z` to use RVC directly, or use `go-web.bat` to start the WebUI.

## References
+ [ContentVec](https://github.com/auspicious3000/contentvec/)
+ [VITS](https://github.com/jaywalnut310/vits)
+ [HIFIGAN](https://github.com/jik876/hifi-gan)
+ [Gradio](https://github.com/gradio-app/gradio)
+ [FFmpeg](https://github.com/FFmpeg/FFmpeg)
+ [Ultimate Vocal Remover](https://github.com/Anjok07/ultimatevocalremovergui)
+ [audio-slicer](https://github.com/openvpi/audio-slicer)
## Thanks to all contributors for their efforts

<a href="https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI/graphs/contributors" target="_blank">
  <img src="https://contrib.rocks/image?repo=liujing04/Retrieval-based-Voice-Conversion-WebUI" />
</a>
docs/README.ko.md ADDED
@@ -0,0 +1,112 @@
<div align="center">

<h1>Retrieval-based-Voice-Conversion-WebUI</h1>
A simple, easy-to-use voice conversion framework based on VITS.<br><br>

[![madewithlove](https://forthebadge.com/images/badges/built-with-love.svg)](https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI)

<img src="https://counter.seku.su/cmoe?name=rvc&theme=r34" /><br>

[![Open In Colab](https://img.shields.io/badge/Colab-F9AB00?style=for-the-badge&logo=googlecolab&color=525252)](https://colab.research.google.com/github/liujing04/Retrieval-based-Voice-Conversion-WebUI/blob/main/Retrieval_based_Voice_Conversion_WebUI.ipynb)
[![Licence](https://img.shields.io/github/license/liujing04/Retrieval-based-Voice-Conversion-WebUI?style=for-the-badge)](https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI/blob/main/%E4%BD%BF%E7%94%A8%E9%9C%80%E9%81%B5%E5%AE%88%E7%9A%84%E5%8D%8F%E8%AE%AE-LICENSE.txt)
[![Huggingface](https://img.shields.io/badge/🤗%20-Spaces-yellow.svg?style=for-the-badge)](https://huggingface.co/lj1995/VoiceConversionWebUI/tree/main/)

[![Discord](https://img.shields.io/badge/RVC%20Developers-Discord-7289DA?style=for-the-badge&logo=discord&logoColor=white)](https://discord.gg/HcsmBBGyVk)

</div>

---

[**Changelog**](https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI/blob/main/Changelog_KO.md)

[**English**](./README.en.md) | [**中文简体**](../README.md) | [**日本語**](./README.ja.md) | [**한국어**](./README.ko.md) ([**韓國語**](./README.ko.han.md))

> Check out the [demo video](https://www.bilibili.com/video/BV1pm4y1z7Gm/)!

> Realtime voice conversion with RVC: [w-okada/voice-changer](https://github.com/w-okada/voice-changer)

> The base model was trained on roughly 50 hours of the high-quality open-source VCTK dataset, so there are no copyright concerns; feel free to use it.

> We will continue to train base models on licensed high-quality songs in the future.

## Introduction

This repo has the following features:

- Prevents tone leakage by replacing input-source features with training-set features using top-1 retrieval;
- Fast training, even on relatively low-end GPUs;
- Good results even from small amounts of training data (at least 10 minutes of low-noise speech recommended);
- Timbre modification via model fusion (using ckpt merge in the ckpt processing tab);
- An easy-to-use WebUI;
- Fast separation of vocals and background music using the UVR5 model;

## Preparing the environment

We recommend installing the dependencies through Poetry.

The following commands must be executed in a Python 3.8 or higher environment:

```bash
# Install PyTorch-related core dependencies; skip if already installed
# Reference: https://pytorch.org/get-started/locally/
pip install torch torchvision torchaudio

# For Windows + Nvidia Ampere architecture (RTX30xx), specify the CUDA version matching PyTorch as noted in https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI/issues/21.
#pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117

# Install Poetry; skip if already installed
# Reference: https://python-poetry.org/docs/#installation
curl -sSL https://install.python-poetry.org | python3 -

# Install the dependencies
poetry install
```

You can also install the dependencies with pip.

```bash
pip install -r requirements.txt
```

## Preparing other pre-trained models

RVC needs other pre-trained models for inference and training.

They can be downloaded from our [Huggingface space](https://huggingface.co/lj1995/VoiceConversionWebUI/tree/main/).

Here is a list of the pre-trained models and other files RVC needs:

```bash
hubert_base.pt

./pretrained

./uvr5_weights

# If you are using Windows, you may also need this file; skip if FFmpeg is installed
ffmpeg.exe
```

Afterwards, start the WebUI with the following command:

```bash
python infer-web.py
```

If you are using Windows, you can download and extract `RVC-beta.7z` to use RVC directly, or use `go-web.bat` to start the WebUI.

## References

- [ContentVec](https://github.com/auspicious3000/contentvec/)
- [VITS](https://github.com/jaywalnut310/vits)
- [HIFIGAN](https://github.com/jik876/hifi-gan)
- [Gradio](https://github.com/gradio-app/gradio)
- [FFmpeg](https://github.com/FFmpeg/FFmpeg)
- [Ultimate Vocal Remover](https://github.com/Anjok07/ultimatevocalremovergui)
- [audio-slicer](https://github.com/openvpi/audio-slicer)

## Thanks to all contributors for their efforts.

<a href="https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI/graphs/contributors" target="_blank">
  <img src="https://contrib.rocks/image?repo=liujing04/Retrieval-based-Voice-Conversion-WebUI" />
</a>
docs/faiss_tips_en.md ADDED
@@ -0,0 +1,102 @@
faiss tuning TIPS
==================
# About faiss
Faiss is a library for nearest-neighbor search over dense vectors, developed by Facebook Research, which efficiently implements many approximate nearest-neighbor search methods.
Approximate nearest-neighbor search finds similar vectors quickly while sacrificing some accuracy.

## faiss in RVC
In RVC, for the embeddings produced by HuBERT, we search for embeddings similar to those generated from the training data and mix them in, achieving a conversion closer to the original speech. Since this search takes time if performed naively, fast conversion is realized by using approximate nearest-neighbor search.

# Implementation overview
'/logs/your-experiment/3_feature256', where the model is located, holds the features extracted by HuBERT from each piece of voice data.
From here we read the npy files in filename-sorted order and concatenate the vectors to create big_npy. (This vector has shape [N, 256].)
After saving big_npy as /logs/your-experiment/total_fea.npy, we train the faiss index on it.

In this article, I will explain the meaning of these parameters.

# Explanation of the method
## index factory
An index factory is faiss's own notation that expresses, as a string, a pipeline connecting multiple approximate nearest-neighbor search methods.
This allows you to try various approximate nearest-neighbor search methods simply by changing the index factory string.
In RVC it is used like this:

```python
index = faiss.index_factory(256, "IVF%s,Flat" % n_ivf)
```
Among the arguments of index_factory, the first is the number of dimensions of the vectors, the second is the index factory string, and the third specifies the distance metric to use.

For more detailed notation, see
https://github.com/facebookresearch/faiss/wiki/The-index-factory

## Distance metrics
Two metrics are typically used for embedding similarity:

- Euclidean distance (METRIC_L2)
- inner product (METRIC_INNER_PRODUCT)

Euclidean distance takes the squared difference in each dimension, sums the differences over all dimensions, and then takes the square root. This is the same as the everyday distance in 2D and 3D.
The inner product is generally not used as a similarity measure as-is; instead, cosine similarity, the inner product taken after normalization by the L2 norm, is used.

Which is better depends on the case, but cosine similarity is often used for embeddings obtained from word2vec and for similar-image retrieval models trained with ArcFace. To L2-normalize a vector X with numpy, use the following code with eps small enough to avoid division by zero.

```python
X_normed = X / np.maximum(eps, np.linalg.norm(X, ord=2, axis=-1, keepdims=True))
```

Also, for the index factory, you can change the distance metric used for calculation by choosing the value to pass as the third argument.

```python
index = faiss.index_factory(dimension, text, faiss.METRIC_INNER_PRODUCT)
```

## IVF
IVF (inverted file indexes) is an algorithm similar to the inverted index in full-text search.
During training, the search targets are clustered with k-means, and Voronoi partitioning is performed using the cluster centers. Each data point is assigned a cluster, so we create a dictionary that looks up the data points from the clusters.

For example, if clusters are assigned as follows:

|index|cluster|
|-----|-------|
|1|A|
|2|B|
|3|A|
|4|C|
|5|B|

The resulting inverted index looks like this:

|cluster|index|
|-------|-----|
|A|1, 3|
|B|2, 5|
|C|4|

When searching, we first find the n_probe nearest clusters, and then calculate the distances only for the data points belonging to those clusters.
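The cluster-then-scan procedure above can be sketched in plain numpy. This is a toy illustration of the inverted-file idea only: real IVF (e.g. in faiss) trains the centers with k-means and stores the lists compactly, whereas here the data sizes and the "take the first points as centers" shortcut are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: N vectors of dimension d (faiss in RVC uses d=256).
N, d, n_clusters = 1000, 8, 16
xb = rng.standard_normal((N, d)).astype(np.float32)

# "Training": pick cluster centers (real IVF runs k-means; for brevity
# we simply take the first n_clusters points as centers).
centers = xb[:n_clusters].copy()

# Build the inverted lists: cluster id -> indices of member vectors.
assign = np.argmin(((xb[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
inverted = {c: np.where(assign == c)[0] for c in range(n_clusters)}

def ivf_search(q, n_probe=1):
    """Search the n_probe nearest clusters, then scan only their members."""
    order = np.argsort(((centers - q) ** 2).sum(-1))[:n_probe]
    cand = np.concatenate([inverted[c] for c in order])
    return cand[np.argmin(((xb[cand] - q) ** 2).sum(-1))]

# A query near xb[42] should come back with index 42.
print(ivf_search(xb[42] + 0.001, n_probe=4))
```

Raising n_probe scans more lists, trading speed for recall, which is exactly the knob discussed below.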

# Recommended parameters
There are official guidelines on how to choose an index, which I will follow here.
https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index

For datasets below 1M, 4bit-PQ is the most efficient method available in faiss as of April 2023.
Combining this with IVF, narrowing down the candidates with 4bit-PQ, and finally recalculating the distances with an exact index can be expressed with the following index factory:

```python
index = faiss.index_factory(256, "IVF1024,PQ128x4fs,RFlat")
```

## Recommended parameters for IVF
Consider the case of too many IVF cells. If, for example, coarse quantization by IVF uses as many cells as there are data points, this degenerates into a naive exhaustive search and is inefficient.
For 1M points or less, an IVF cell count between 4*sqrt(N) and 16*sqrt(N) is recommended for N data points.
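As a quick sanity check of that rule of thumb, a small helper (hypothetical, not part of RVC or faiss) can compute the recommended range:

```python
import math

def ivf_range(n_points):
    """Recommended IVF cell-count range for <=1M points: 4*sqrt(N)..16*sqrt(N)."""
    root = math.sqrt(n_points)
    return int(4 * root), int(16 * root)

# e.g. a hypothetical training set of 100k HuBERT frames
print(ivf_range(100_000))  # (1264, 5059)
```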
89
+
90
+ Since the calculation time increases in proportion to the number of n_probes, please consult with the accuracy and choose appropriately. Personally, I don't think RVC needs that much accuracy, so n_probe = 1 is fine.
91
+
92
+ ## FastScan
93
+ FastScan is a method that enables high-speed approximation of distances by Cartesian product quantization by performing them in registers.
94
+ Cartesian product quantization performs clustering independently for each d dimension (usually d = 2) during learning, calculates the distance between clusters in advance, and creates a lookup table. At the time of prediction, the distance of each dimension can be calculated in O(1) by looking at the lookup table.
95
+ So the number you specify after PQ usually specifies half the dimension of the vector.
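The lookup-table idea can be checked with a small numpy sketch: the asymmetric PQ distance assembled from per-subspace tables equals exactly the distance between the query and the reconstruction of the encoded vector. The codebooks here are random stand-ins for the k-means codebooks real PQ would learn, and this ignores FastScan's register-level tricks entirely.

```python
import numpy as np

rng = np.random.default_rng(1)

d, sub = 8, 2    # vector dimension, subspace dimension (PQ commonly uses 2)
m = d // sub     # number of subspaces
k = 16           # codebook size per subspace (4 bits -> 16 centroids)

# Toy codebooks: k centroids per subspace (real PQ learns these with k-means).
codebooks = rng.standard_normal((m, k, sub)).astype(np.float32)

def encode(x):
    """Assign each sub-vector to its nearest centroid."""
    parts = x.reshape(m, sub)
    return np.array([np.argmin(((codebooks[i] - parts[i]) ** 2).sum(-1))
                     for i in range(m)])

def adc_distance(q, codes):
    """Asymmetric distance: per-subspace lookup tables, summed in O(m)."""
    qparts = q.reshape(m, sub)
    tables = ((codebooks - qparts[:, None, :]) ** 2).sum(-1)  # shape (m, k)
    return sum(tables[i, codes[i]] for i in range(m))

x = rng.standard_normal(d).astype(np.float32)
q = rng.standard_normal(d).astype(np.float32)
codes = encode(x)

# Reconstruct x from its codes; the table-based distance matches the exact
# squared distance between q and this reconstruction.
x_hat = np.concatenate([codebooks[i, codes[i]] for i in range(m)])
print(adc_distance(q, codes), ((q - x_hat) ** 2).sum())
```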
96
+
97
+ For a more detailed description of FastScan, please refer to the official documentation.
98
+ https://github.com/facebookresearch/faiss/wiki/Fast-accumulation-of-PQ-and-AQ-codes-(FastScan)
99
+
100
+ ## RFlat
101
+ RFlat is an instruction to recalculate the rough distance calculated by FastScan with the exact distance specified by the third argument of index factory.
102
+ When getting k neighbors, k*k_factor points are recalculated.
docs/faiss_tips_ja.md ADDED
@@ -0,0 +1,101 @@
+ faiss tuning TIPS
+ ==================
+ # about faiss
+ faissใฏfacebook researchใฎ้–‹็™บใ™ใ‚‹ใ€ๅฏ†ใชใƒ™ใ‚ฏใƒˆใƒซใซๅฏพใ™ใ‚‹่ฟ‘ๅ‚ๆŽข็ดขใ‚’ใพใจใ‚ใŸใƒฉใ‚คใƒ–ใƒฉใƒชใงใ€ๅคšใใฎ่ฟ‘ไผผ่ฟ‘ๅ‚ๆŽข็ดขใฎๆ‰‹ๆณ•ใ‚’ๅŠน็އ็š„ใซๅฎŸ่ฃ…ใ—ใฆใ„ใพใ™ใ€‚
+ ่ฟ‘ไผผ่ฟ‘ๅ‚ๆŽข็ดขใฏใ‚ใ‚‹็จ‹ๅบฆ็ฒพๅบฆใ‚’็Š ็‰ฒใซใ—ใชใŒใ‚‰้ซ˜้€Ÿใซ้กžไผผใ™ใ‚‹ใƒ™ใ‚ฏใƒˆใƒซใ‚’ๆŽขใ—ใพใ™ใ€‚
+
+ ## faiss in RVC
+ RVCใงใฏHuBERTใงๅค‰ๆ›ใ—ใŸ็‰นๅพด้‡ใฎEmbeddingใซๅฏพใ—ใ€ๅญฆ็ฟ’ใƒ‡ใƒผใ‚ฟใ‹ใ‚‰็”Ÿๆˆใ•ใ‚ŒใŸEmbeddingใจ้กžไผผใ™ใ‚‹ใ‚‚ใฎใ‚’ๆคœ็ดขใ—ใ€ๆททใœใ‚‹ใ“ใจใงใ‚ˆใ‚Šๅ…ƒใฎ้Ÿณๅฃฐใซ่ฟ‘ใ„ๅค‰ๆ›ใ‚’ๅฎŸ็พใ—ใฆใ„ใพใ™ใ€‚ใŸใ ใ€ใ“ใฎๆคœ็ดขใฏๆ„š็›ดใซ่กŒใ†ใจๆ™‚้–“ใŒใ‹ใ‹ใ‚‹ใŸใ‚ใ€่ฟ‘ไผผ่ฟ‘ๅ‚ๆŽข็ดขใ‚’็”จใ„ใ‚‹ใ“ใจใง้ซ˜้€Ÿใชๅค‰ๆ›ใ‚’ๅฎŸ็พใ—ใฆใ„ใพใ™ใ€‚
+
+ # ๅฎŸ่ฃ…ใฎoverview
+ ใƒขใƒ‡ใƒซใŒ้…็ฝฎใ•ใ‚Œใฆใ„ใ‚‹ '/logs/your-experiment/3_feature256'ใซใฏๅ„้Ÿณๅฃฐใƒ‡ใƒผใ‚ฟใ‹ใ‚‰HuBERTใงๆŠฝๅ‡บใ•ใ‚ŒใŸ็‰นๅพด้‡ใŒ้…็ฝฎใ•ใ‚Œใฆใ„ใพใ™ใ€‚
+ ใ“ใ“ใ‹ใ‚‰npyใƒ•ใ‚กใ‚คใƒซใ‚’ใƒ•ใ‚กใ‚คใƒซๅใงใ‚ฝใƒผใƒˆใ—ใŸ้ †็•ชใง่ชญใฟ่พผใฟใ€ใƒ™ใ‚ฏใƒˆใƒซใ‚’้€ฃ็ตใ—ใฆbig_npyใ‚’ไฝœๆˆใ—faissใ‚’ๅญฆ็ฟ’ใ•ใ›ใพใ™ใ€‚(ใ“ใฎใƒ™ใ‚ฏใƒˆใƒซใฎshapeใฏ[N, 256]ใงใ™ใ€‚)
+
+ ๆœฌTipsใงใฏใพใšใ“ใ‚Œใ‚‰ใฎใƒ‘ใƒฉใƒกใƒผใ‚ฟใฎๆ„ๅ‘ณใ‚’่งฃ่ชฌใ—ใพใ™ใ€‚
+
+ # ๆ‰‹ๆณ•ใฎ่งฃ่ชฌ
+ ## index factory
+ index factoryใฏ่ค‡ๆ•ฐใฎ่ฟ‘ไผผ่ฟ‘ๅ‚ๆŽข็ดขใฎๆ‰‹ๆณ•ใ‚’็น‹ใ’ใ‚‹ใƒ‘ใ‚คใƒ—ใƒฉใ‚คใƒณใ‚’stringใง่กจ่จ˜ใ™ใ‚‹faiss็‹ฌ่‡ชใฎ่จ˜ๆณ•ใงใ™ใ€‚
+ ใ“ใ‚Œใซใ‚ˆใ‚Šใ€index factoryใฎๆ–‡ๅญ—ๅˆ—ใ‚’ๅค‰ๆ›ดใ™ใ‚‹ใ ใ‘ใงๆง˜ใ€…ใช่ฟ‘ไผผ่ฟ‘ๅ‚ๆŽข็ดขใฎๆ‰‹ๆณ•ใ‚’่ฉฆใ›ใพใ™ใ€‚
+ RVCใงใฏไปฅไธ‹ใฎใ‚ˆใ†ใซไฝฟใ‚ใ‚Œใฆใ„ใพใ™ใ€‚
+
+ ```python
+ index = faiss.index_factory(256, "IVF%s,Flat" % n_ivf)
+ ```
+ index_factoryใฎๅผ•ๆ•ฐใฎใ†ใกใ€1ใค็›ฎใฏใƒ™ใ‚ฏใƒˆใƒซใฎๆฌกๅ…ƒๆ•ฐใ€2ใค็›ฎใฏindex factoryใฎๆ–‡ๅญ—ๅˆ—ใงใ€3ใค็›ฎใซใฏ็”จใ„ใ‚‹่ท้›ขใ‚’ๆŒ‡ๅฎšใ™ใ‚‹ใ“ใจใŒใงใใพใ™ใ€‚
+
+ ใ‚ˆใ‚Š่ฉณ็ดฐใช่จ˜ๆณ•ใซใคใ„ใฆใฏ
+ https://github.com/facebookresearch/faiss/wiki/The-index-factory
+
+ ## ่ท้›ขๆŒ‡ๆจ™
+ embeddingใฎ้กžไผผๅบฆใจใ—ใฆ็”จใ„ใ‚‰ใ‚Œใ‚‹ไปฃ่กจ็š„ใชๆŒ‡ๆจ™ใจใ—ใฆไปฅไธ‹ใฎไบŒใคใŒใ‚ใ‚Šใพใ™ใ€‚
+
+ - ใƒฆใƒผใ‚ฏใƒชใƒƒใƒ‰่ท้›ข(METRIC_L2)
+ - ๅ†…็ฉ(METRIC_INNER_PRODUCT)
+
+ ใƒฆใƒผใ‚ฏใƒชใƒƒใƒ‰่ท้›ขใงใฏๅ„ๆฌกๅ…ƒใซใŠใ„ใฆไบŒไน—ใฎๅทฎใ‚’ใจใ‚Šใ€ๅ…จๆฌกๅ…ƒใฎๅทฎใ‚’่ถณใ—ใฆใ‹ใ‚‰ๅนณๆ–นๆ นใ‚’ใจใ‚Šใพใ™ใ€‚ใ“ใ‚Œใฏๆ—ฅๅธธ็š„ใซ็”จใ„ใ‚‹2ๆฌกๅ…ƒใ€3ๆฌกๅ…ƒใงใฎ่ท้›ขใจๅŒใ˜ใงใ™ใ€‚
+ ๅ†…็ฉใฏใ“ใฎใพใพใงใฏ้กžไผผๅบฆใฎๆŒ‡ๆจ™ใจใ—ใฆ็”จใ„ใšใ€ไธ€่ˆฌ็š„ใซใฏL2ใƒŽใƒซใƒ ใงๆญฃ่ฆๅŒ–ใ—ใฆใ‹ใ‚‰ๅ†…็ฉใ‚’ใจใ‚‹ใ‚ณใ‚ตใ‚คใƒณ้กžไผผๅบฆใ‚’็”จใ„ใพใ™ใ€‚
+
+ ใฉใกใ‚‰ใŒใ‚ˆใ„ใ‹ใฏๅ ดๅˆใซใ‚ˆใ‚Šใพใ™ใŒใ€word2vec็ญ‰ใงๅพ—ใ‚‰ใ‚Œใ‚‹embeddingใ‚„ArcFace็ญ‰ใงๅญฆ็ฟ’ใ—ใŸ้กžไผผ็”ปๅƒๆคœ็ดขใฎใƒขใƒ‡ใƒซใงใฏใ‚ณใ‚ตใ‚คใƒณ้กžไผผๅบฆใŒ็”จใ„ใ‚‰ใ‚Œใ‚‹ใ“ใจใŒๅคšใ„ใงใ™ใ€‚ใƒ™ใ‚ฏใƒˆใƒซXใซๅฏพใ—ใฆl2ๆญฃ่ฆๅŒ–ใ‚’numpyใง่กŒใ†ๅ ดๅˆใฏใ€0 divisionใ‚’้ฟใ‘ใ‚‹ใŸใ‚ใซๅๅˆ†ใซๅฐใ•ใชๅ€คใ‚’epsใจใ—ใฆไปฅไธ‹ใฎใ‚ณใƒผใƒ‰ใงๅฏ่ƒฝใงใ™ใ€‚
+
+ ```python
+ X_normed = X / np.maximum(eps, np.linalg.norm(X, ord=2, axis=-1, keepdims=True))
+ ```
+
+ ใพใŸใ€index factoryใซใฏ็ฌฌ3ๅผ•ๆ•ฐใซๆธกใ™ๅ€คใ‚’้ธใถใ“ใจใง่จˆ็ฎ—ใซ็”จใ„ใ‚‹่ท้›ขๆŒ‡ๆจ™ใ‚’ๅค‰ๆ›ดใงใใพใ™ใ€‚
+
+ ```python
+ index = faiss.index_factory(dimension, text, faiss.METRIC_INNER_PRODUCT)
+ ```
+
+ ## IVF
+ IVF(Inverted file indexes)ใฏๅ…จๆ–‡ๆคœ็ดขใซใŠใ‘ใ‚‹่ปข็ฝฎใ‚คใƒณใƒ‡ใƒƒใ‚ฏใ‚นใจไผผใŸใ‚ˆใ†ใชใ‚ขใƒซใ‚ดใƒชใ‚บใƒ ใงใ™ใ€‚
+ ๅญฆ็ฟ’ๆ™‚ใซใฏๆคœ็ดขๅฏพ่ฑกใซๅฏพใ—ใฆkmeansใงใ‚ฏใƒฉใ‚นใ‚ฟใƒชใƒณใ‚ฐใ‚’่กŒใ„ใ€ใ‚ฏใƒฉใ‚นใ‚ฟไธญๅฟƒใ‚’็”จใ„ใฆใƒœใƒญใƒŽใ‚คๅˆ†ๅ‰ฒใ‚’่กŒใ„ใพใ™ใ€‚ๅ„ใƒ‡ใƒผใ‚ฟ็‚นใซใฏไธ€ใคใšใคใ‚ฏใƒฉใ‚นใ‚ฟใŒๅ‰ฒใ‚Šๅฝ“ใฆใ‚‰ใ‚Œใ‚‹ใฎใงใ€ใ‚ฏใƒฉใ‚นใ‚ฟใ‹ใ‚‰ใƒ‡ใƒผใ‚ฟ็‚นใ‚’้€†ๅผ•ใใ™ใ‚‹่พžๆ›ธใ‚’ไฝœๆˆใ—ใพใ™ใ€‚
+
+ ไพ‹ใˆใฐไปฅไธ‹ใฎใ‚ˆใ†ใซใ‚ฏใƒฉใ‚นใ‚ฟใŒๅ‰ฒใ‚Šๅฝ“ใฆใ‚‰ใ‚ŒใŸๅ ดๅˆ
+ |index|ใ‚ฏใƒฉใ‚นใ‚ฟ|
+ |-----|-------|
+ |1|A|
+ |2|B|
+ |3|A|
+ |4|C|
+ |5|B|
+
+ ไฝœๆˆใ•ใ‚Œใ‚‹่ปข็ฝฎใ‚คใƒณใƒ‡ใƒƒใ‚ฏใ‚นใฏไปฅไธ‹ใฎใ‚ˆใ†ใซใชใ‚Šใพใ™ใ€‚
+
+ |ใ‚ฏใƒฉใ‚นใ‚ฟ|index|
+ |-------|-----|
+ |A|1, 3|
+ |B|2, 5|
+ |C|4|
+
+ ๆคœ็ดขๆ™‚ใซใฏใพใšใ‚ฏใƒฉใ‚นใ‚ฟใ‹ใ‚‰n_probeๅ€‹ใฎใ‚ฏใƒฉใ‚นใ‚ฟใ‚’ๆคœ็ดขใ—ใ€ๆฌกใซใใ‚Œใžใ‚Œใฎใ‚ฏใƒฉใ‚นใ‚ฟใซๅฑžใ™ใ‚‹ใƒ‡ใƒผใ‚ฟ็‚นใซใคใ„ใฆ่ท้›ขใ‚’่จˆ็ฎ—ใ—ใพใ™ใ€‚
+
+ # ๆŽจๅฅจใ•ใ‚Œใ‚‹ใƒ‘ใƒฉใƒกใƒผใ‚ฟ
+ indexใฎ้ธใณๆ–นใซใคใ„ใฆใฏๅ…ฌๅผใซใ‚ฌใ‚คใƒ‰ใƒฉใ‚คใƒณใŒใ‚ใ‚‹ใฎใงใ€ใใ‚Œใซๆบ–ใ˜ใฆ่ชฌๆ˜Žใ—ใพใ™ใ€‚
+ https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index
+
+ 1Mไปฅไธ‹ใฎใƒ‡ใƒผใ‚ฟใ‚ปใƒƒใƒˆใซใŠใ„ใฆใฏ4bit-PQใŒ2023ๅนด4ๆœˆๆ™‚็‚นใงใฏfaissใงๅˆฉ็”จใงใใ‚‹ๆœ€ใ‚‚ๅŠน็އ็š„ใชๆ‰‹ๆณ•ใงใ™ใ€‚
+ ใ“ใ‚Œใ‚’IVFใจ็ต„ใฟๅˆใ‚ใ›ใ€4bit-PQใงๅ€™่ฃœใ‚’็ตžใ‚Šใ€ๆœ€ๅพŒใซๆญฃ็ขบใชๆŒ‡ๆจ™ใง่ท้›ขใ‚’ๅ†่จˆ็ฎ—ใ™ใ‚‹ใซใฏไปฅไธ‹ใฎindex factoryใ‚’็”จใ„ใ‚‹ใ“ใจใง่จ˜่ผ‰ใงใใพใ™ใ€‚
+
+ ```python
+ index = faiss.index_factory(256, "IVF1024,PQ128x4fs,RFlat")
+ ```
+
+ ## IVFใฎๆŽจๅฅจใƒ‘ใƒฉใƒกใƒผใ‚ฟ
+ IVFใฎๆ•ฐใŒๅคšใ™ใŽใ‚‹ๅ ดๅˆใ€ใŸใจใˆใฐใƒ‡ใƒผใ‚ฟๆ•ฐใฎๆ•ฐใ ใ‘IVFใซใ‚ˆใ‚‹็ฒ—้‡ๅญๅŒ–ใ‚’่กŒใ†ใจใ€ใ“ใ‚Œใฏๆ„š็›ดใชๅ…จๆŽข็ดขใจๅŒใ˜ใซใชใ‚ŠๅŠน็އใŒๆ‚ชใ„ใงใ™ใ€‚
+ 1Mไปฅไธ‹ใฎๅ ดๅˆใงใฏIVFใฎๅ€คใฏใƒ‡ใƒผใ‚ฟ็‚นใฎๆ•ฐNใซๅฏพใ—ใฆ4*sqrt(N) ~ 16*sqrt(N)ใซๆŽจๅฅจใ—ใฆใ„ใพใ™ใ€‚
+
+ n_probeใฏn_probeใฎๆ•ฐใซๆฏ”ไพ‹ใ—ใฆ่จˆ็ฎ—ๆ™‚้–“ใŒๅข—ใˆใ‚‹ใฎใงใ€็ฒพๅบฆใจ็›ธ่ซ‡ใ—ใฆ้ฉๅˆ‡ใซ้ธใ‚“ใงใใ ใ•ใ„ใ€‚ๅ€‹ไบบ็š„ใซใฏRVCใซใŠใ„ใฆใใ“ใพใง็ฒพๅบฆใฏๅฟ…่ฆใชใ„ใจๆ€ใ†ใฎใงn_probe = 1ใง่‰ฏใ„ใจๆ€ใ„ใพใ™ใ€‚
+
+ ## FastScan
+ FastScanใฏ็›ด็ฉ้‡ๅญๅŒ–ใงๅคงใพใ‹ใซ่ท้›ขใ‚’่ฟ‘ไผผใ™ใ‚‹ใฎใ‚’ใ€ใƒฌใ‚ธใ‚นใ‚ฟๅ†…ใง่กŒใ†ใ“ใจใซใ‚ˆใ‚Š้ซ˜้€Ÿใซ่กŒใ†ใ‚ˆใ†ใซใ—ใŸๆ‰‹ๆณ•ใงใ™ใ€‚
+ ็›ด็ฉ้‡ๅญๅŒ–ใฏๅญฆ็ฟ’ๆ™‚ใซdๆฌกๅ…ƒใ”ใจ(้€šๅธธใฏd=2)ใซ็‹ฌ็ซ‹ใ—ใฆใ‚ฏใƒฉใ‚นใ‚ฟใƒชใƒณใ‚ฐใ‚’่กŒใ„ใ€ใ‚ฏใƒฉใ‚นใ‚ฟๅŒๅฃซใฎ่ท้›ขใ‚’ไบ‹ๅ‰่จˆ็ฎ—ใ—ใฆlookup tableใ‚’ไฝœๆˆใ—ใพใ™ใ€‚ไบˆๆธฌๆ™‚ใฏlookup tableใ‚’่ฆ‹ใ‚‹ใ“ใจใงๅ„ๆฌกๅ…ƒใฎ่ท้›ขใ‚’O(1)ใง่จˆ็ฎ—ใงใใพใ™ใ€‚
+ ใใฎใŸใ‚ใ€PQใฎๆฌกใซๆŒ‡ๅฎšใ™ใ‚‹ๆ•ฐๅญ—ใฏ้€šๅธธใƒ™ใ‚ฏใƒˆใƒซใฎๅŠๅˆ†ใฎๆฌกๅ…ƒใ‚’ๆŒ‡ๅฎšใ—ใพใ™ใ€‚
+
+ FastScanใซ้–ขใ™ใ‚‹ใ‚ˆใ‚Š่ฉณ็ดฐใช่ชฌๆ˜Žใฏๅ…ฌๅผใฎใƒ‰ใ‚ญใƒฅใƒกใƒณใƒˆใ‚’ๅ‚็…งใ—ใฆใใ ใ•ใ„ใ€‚
+ https://github.com/facebookresearch/faiss/wiki/Fast-accumulation-of-PQ-and-AQ-codes-(FastScan)
+
+ ## RFlat
+ RFlatใฏFastScanใง่จˆ็ฎ—ใ—ใŸๅคงใพใ‹ใช่ท้›ขใ‚’ใ€index factoryใฎ็ฌฌไธ‰ๅผ•ๆ•ฐใงๆŒ‡ๅฎšใ—ใŸๆญฃ็ขบใช่ท้›ขใงๅ†่จˆ็ฎ—ใ™ใ‚‹ๆŒ‡็คบใงใ™ใ€‚
+ kๅ€‹ใฎ่ฟ‘ๅ‚ใ‚’ๅ–ๅพ—ใ™ใ‚‹้š›ใฏใ€k*k_factorๅ€‹ใฎ็‚นใซใคใ„ใฆๅ†่จˆ็ฎ—ใŒ่กŒใ‚ใ‚Œใพใ™ใ€‚
docs/faiss_tips_ko.md ADDED
@@ -0,0 +1,132 @@
+ Facebook AI Similarity Search (Faiss) ํŒ
+ ==================
+ # Faiss์— ๋Œ€ํ•˜์—ฌ
+ Faiss ๋Š” Facebook Research๊ฐ€ ๊ฐœ๋ฐœํ•˜๋Š”, ๊ณ ๋ฐ€๋„ ๋ฒกํ„ฐ ์ด์›ƒ ๊ฒ€์ƒ‰ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์ž…๋‹ˆ๋‹ค. ๊ทผ์‚ฌ ๊ทผ์ ‘ ํƒ์ƒ‰๋ฒ• (Approximate Neighbor Search)์€ ์•ฝ๊ฐ„์˜ ์ •ํ™•์„ฑ์„ ํฌ์ƒํ•˜์—ฌ ์œ ์‚ฌ ๋ฒกํ„ฐ๋ฅผ ๊ณ ์†์œผ๋กœ ์ฐพ์Šต๋‹ˆ๋‹ค.
+
+ ## RVC์— ์žˆ์–ด์„œ Faiss
+ RVC์—์„œ๋Š” HuBERT๋กœ ๋ณ€ํ™˜ํ•œ feature์˜ embedding์„ ์œ„ํ•ด ํ›ˆ๋ จ ๋ฐ์ดํ„ฐ์—์„œ ์ƒ์„ฑ๋œ embedding๊ณผ ์œ ์‚ฌํ•œ embedding์„ ๊ฒ€์ƒ‰ํ•˜๊ณ  ํ˜ผํ•ฉํ•˜์—ฌ ์›๋ž˜์˜ ์Œ์„ฑ์— ๋”์šฑ ๊ฐ€๊นŒ์šด ๋ณ€ํ™˜์„ ๋‹ฌ์„ฑํ•ฉ๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜, ์ด ํƒ์ƒ‰๋ฒ•์€ ๋‹จ์ˆœํžˆ ์ˆ˜ํ–‰ํ•˜๋ฉด ์‹œ๊ฐ„์ด ๋‹ค์†Œ ์†Œ๋ชจ๋˜๋ฏ€๋กœ, ๊ทผ์‚ฌ ๊ทผ์ ‘ ํƒ์ƒ‰๋ฒ•์„ ํ†ตํ•ด ๊ณ ์† ๋ณ€ํ™˜์„ ๊ฐ€๋Šฅ์ผ€ ํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.
+
+ # ๊ตฌํ˜„ ๊ฐœ์š”
+ ๋ชจ๋ธ์ด ์œ„์น˜ํ•œ `/logs/your-experiment/3_feature256`์—๋Š” ๊ฐ ์Œ์„ฑ ๋ฐ์ดํ„ฐ์—์„œ HuBERT๊ฐ€ ์ถ”์ถœํ•œ feature๋“ค์ด ์žˆ์Šต๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์—์„œ ํŒŒ์ผ ์ด๋ฆ„๋ณ„๋กœ ์ •๋ ฌ๋œ npy ํŒŒ์ผ์„ ์ฝ๊ณ , ๋ฒกํ„ฐ๋ฅผ ์—ฐ๊ฒฐํ•˜์—ฌ big_npy ([N, 256] ๋ชจ์–‘์˜ ๋ฒกํ„ฐ) ๋ฅผ ๋งŒ๋“ญ๋‹ˆ๋‹ค. big_npy๋ฅผ `/logs/your-experiment/total_fea.npy`๋กœ ์ €์žฅํ•œ ํ›„, Faiss๋กœ ํ•™์Šต์‹œํ‚ต๋‹ˆ๋‹ค.
+
+ 2023/04/18 ๊ธฐ์ค€์œผ๋กœ, Faiss์˜ Index Factory ๊ธฐ๋Šฅ์„ ์ด์šฉํ•ด, L2 ๊ฑฐ๋ฆฌ์— ๊ทผ๊ฑฐํ•˜๋Š” IVF๋ฅผ ์ด์šฉํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. IVF์˜ ๋ถ„ํ• ์ˆ˜(n_ivf)๋Š” N//39๋กœ, n_probe๋Š” int(np.power(n_ivf, 0.3))๊ฐ€ ์‚ฌ์šฉ๋˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. (infer-web.py์˜ train_index ์ฃผ์œ„๋ฅผ ์ฐพ์œผ์‹ญ์‹œ์˜ค.)
+
+ ์ด ํŒ์—์„œ๋Š” ๋จผ์ € ์ด๋Ÿฌํ•œ ๋งค๊ฐœ ๋ณ€์ˆ˜์˜ ์˜๋ฏธ๋ฅผ ์„ค๋ช…ํ•˜๊ณ , ๊ฐœ๋ฐœ์ž๊ฐ€ ์ถ”ํ›„ ๋” ๋‚˜์€ index๋ฅผ ์ž‘์„ฑํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•˜๋Š” ์กฐ์–ธ์„ ์ž‘์„ฑํ•ฉ๋‹ˆ๋‹ค.
+
+ # ๋ฐฉ๋ฒ•์˜ ์„ค๋ช…
+ ## Index factory
+ index factory๋Š” ์—ฌ๋Ÿฌ ๊ทผ์‚ฌ ๊ทผ์ ‘ ํƒ์ƒ‰๋ฒ•์„ ๋ฌธ์ž์—ด๋กœ ์—ฐ๊ฒฐํ•˜๋Š” pipeline์„ ๋ฌธ์ž์—ด๋กœ ํ‘œ๊ธฐํ•˜๋Š” Faiss๋งŒ์˜ ๋…์ž์ ์ธ ๊ธฐ๋ฒ•์ž…๋‹ˆ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด index factory์˜ ๋ฌธ์ž์—ด์„ ๋ณ€๊ฒฝํ•˜๋Š” ๊ฒƒ๋งŒ์œผ๋กœ ๋‹ค์–‘ํ•œ ๊ทผ์‚ฌ ๊ทผ์ ‘ ํƒ์ƒ‰์„ ์‹œ๋„ํ•ด ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. RVC์—์„œ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค:
+
+ ```python
+ index = faiss.index_factory(256, "IVF%s,Flat" % n_ivf)
+ ```
+ `index_factory`์˜ ์ธ์ˆ˜๋“ค ์ค‘ ์ฒซ ๋ฒˆ์งธ๋Š” ๋ฒกํ„ฐ์˜ ์ฐจ์› ์ˆ˜์ด๊ณ , ๋‘๋ฒˆ์งธ๋Š” index factory ๋ฌธ์ž์—ด์ด๋ฉฐ, ์„ธ๋ฒˆ์งธ์—๋Š” ์‚ฌ์šฉํ•  ๊ฑฐ๋ฆฌ๋ฅผ ์ง€์ •ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
+
+ ๊ธฐ๋ฒ•์˜ ๋ณด๋‹ค ์ž์„ธํ•œ ์„ค๋ช…์€ https://github.com/facebookresearch/Faiss/wiki/The-index-factory ๋ฅผ ํ™•์ธํ•ด ์ฃผ์‹ญ์‹œ์˜ค.
+
+ ## ๊ฑฐ๋ฆฌ์— ๋Œ€ํ•œ index
+ embedding์˜ ์œ ์‚ฌ๋„๋กœ์„œ ์‚ฌ์šฉ๋˜๋Š” ๋Œ€ํ‘œ์ ์ธ ์ง€ํ‘œ๋กœ์„œ ์ดํ•˜์˜ 2๊ฐœ๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค.
+
+ - ์œ ํด๋ฆฌ๋“œ ๊ฑฐ๋ฆฌ (METRIC_L2)
+ - ๋‚ด์ (ๅ†…็ฉ) (METRIC_INNER_PRODUCT)
+
+ ์œ ํด๋ฆฌ๋“œ ๊ฑฐ๋ฆฌ์—์„œ๋Š” ๊ฐ ์ฐจ์›์—์„œ ์ œ๊ณฑ์˜ ์ฐจ๋ฅผ ๊ตฌํ•˜๊ณ , ๊ฐ ์ฐจ์›์—์„œ ๊ตฌํ•œ ์ฐจ๋ฅผ ๋ชจ๋‘ ๋”ํ•œ ํ›„ ์ œ๊ณฑ๊ทผ์„ ์ทจํ•ฉ๋‹ˆ๋‹ค. ์ด๊ฒƒ์€ ์ผ์ƒ์ ์œผ๋กœ ์‚ฌ์šฉ๋˜๋Š” 2์ฐจ์›, 3์ฐจ์›์—์„œ์˜ ๊ฑฐ๋ฆฌ์˜ ์—ฐ์‚ฐ๋ฒ•๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค. ๋‚ด์ ์€ ๊ทธ ๊ฐ’์„ ๊ทธ๋Œ€๋กœ ์œ ์‚ฌ๋„ ์ง€ํ‘œ๋กœ ์‚ฌ์šฉํ•˜์ง€ ์•Š๊ณ , L2 ์ •๊ทœํ™”๋ฅผ ํ•œ ์ดํ›„ ๋‚ด์ ์„ ์ทจํ•˜๋Š” ์ฝ”์‚ฌ์ธ ์œ ์‚ฌ๋„๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.
+
+ ์–ด๋А ์ชฝ์ด ๋” ์ข‹์€์ง€๋Š” ๊ฒฝ์šฐ์— ๋”ฐ๋ผ ๋‹ค๋ฅด์ง€๋งŒ, word2vec์—์„œ ์–ป์€ embedding ๋ฐ ArcFace๋ฅผ ํ™œ์šฉํ•œ ์ด๋ฏธ์ง€ ๊ฒ€์ƒ‰ ๋ชจ๋ธ์€ ์ฝ”์‚ฌ์ธ ์œ ์‚ฌ์„ฑ์ด ์ด์šฉ๋˜๋Š” ๊ฒฝ์šฐ๊ฐ€ ๋งŽ์Šต๋‹ˆ๋‹ค. numpy๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋ฒกํ„ฐ X์— ๋Œ€ํ•ด L2 ์ •๊ทœํ™”๋ฅผ ํ•˜๊ณ ์ž ํ•˜๋Š” ๊ฒฝ์šฐ, 0 division์„ ํ”ผํ•˜๊ธฐ ์œ„ํ•ด ์ถฉ๋ถ„ํžˆ ์ž‘์€ ๊ฐ’์„ eps๋กœ ํ•œ ๋’ค ์ดํ•˜์— ์ฝ”๋“œ๋ฅผ ํ™œ์šฉํ•˜๋ฉด ๋ฉ๋‹ˆ๋‹ค.
+
+ ```python
+ X_normed = X / np.maximum(eps, np.linalg.norm(X, ord=2, axis=-1, keepdims=True))
+ ```
+
+ ๋˜ํ•œ, `index factory`์˜ 3๋ฒˆ์งธ ์ธ์ˆ˜์— ๊ฑด๋„ค์ฃผ๋Š” ๊ฐ’์„ ์„ ํƒํ•˜๋Š” ๊ฒƒ์„ ํ†ตํ•ด ๊ณ„์‚ฐ์— ์‚ฌ์šฉํ•˜๋Š” ๊ฑฐ๋ฆฌ index๋ฅผ ๋ณ€๊ฒฝํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
+
+ ```python
+ index = faiss.index_factory(dimension, text, faiss.METRIC_INNER_PRODUCT)
+ ```
+
+ ## IVF
+ IVF (Inverted file indexes)๋Š” ์—ญ์ƒ‰์ธ ํƒ์ƒ‰๋ฒ•๊ณผ ์œ ์‚ฌํ•œ ์•Œ๊ณ ๋ฆฌ์ฆ˜์ž…๋‹ˆ๋‹ค. ํ•™์Šต์‹œ์—๋Š” ๊ฒ€์ƒ‰ ๋Œ€์ƒ์— ๋Œ€ํ•ด k-ํ‰๊ท  ๊ตฐ์ง‘๋ฒ•์„ ์‹ค์‹œํ•˜๊ณ  ํด๋Ÿฌ์Šคํ„ฐ ์ค‘์‹ฌ์„ ์ด์šฉํ•ด ๋ณด๋กœ๋…ธ์ด ๋ถ„ํ• ์„ ์‹ค์‹œํ•ฉ๋‹ˆ๋‹ค. ๊ฐ ๋ฐ์ดํ„ฐ ํฌ์ธํŠธ์—๋Š” ํด๋Ÿฌ์Šคํ„ฐ๊ฐ€ ํ• ๋‹น๋˜๋ฏ€๋กœ, ํด๋Ÿฌ์Šคํ„ฐ์—์„œ ๋ฐ์ดํ„ฐ ํฌ์ธํŠธ๋ฅผ ์กฐํšŒํ•˜๋Š” dictionary๋ฅผ ๋งŒ๋“ญ๋‹ˆ๋‹ค.
+
+ ์˜ˆ๋ฅผ ๋“ค์–ด, ํด๋Ÿฌ์Šคํ„ฐ๊ฐ€ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ํ• ๋‹น๋œ ๊ฒฝ์šฐ
+ |index|Cluster|
+ |-----|-------|
+ |1|A|
+ |2|B|
+ |3|A|
+ |4|C|
+ |5|B|
+
+ IVF ์ดํ›„์˜ ๊ฒฐ๊ณผ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค:
+
+ |cluster|index|
+ |-------|-----|
+ |A|1, 3|
+ |B|2, 5|
+ |C|4|
+
+ ํƒ์ƒ‰ ์‹œ, ์šฐ์„  ํด๋Ÿฌ์Šคํ„ฐ์—์„œ `n_probe`๊ฐœ์˜ ํด๋Ÿฌ์Šคํ„ฐ๋ฅผ ํƒ์ƒ‰ํ•œ ๋‹ค์Œ, ๊ฐ ํด๋Ÿฌ์Šคํ„ฐ์— ์†ํ•œ ๋ฐ์ดํ„ฐ ํฌ์ธํŠธ์˜ ๊ฑฐ๋ฆฌ๋ฅผ ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค.
+
+ # ๊ถŒ์žฅ ๋งค๊ฐœ๋ณ€์ˆ˜
+ index์˜ ์„ ํƒ ๋ฐฉ๋ฒ•์— ๋Œ€ํ•ด์„œ๋Š” ๊ณต์‹์ ์œผ๋กœ ๊ฐ€์ด๋“œ ๋ผ์ธ์ด ์žˆ์œผ๋ฏ€๋กœ, ๊ฑฐ๊ธฐ์— ์ค€ํ•ด ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค.
+ https://github.com/facebookresearch/Faiss/wiki/Guidelines-to-choose-an-index
+
+ 1M ์ดํ•˜์˜ ๋ฐ์ดํ„ฐ ์„ธํŠธ์— ์žˆ์–ด์„œ๋Š” 4bit-PQ๊ฐ€ 2023๋…„ 4์›” ์‹œ์ ์—์„œ๋Š” Faiss๋กœ ์ด์šฉํ•  ์ˆ˜ ์žˆ๋Š” ๊ฐ€์žฅ ํšจ์œจ์ ์ธ ์ˆ˜๋ฒ•์ž…๋‹ˆ๋‹ค. ์ด๊ฒƒ์„ IVF์™€ ์กฐํ•ฉํ•ด, 4bit-PQ๋กœ ํ›„๋ณด๋ฅผ ์ถ”๋ ค๋‚ด๊ณ , ๋งˆ์ง€๋ง‰์œผ๋กœ ์ดํ•˜์˜ index factory๋ฅผ ์ด์šฉํ•˜์—ฌ ์ •ํ™•ํ•œ ์ง€ํ‘œ๋กœ ๊ฑฐ๋ฆฌ๋ฅผ ์žฌ๊ณ„์‚ฐํ•˜๋ฉด ๋ฉ๋‹ˆ๋‹ค.
+
+ ```python
+ index = faiss.index_factory(256, "IVF1024,PQ128x4fs,RFlat")
+ ```
+
+ ## IVF ๊ถŒ์žฅ ๋งค๊ฐœ๋ณ€์ˆ˜
+ IVF์˜ ์ˆ˜๊ฐ€ ๋„ˆ๋ฌด ๋งŽ์œผ๋ฉด, ๊ฐ€๋ น ๋ฐ์ดํ„ฐ ์ˆ˜์˜ ์ˆ˜๋งŒํผ IVF๋กœ ์–‘์žํ™”(Quantization)๋ฅผ ์ˆ˜ํ–‰ํ•˜๋ฉด, ์ด๊ฒƒ์€ ์™„์ „ํƒ์ƒ‰๊ณผ ๊ฐ™์•„์ ธ ํšจ์œจ์ด ๋‚˜๋น ์ง€๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. 1M ์ดํ•˜์˜ ๊ฒฝ์šฐ IVF ๊ฐ’์€ ๋ฐ์ดํ„ฐ ํฌ์ธํŠธ ์ˆ˜ N์— ๋Œ€ํ•ด 4*sqrt(N) ~ 16*sqrt(N)๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์„ ๊ถŒ์žฅํ•ฉ๋‹ˆ๋‹ค.
+
+ n_probe๋Š” n_probe์˜ ์ˆ˜์— ๋น„๋ก€ํ•˜์—ฌ ๊ณ„์‚ฐ ์‹œ๊ฐ„์ด ๋Š˜์–ด๋‚˜๋ฏ€๋กœ ์ •ํ™•๋„์™€ ์‹œ๊ฐ„์„ ์ ์ ˆํžˆ ๊ท ํ˜•์„ ๋งž์ถ”์–ด ์ฃผ์‹ญ์‹œ์˜ค. ๊ฐœ์ธ์ ์œผ๋กœ RVC์— ์žˆ์–ด์„œ ๊ทธ๋ ‡๊ฒŒ๊นŒ์ง€ ์ •ํ™•๋„๋Š” ํ•„์š” ์—†๋‹ค๊ณ  ์ƒ๊ฐํ•˜๊ธฐ ๋•Œ๋ฌธ์— n_probe = 1์ด๋ฉด ๋œ๋‹ค๊ณ  ์ƒ๊ฐํ•ฉ๋‹ˆ๋‹ค.
+
+ ## FastScan
+ FastScan์€ ์ง์  ์–‘์žํ™”๋ฅผ ๋ ˆ์ง€์Šคํ„ฐ์—์„œ ์ˆ˜ํ–‰ํ•จ์œผ๋กœ์จ ๊ฑฐ๋ฆฌ์˜ ๊ณ ์† ๊ทผ์‚ฌ๋ฅผ ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ•˜๋Š” ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค. ์ง์  ์–‘์žํ™”๋Š” ํ•™์Šต์‹œ์— d์ฐจ์›๋งˆ๋‹ค(๋ณดํ†ต d=2)์— ๋…๋ฆฝ์ ์œผ๋กœ ํด๋Ÿฌ์Šคํ„ฐ๋ง์„ ์‹ค์‹œํ•ด, ํด๋Ÿฌ์Šคํ„ฐ๋ผ๋ฆฌ์˜ ๊ฑฐ๋ฆฌ๋ฅผ ์‚ฌ์ „ ๊ณ„์‚ฐํ•ด lookup table๋ฅผ ์ž‘์„ฑํ•ฉ๋‹ˆ๋‹ค. ์˜ˆ์ธก์‹œ๋Š” lookup table์„ ๋ณด๋ฉด ๊ฐ ์ฐจ์›์˜ ๊ฑฐ๋ฆฌ๋ฅผ O(1)๋กœ ๊ณ„์‚ฐํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ PQ ๋‹ค์Œ์— ์ง€์ •ํ•˜๋Š” ์ˆซ์ž๋Š” ์ผ๋ฐ˜์ ์œผ๋กœ ๋ฒกํ„ฐ์˜ ์ ˆ๋ฐ˜ ์ฐจ์›์„ ์ง€์ •ํ•ฉ๋‹ˆ๋‹ค.
+
+ FastScan์— ๋Œ€ํ•œ ์ž์„ธํ•œ ์„ค๋ช…์€ ๊ณต์‹ ๋ฌธ์„œ๋ฅผ ์ฐธ์กฐํ•˜์‹ญ์‹œ์˜ค.
+ https://github.com/facebookresearch/Faiss/wiki/Fast-accumulation-of-PQ-and-AQ-codes-(FastScan)
+
+ ## RFlat
+ RFlat์€ FastScan์ด ๊ณ„์‚ฐํ•œ ๋Œ€๋žต์ ์ธ ๊ฑฐ๋ฆฌ๋ฅผ index factory์˜ 3๋ฒˆ์งธ ์ธ์ˆ˜๋กœ ์ง€์ •ํ•œ ์ •ํ™•ํ•œ ๊ฑฐ๋ฆฌ๋กœ ๋‹ค์‹œ ๊ณ„์‚ฐํ•˜๋ผ๋Š” ์ธ์ŠคํŠธ๋Ÿญ์…˜์ž…๋‹ˆ๋‹ค. k๊ฐœ์˜ ๊ทผ์ ‘ ๋ณ€์ˆ˜๋ฅผ ๊ฐ€์ ธ์˜ฌ ๋•Œ k*k_factor๊ฐœ์˜ ์ ์— ๋Œ€ํ•ด ์žฌ๊ณ„์‚ฐ์ด ์ด๋ฃจ์–ด์ง‘๋‹ˆ๋‹ค.
+
+ # Embedding ํ…Œํฌ๋‹‰
+ ## Alpha ์ฟผ๋ฆฌ ํ™•์žฅ
+ ํ€ด๋ฆฌ ํ™•์žฅ์ด๋ž€ ํƒ์ƒ‰์—์„œ ์‚ฌ์šฉ๋˜๋Š” ๊ธฐ์ˆ ๋กœ, ์˜ˆ๋ฅผ ๋“ค์–ด ์ „๋ฌธ ํƒ์ƒ‰ ์‹œ, ์ž…๋ ฅ๋œ ๊ฒ€์ƒ‰๋ฌธ์— ๋‹จ์–ด๋ฅผ ๋ช‡ ๊ฐœ๋ฅผ ์ถ”๊ฐ€ํ•จ์œผ๋กœ์จ ๊ฒ€์ƒ‰ ์ •ํ™•๋„๋ฅผ ์˜ฌ๋ฆฌ๋Š” ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค. ๋ฐฑํ„ฐ ํƒ์ƒ‰์„ ์œ„ํ•ด์„œ๋„ ๋ช‡๊ฐ€์ง€ ๋ฐฉ๋ฒ•์ด ์ œ์•ˆ๋˜์—ˆ๋Š”๋ฐ, ๊ทธ ์ค‘ ฮฑ-์ฟผ๋ฆฌ ํ™•์žฅ์€ ์ถ”๊ฐ€ ํ•™์Šต์ด ํ•„์š” ์—†๋Š” ๋งค์šฐ ํšจ๊ณผ์ ์ธ ๋ฐฉ๋ฒ•์œผ๋กœ ์•Œ๋ ค์ ธ ์žˆ์Šต๋‹ˆ๋‹ค. [Attention-Based Query Expansion Learning](https://arxiv.org/abs/2007.08019)์™€ [2nd place solution of kaggle shopee competition](https://www.kaggle.com/code/lyakaap/2nd-place-solution/notebook) ๋…ผ๋ฌธ์—์„œ ์†Œ๊ฐœ๋œ ๋ฐ” ์žˆ์Šต๋‹ˆ๋‹ค.
+
+ ฮฑ-์ฟผ๋ฆฌ ํ™•์žฅ์€ ํ•œ ๋ฒกํ„ฐ์— ์ธ์ ‘ํ•œ ๋ฒกํ„ฐ๋ฅผ ์œ ์‚ฌ๋„์˜ ฮฑ๊ณฑํ•œ ๊ฐ€์ค‘์น˜๋กœ ๋”ํ•ด์ฃผ๋ฉด ๋ฉ๋‹ˆ๋‹ค. ์ฝ”๋“œ๋กœ ์˜ˆ์‹œ๋ฅผ ๋“ค์–ด ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. big_npy๋ฅผ ฮฑ query expansion๋กœ ๋Œ€์ฒดํ•ฉ๋‹ˆ๋‹ค.
+
+ ```python
+ alpha = 3.
+ index = faiss.index_factory(256, "IVF512,PQ128x4fs,RFlat")
+ original_norm = np.maximum(np.linalg.norm(big_npy, ord=2, axis=1, keepdims=True), 1e-9)
+ big_npy /= original_norm
+ index.train(big_npy)
+ index.add(big_npy)
+ dist, neighbor = index.search(big_npy, num_expand)
+
+ expand_arrays = []
+ ixs = np.arange(big_npy.shape[0])
+ for i in range(-(-big_npy.shape[0]//batch_size)):
+     ix = ixs[i*batch_size:(i+1)*batch_size]
+     weight = np.power(np.einsum("nd,nmd->nm", big_npy[ix], big_npy[neighbor[ix]]), alpha)
+     expand_arrays.append(np.sum(big_npy[neighbor[ix]] * np.expand_dims(weight, axis=2),axis=1))
+ big_npy = np.concatenate(expand_arrays, axis=0)
+
+ # index version ์ •๊ทœํ™”
+ big_npy = big_npy / np.maximum(np.linalg.norm(big_npy, ord=2, axis=1, keepdims=True), 1e-9)
+ ```
+
+ ์œ„ ํ…Œํฌ๋‹‰์€ ํƒ์ƒ‰์„ ์ˆ˜ํ–‰ํ•˜๋Š” ์ฟผ๋ฆฌ์—๋„, ํƒ์ƒ‰ ๋Œ€์ƒ DB์—๋„ ์ ์‘ ๊ฐ€๋Šฅํ•œ ํ…Œํฌ๋‹‰์ž…๋‹ˆ๋‹ค.
+
+ ## MiniBatch KMeans์— ์˜ํ•œ embedding ์••์ถ•
+
+ total_fea.npy๊ฐ€ ๋„ˆ๋ฌด ํด ๊ฒฝ์šฐ K-means๋ฅผ ์ด์šฉํ•˜์—ฌ ๋ฒกํ„ฐ๋ฅผ ์ž‘๊ฒŒ ๋งŒ๋“œ๋Š” ๊ฒƒ์ด ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค. ์ดํ•˜ ์ฝ”๋“œ๋กœ embedding์˜ ์••์ถ•์ด ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค. n_clusters์— ์••์ถ•ํ•˜๊ณ ์ž ํ•˜๋Š” ํฌ๊ธฐ๋ฅผ ์ง€์ •ํ•˜๊ณ  batch_size์— 256 * CPU์˜ ์ฝ”์–ด ์ˆ˜๋ฅผ ์ง€์ •ํ•จ์œผ๋กœ์จ CPU ๋ณ‘๋ ฌํ™”์˜ ํ˜œํƒ์„ ์ถฉ๋ถ„ํžˆ ์–ป์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
+
+ ```python
+ import multiprocessing
+ from sklearn.cluster import MiniBatchKMeans
+ kmeans = MiniBatchKMeans(n_clusters=10000, batch_size=256 * multiprocessing.cpu_count(), init="random")
+ kmeans.fit(big_npy)
+ sample_npy = kmeans.cluster_centers_
+ ```
docs/faq.md ADDED
@@ -0,0 +1,89 @@
+ ## Q1:ffmpeg error/utf8 error.
+
+ ๅคงๆฆ‚็އไธๆ˜ฏffmpeg้—ฎ้ข˜๏ผŒ่€Œๆ˜ฏ้Ÿณ้ข‘่ทฏๅพ„้—ฎ้ข˜๏ผ›<br>
+ ffmpeg่ฏปๅ–่ทฏๅพ„ๅธฆ็ฉบๆ ผใ€()็ญ‰็‰นๆฎŠ็ฌฆๅท๏ผŒๅฏ่ƒฝๅ‡บ็Žฐffmpeg error๏ผ›่ฎญ็ปƒ้›†้Ÿณ้ข‘ๅธฆไธญๆ–‡่ทฏๅพ„๏ผŒๅœจๅ†™ๅ…ฅfilelist.txt็š„ๆ—ถๅ€™ๅฏ่ƒฝๅ‡บ็Žฐutf8 error๏ผ›<br>
+
+ ## Q2:ไธ€้”ฎ่ฎญ็ปƒ็ป“ๆŸๆฒกๆœ‰็ดขๅผ•
+
+ ๆ˜พ็คบ"Training is done. The program is closed."ๅˆ™ๆจกๅž‹่ฎญ็ปƒๆˆๅŠŸ๏ผŒๅŽ็ปญ็ดง้‚ป็š„ๆŠฅ้”™ๆ˜ฏๅ‡็š„๏ผ›<br>
+
+ ไธ€้”ฎ่ฎญ็ปƒ็ป“ๆŸๅฎŒๆˆๆฒกๆœ‰addedๅผ€ๅคด็š„็ดขๅผ•ๆ–‡ไปถ๏ผŒๅฏ่ƒฝๆ˜ฏๅ› ไธบ่ฎญ็ปƒ้›†ๅคชๅคงๅกไฝไบ†ๆทปๅŠ ็ดขๅผ•็š„ๆญฅ้ชค๏ผ›ๅทฒ้€š่ฟ‡ๆ‰นๅค„็†add็ดขๅผ•่งฃๅ†ณๅ†…ๅญ˜add็ดขๅผ•ๅฏนๅ†…ๅญ˜้œ€ๆฑ‚่ฟ‡ๅคง็š„้—ฎ้ข˜ใ€‚ไธดๆ—ถๅฏๅฐ่ฏ•ๅ†ๆฌก็‚นๅ‡ป"่ฎญ็ปƒ็ดขๅผ•"ๆŒ‰้’ฎใ€‚<br>
+
+ ## Q3:่ฎญ็ปƒ็ป“ๆŸๆŽจ็†ๆฒก็œ‹ๅˆฐ่ฎญ็ปƒ้›†็š„้Ÿณ่‰ฒ
+ ็‚นๅˆทๆ–ฐ้Ÿณ่‰ฒๅ†็œ‹็œ‹๏ผŒๅฆ‚ๆžœ่ฟ˜ๆฒกๆœ‰็œ‹็œ‹่ฎญ็ปƒๆœ‰ๆฒกๆœ‰ๆŠฅ้”™๏ผŒๆŽงๅˆถๅฐๅ’Œwebui็š„ๆˆชๅ›พ๏ผŒlogs/ๅฎž้ชŒๅไธ‹็š„log๏ผŒ้ƒฝๅฏไปฅๅ‘็ป™ๅผ€ๅ‘่€…็œ‹็œ‹ใ€‚<br>
+
+ ## Q4:ๅฆ‚ไฝ•ๅˆ†ไบซๆจกๅž‹
+ โ€ƒโ€ƒrvc_root/logs/ๅฎž้ชŒๅ ไธ‹้ขๅญ˜ๅ‚จ็š„pthไธๆ˜ฏ็”จๆฅๅˆ†ไบซๆจกๅž‹็”จๆฅๆŽจ็†็š„๏ผŒ่€Œๆ˜ฏไธบไบ†ๅญ˜ๅ‚จๅฎž้ชŒ็Šถๆ€ไพ›ๅค็Žฐ๏ผŒไปฅๅŠ็ปง็ปญ่ฎญ็ปƒ็”จ็š„ใ€‚็”จๆฅๅˆ†ไบซ็š„ๆจกๅž‹ๅบ”่ฏฅๆ˜ฏweightsๆ–‡ไปถๅคนไธ‹ๅคงๅฐไธบ60+MB็š„pthๆ–‡ไปถ๏ผ›<br>
+ โ€ƒโ€ƒๅŽ็ปญๅฐ†ๆŠŠweights/exp_name.pthๅ’Œlogs/exp_name/added_xxx.indexๅˆๅนถๆ‰“ๅŒ…ๆˆweights/exp_name.zip็œๅŽปๅกซๅ†™index็š„ๆญฅ้ชค๏ผŒ้‚ฃไนˆzipๆ–‡ไปถ็”จๆฅๅˆ†ไบซ๏ผŒไธ่ฆๅˆ†ไบซpthๆ–‡ไปถ๏ผŒ้™ค้žๆ˜ฏๆƒณๆขๆœบๅ™จ็ปง็ปญ่ฎญ็ปƒ๏ผ›<br>
+ โ€ƒโ€ƒๅฆ‚ๆžœไฝ ๆŠŠlogsๆ–‡ไปถๅคนไธ‹็š„ๅ‡ ็™พMB็š„pthๆ–‡ไปถๅคๅˆถ/ๅˆ†ไบซๅˆฐweightsๆ–‡ไปถๅคนไธ‹ๅผบ่กŒ็”จไบŽๆŽจ็†๏ผŒๅฏ่ƒฝไผšๅ‡บ็Žฐf0๏ผŒtgt_sr็ญ‰ๅ„็งkeyไธๅญ˜ๅœจ็š„ๆŠฅ้”™ใ€‚ไฝ ้œ€่ฆ็”จckpt้€‰้กนๅกๆœ€ไธ‹้ข๏ผŒๆ‰‹ๅทฅๆˆ–่‡ชๅŠจ๏ผˆๆœฌๅœฐlogsไธ‹ๅฆ‚ๆžœ่ƒฝๆ‰พๅˆฐ็›ธๅ…ณไฟกๆฏๅˆ™ไผš่‡ชๅŠจ๏ผ‰้€‰ๆ‹ฉๆ˜ฏๅฆๆบๅธฆ้Ÿณ้ซ˜ใ€็›ฎๆ ‡้Ÿณ้ข‘้‡‡ๆ ท็އ็š„้€‰้กนๅŽ่ฟ›่กŒckptๅฐๆจกๅž‹ๆๅ–๏ผˆ่พ“ๅ…ฅ่ทฏๅพ„ๅกซGๅผ€ๅคด็š„้‚ฃไธช๏ผ‰๏ผŒๆๅ–ๅฎŒๅœจweightsๆ–‡ไปถๅคนไธ‹ไผšๅ‡บ็Žฐ60+MB็š„pthๆ–‡ไปถ๏ผŒๅˆทๆ–ฐ้Ÿณ่‰ฒๅŽๅฏไปฅ้€‰ๆ‹ฉไฝฟ็”จใ€‚<br>
+
+ ## Q5:Connection Error.
+ ไนŸ่ฎธไฝ ๅ…ณ้—ญไบ†ๆŽงๅˆถๅฐ๏ผˆ้ป‘่‰ฒ็ช—ๅฃ๏ผ‰ใ€‚<br>
+
+ ## Q6:WebUIๅผนๅ‡บExpecting value: line 1 column 1 (char 0).
+ ่ฏทๅ…ณ้—ญ็ณป็ปŸๅฑ€ๅŸŸ็ฝ‘ไปฃ็†/ๅ…จๅฑ€ไปฃ็†ใ€‚<br>
+
+ ่ฟ™ไธชไธไป…ๆ˜ฏๅฎขๆˆท็ซฏ็š„ไปฃ็†๏ผŒไนŸๅŒ…ๆ‹ฌๆœๅŠก็ซฏ็š„ไปฃ็†๏ผˆไพ‹ๅฆ‚ไฝ ไฝฟ็”จautodl่ฎพ็ฝฎไบ†http_proxyๅ’Œhttps_proxyๅญฆๆœฏๅŠ ้€Ÿ๏ผŒไฝฟ็”จๆ—ถไนŸ้œ€่ฆunsetๅ…ณๆމ๏ผ‰<br>
+
+ ## Q7:ไธ็”จWebUIๅฆ‚ไฝ•้€š่ฟ‡ๅ‘ฝไปค่ฎญ็ปƒๆŽจ็†
+ ่ฎญ็ปƒ่„šๆœฌ๏ผš<br>
+ ๅฏๅ…ˆ่ท‘้€šWebUI๏ผŒๆถˆๆฏ็ช—ๅ†…ไผšๆ˜พ็คบๆ•ฐๆฎ้›†ๅค„็†ๅ’Œ่ฎญ็ปƒ็”จๅ‘ฝไปค่กŒ๏ผ›<br>
+
+ ๆŽจ็†่„šๆœฌ๏ผš<br>
+ https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/myinfer.py<br>
+
+ ไพ‹ๅญ๏ผš<br>
+
+ runtime\python.exe myinfer.py 0 "E:\codes\py39\RVC-beta\todo-songs\1111.wav" "E:\codes\py39\logs\mi-test\added_IVF677_Flat_nprobe_7.index" harvest "test.wav" "weights/mi-test.pth" 0.6 cuda:0 True<br>
+
+ f0up_key=sys.argv[1]<br>
+ input_path=sys.argv[2]<br>
+ index_path=sys.argv[3]<br>
+ f0method=sys.argv[4]#harvest or pm<br>
+ opt_path=sys.argv[5]<br>
+ model_path=sys.argv[6]<br>
+ index_rate=float(sys.argv[7])<br>
+ device=sys.argv[8]<br>
+ is_half=bool(sys.argv[9])<br>
+
+ ## Q8:Cuda error/Cuda out of memory.
+ ๅฐๆฆ‚็އๆ˜ฏcuda้…็ฝฎ้—ฎ้ข˜ใ€่ฎพๅค‡ไธๆ”ฏๆŒ๏ผ›ๅคงๆฆ‚็އๆ˜ฏๆ˜พๅญ˜ไธๅคŸ๏ผˆout of memory๏ผ‰๏ผ›<br>
+
+ ่ฎญ็ปƒ็š„่ฏ็ผฉๅฐbatch size๏ผˆๅฆ‚ๆžœ็ผฉๅฐๅˆฐ1่ฟ˜ไธๅคŸๅช่ƒฝๆ›ดๆขๆ˜พๅก่ฎญ็ปƒ๏ผ‰๏ผŒๆŽจ็†็š„่ฏ้…Œๆƒ…็ผฉๅฐconfig.py็ป“ๅฐพ็š„x_pad๏ผŒx_query๏ผŒx_center๏ผŒx_maxใ€‚4Gไปฅไธ‹ๆ˜พๅญ˜๏ผˆไพ‹ๅฆ‚1060๏ผˆ3G๏ผ‰ๅ’Œๅ„็ง2Gๆ˜พๅก๏ผ‰ๅฏไปฅ็›ดๆŽฅๆ”พๅผƒ๏ผŒ4Gๆ˜พๅญ˜ๆ˜พๅก่ฟ˜ๆœ‰ๆ•‘ใ€‚<br>
+
+ ## Q9:total_epoch่ฐƒๅคšๅฐ‘ๆฏ”่พƒๅฅฝ
+
+ ๅฆ‚ๆžœ่ฎญ็ปƒ้›†้Ÿณ่ดจๅทฎๅบ•ๅ™ชๅคง๏ผŒ20~30่ถณๅคŸไบ†๏ผŒ่ฐƒๅคช้ซ˜๏ผŒๅบ•ๆจก้Ÿณ่ดจๆ— ๆณ•ๅธฆ้ซ˜ไฝ ็š„ไฝŽ้Ÿณ่ดจ่ฎญ็ปƒ้›†<br>
+ ๅฆ‚ๆžœ่ฎญ็ปƒ้›†้Ÿณ่ดจ้ซ˜ๅบ•ๅ™ชไฝŽๆ—ถ้•ฟๅคš๏ผŒๅฏไปฅ่ฐƒ้ซ˜๏ผŒ200ๆ˜ฏok็š„๏ผˆ่ฎญ็ปƒ้€Ÿๅบฆๅพˆๅฟซ๏ผŒๆ—ข็„ถไฝ ๆœ‰ๆกไปถๅ‡†ๅค‡้ซ˜้Ÿณ่ดจ่ฎญ็ปƒ้›†๏ผŒๆ˜พๅกๆƒณๅฟ…ๆกไปถไนŸไธ้”™๏ผŒ่‚ฏๅฎšไธๅœจไนŽๅคšไธ€ไบ›่ฎญ็ปƒๆ—ถ้—ด๏ผ‰<br>
+
+ ## Q10:้œ€่ฆๅคšๅฐ‘่ฎญ็ปƒ้›†ๆ—ถ้•ฟ
+ โ€ƒโ€ƒๆŽจ่10min่‡ณ50min<br>
+ โ€ƒโ€ƒไฟ่ฏ้Ÿณ่ดจ้ซ˜ๅบ•ๅ™ชไฝŽ็š„ๆƒ…ๅ†ตไธ‹๏ผŒๅฆ‚ๆžœๆœ‰ไธชไบบ็‰น่‰ฒ็š„้Ÿณ่‰ฒ็ปŸไธ€๏ผŒๅˆ™ๅคšๅคš็›Šๅ–„<br>
+ โ€ƒโ€ƒ้ซ˜ๆฐดๅนณ็š„่ฎญ็ปƒ้›†๏ผˆ็ฒพ็ฎ€+้Ÿณ่‰ฒๆœ‰็‰น่‰ฒ๏ผ‰๏ผŒ5min่‡ณ10minไนŸๆ˜ฏok็š„๏ผŒไป“ๅบ“ไฝœ่€…ๆœฌไบบๅฐฑ็ปๅธธ่ฟ™ไนˆ็Žฉ<br>
+ โ€ƒโ€ƒไนŸๆœ‰ไบบๆ‹ฟ1min่‡ณ2min็š„ๆ•ฐๆฎๆฅ่ฎญ็ปƒๅนถไธ”่ฎญ็ปƒๆˆๅŠŸ็š„๏ผŒไฝ†ๆ˜ฏๆˆๅŠŸ็ป้ชŒๆ˜ฏๅ…ถไป–ไบบไธๅฏๅค็Žฐ็š„๏ผŒไธๅคชๅ…ทๅค‡ๅ‚่€ƒไปทๅ€ผใ€‚่ฟ™่ฆๆฑ‚่ฎญ็ปƒ้›†้Ÿณ่‰ฒ็‰น่‰ฒ้žๅธธๆ˜Žๆ˜พ๏ผˆๆฏ”ๅฆ‚่ฏด้ซ˜้ข‘ๆฐ”ๅฃฐ่พƒๆ˜Žๆ˜พ็š„่่މๅฐ‘ๅฅณ้Ÿณ๏ผ‰๏ผŒไธ”้Ÿณ่ดจ้ซ˜๏ผ›<br>
+ โ€ƒโ€ƒ1minไปฅไธ‹ๆ—ถ้•ฟๆ•ฐๆฎ็›ฎๅ‰ๆฒก่งๆœ‰ไบบๅฐ่ฏ•๏ผˆๆˆๅŠŸ๏ผ‰่ฟ‡ใ€‚ไธๅปบ่ฎฎ่ฟ›่กŒ่ฟ™็ง้ฌผ็•œ่กŒไธบใ€‚<br>
+
+ ## Q11:index rateๅนฒๅ˜›็”จ็š„๏ผŒๆ€Žไนˆ่ฐƒ๏ผˆ็ง‘ๆ™ฎ๏ผ‰
+ โ€ƒโ€ƒๅฆ‚ๆžœๅบ•ๆจกๅ’ŒๆŽจ็†ๆบ็š„้Ÿณ่ดจ้ซ˜ไบŽ่ฎญ็ปƒ้›†็š„้Ÿณ่ดจ๏ผŒไป–ไปฌๅฏไปฅๅธฆ้ซ˜ๆŽจ็†็ป“ๆžœ็š„้Ÿณ่ดจ๏ผŒไฝ†ไปฃไปทๅฏ่ƒฝๆ˜ฏ้Ÿณ่‰ฒๅพ€ๅบ•ๆจก/ๆŽจ็†ๆบ็š„้Ÿณ่‰ฒ้ ๏ผŒ่ฟ™็ง็Žฐ่ฑกๅซๅš"้Ÿณ่‰ฒๆณ„้œฒ"๏ผ›<br>
+ โ€ƒโ€ƒindex rate็”จๆฅๅ‰Šๅ‡/่งฃๅ†ณ้Ÿณ่‰ฒๆณ„้œฒ้—ฎ้ข˜ใ€‚่ฐƒๅˆฐ1๏ผŒๅˆ™็†่ฎบไธŠไธๅญ˜ๅœจๆŽจ็†ๆบ็š„้Ÿณ่‰ฒๆณ„้œฒ้—ฎ้ข˜๏ผŒไฝ†้Ÿณ่ดจๆ›ดๅ€พๅ‘ไบŽ่ฎญ็ปƒ้›†ใ€‚ๅฆ‚ๆžœ่ฎญ็ปƒ้›†้Ÿณ่ดจๆฏ”ๆŽจ็†ๆบไฝŽ๏ผŒๅˆ™index rate่ฐƒ้ซ˜ๅฏ่ƒฝ้™ไฝŽ้Ÿณ่ดจใ€‚่ฐƒๅˆฐ0๏ผŒๅˆ™ไธๅ…ทๅค‡ๅˆฉ็”จๆฃ€็ดขๆททๅˆๆฅไฟๆŠค่ฎญ็ปƒ้›†้Ÿณ่‰ฒ็š„ๆ•ˆๆžœ๏ผ›<br>
+ โ€ƒโ€ƒๅฆ‚ๆžœ่ฎญ็ปƒ้›†ไผ˜่ดจๆ—ถ้•ฟๅคš๏ผŒๅฏ่ฐƒ้ซ˜total_epoch๏ผŒๆญคๆ—ถๆจกๅž‹ๆœฌ่บซไธๅคชไผšๅผ•็”จๆŽจ็†ๆบๅ’Œๅบ•ๆจก็š„้Ÿณ่‰ฒ๏ผŒๅพˆๅฐ‘ๅญ˜ๅœจ"้Ÿณ่‰ฒๆณ„้œฒ"้—ฎ้ข˜๏ผŒๆญคๆ—ถindex_rateไธ้‡่ฆ๏ผŒไฝ ็”š่‡ณๅฏไปฅไธๅปบ็ซ‹/ๅˆ†ไบซindex็ดขๅผ•ๆ–‡ไปถใ€‚<br>
+
+ ## Q12:ๆŽจ็†ๆ€Žไนˆ้€‰gpu
+ config.pyๆ–‡ไปถ้‡Œdevice cuda:ๅŽ้ข้€‰ๆ‹ฉๅกๅท๏ผ›<br>
+ ๅกๅทๅ’Œๆ˜พๅก็š„ๆ˜ ๅฐ„ๅ…ณ็ณป๏ผŒๅœจ่ฎญ็ปƒ้€‰้กนๅก็š„ๆ˜พๅกไฟกๆฏๆ ้‡Œ่ƒฝ็œ‹ๅˆฐใ€‚<br>
+
+ ## Q13:ๅฆ‚ไฝ•ๆŽจ็†่ฎญ็ปƒไธญ้—ดไฟๅญ˜็š„pth
+ ้€š่ฟ‡ckpt้€‰้กนๅกๆœ€ไธ‹้ขๆๅ–ๅฐๆจกๅž‹ใ€‚<br>
+
+
+ ## Q14:ๅฆ‚ไฝ•ไธญๆ–ญๅ’Œ็ปง็ปญ่ฎญ็ปƒ
+ ็Žฐ้˜ถๆฎตๅช่ƒฝๅ…ณ้—ญWebUIๆŽงๅˆถๅฐๅŒๅ‡ปgo-web.bat้‡ๅฏ็จ‹ๅบใ€‚็ฝ‘้กตๅ‚ๆ•ฐไนŸ่ฆๅˆทๆ–ฐ้‡ๆ–ฐๅกซๅ†™๏ผ›<br>
+ ็ปง็ปญ่ฎญ็ปƒ๏ผš็›ธๅŒ็ฝ‘้กตๅ‚ๆ•ฐ็‚น่ฎญ็ปƒๆจกๅž‹๏ผŒๅฐฑไผšๆŽฅ็€ไธŠๆฌก็š„checkpoint็ปง็ปญ่ฎญ็ปƒใ€‚<br>
+
+ ## Q15:่ฎญ็ปƒๆ—ถๅ‡บ็Žฐๆ–‡ไปถ้กต้ข/ๅ†…ๅญ˜error
+ ่ฟ›็จ‹ๅผ€ๅคชๅคšไบ†๏ผŒๅ†…ๅญ˜็‚ธไบ†ใ€‚ไฝ ๅฏ่ƒฝๅฏไปฅ้€š่ฟ‡ๅฆ‚ไธ‹ๆ–นๅผ่งฃๅ†ณ<br>
+ 1ใ€"ๆๅ–้Ÿณ้ซ˜ๅ’Œๅค„็†ๆ•ฐๆฎไฝฟ็”จ็š„CPU่ฟ›็จ‹ๆ•ฐ" ้…Œๆƒ…ๆ‹‰ไฝŽ๏ผ›<br>
+ 2ใ€่ฎญ็ปƒ้›†้Ÿณ้ข‘ๆ‰‹ๅทฅๅˆ‡ไธ€ไธ‹๏ผŒไธ่ฆๅคช้•ฟใ€‚<br>
+
+
+
docs/faq_en.md ADDED
@@ -0,0 +1,95 @@
+ ## Q1:ffmpeg error/utf8 error.
+ It is most likely not an FFmpeg issue, but rather an audio path issue;
+
+ FFmpeg may encounter an error when reading paths containing special characters like spaces and (); and when the training set's audio has Chinese paths, writing them into filelist.txt may cause a utf8 error.<br>
+
+ ## Q2:Cannot find index file after "One-click Training".
+ If it displays "Training is done. The program is closed," then the model has been trained successfully, and the subsequent errors are fake;
+
+ The lack of an 'added' index file after One-click training may be due to the training set being too large, causing the addition of the index to get stuck; the memory overload when adding the index has been resolved by adding it in batches. As a temporary workaround, try clicking the "Train Index" button again.<br>
+
+ ## Q3:Cannot find the model in โ€œInferencing timbreโ€ after training
+ Click โ€œRefresh timbre listโ€ and check again; if it is still not visible, check whether there were any errors during training, and send screenshots of the console, the web UI, and logs/experiment_name/*.log to the developers for further analysis.<br>
+
+ ## Q4:How to share a model/How to use others' models?
+ The pth files stored in rvc_root/logs/experiment_name are not meant for sharing or inference, but for storing the experiment checkpoints for reproducibility and further training. The model to be shared should be the 60+MB pth file in the weights folder;
+
+ In the future, weights/exp_name.pth and logs/exp_name/added_xxx.index will be merged into a single weights/exp_name.zip file to eliminate the need for manual index input; so share the zip file, not the pth file, unless you want to continue training on a different machine;
+
+ Copying/sharing the several hundred MB pth files from the logs folder to the weights folder for forced inference may result in errors such as missing f0, tgt_sr, or other keys. You need to use the ckpt tab at the bottom to manually or automatically (if the information is found in logs/exp_name) select whether to include pitch information and the target audio sampling rate, and then extract the smaller model. After extraction, there will be a 60+MB pth file in the weights folder, and you can refresh the voices to use it.<br>
+
+ ## Q5:Connection Error.
+ You may have closed the console (black command line window).<br>
+
+ ## Q6:WebUI popup 'Expecting value: line 1 column 1 (char 0)'.
+ Please disable system LAN proxy/global proxy and then refresh.<br>
+
+ ## Q7:How to train and infer without the WebUI?
+ Training script:<br>
+ You can run training in WebUI first, and the command-line versions of dataset preprocessing and training will be displayed in the message window.<br>
+
+ Inference script:<br>
+ https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/myinfer.py<br>
+
+
+ e.g.<br>
+
+ runtime\python.exe myinfer.py 0 "E:\codes\py39\RVC-beta\todo-songs\1111.wav" "E:\codes\py39\logs\mi-test\added_IVF677_Flat_nprobe_7.index" harvest "test.wav" "weights/mi-test.pth" 0.6 cuda:0 True<br>
+
+
+ f0up_key=sys.argv[1]<br>
+ input_path=sys.argv[2]<br>
+ index_path=sys.argv[3]<br>
+ f0method=sys.argv[4]#harvest or pm<br>
+ opt_path=sys.argv[5]<br>
+ model_path=sys.argv[6]<br>
+ index_rate=float(sys.argv[7])<br>
+ device=sys.argv[8]<br>
+ is_half=bool(sys.argv[9])<br>
+
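The positional arguments above can be folded into a small helper; this is a hypothetical sketch for illustration (not part of myinfer.py). Note in particular that `bool(sys.argv[9])` in the listing above is truthy for any non-empty string, so comparing against the literal string "True" is safer:

```python
def parse_infer_args(argv):
    """Map the documented positional arguments of myinfer.py to named fields.

    Hypothetical helper for illustration; the real script reads sys.argv directly.
    """
    return {
        "f0up_key": int(argv[0]),      # transpose in semitones, e.g. 0
        "input_path": argv[1],
        "index_path": argv[2],
        "f0method": argv[3],           # "harvest" or "pm"
        "opt_path": argv[4],
        "model_path": argv[5],
        "index_rate": float(argv[6]),
        "device": argv[7],             # e.g. "cuda:0"
        "is_half": argv[8] == "True",  # bool("False") would be True, so compare strings
    }

args = parse_infer_args(
    ["0", "E:/todo-songs/1111.wav", "logs/mi-test/added_IVF677_Flat_nprobe_7.index",
     "harvest", "test.wav", "weights/mi-test.pth", "0.6", "cuda:0", "True"]
)
```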
+ ## Q8:Cuda error/Cuda out of memory.
+ There is a small chance that there is a problem with the CUDA configuration or the device is not supported; more likely, there is not enough memory (out of memory).<br>
+
+ For training, reduce the batch size (if reducing to 1 is still not enough, you may need to change the graphics card); for inference, adjust the x_pad, x_query, x_center, and x_max settings at the end of the config.py file as needed. Cards with less than 4GB of memory (e.g. 1060(3G) and various 2GB cards) can be abandoned, while 4GB cards still have a chance.<br>
+
+ ## Q9:How many total_epoch are optimal?
+ If the training dataset's audio quality is poor and the noise floor is high, 20-30 epochs are sufficient. Setting it too high won't improve the audio quality of your low-quality training set.<br>
+
+ If the training set audio quality is high, the noise floor is low, and there is sufficient duration, you can increase it. 200 is acceptable (since training is fast, and if you're able to prepare a high-quality training set, your GPU likely can handle a longer training duration without issue).<br>
+
+ ## Q10:How much training set duration is needed?
+
+ A dataset of around 10min to 50min is recommended.<br>
+
+ With guaranteed high sound quality and a low noise floor, more can be added if the dataset's timbre is uniform.<br>
+
+ For a high-level training set (clean + distinctive timbre), 5min to 10min is fine.<br>
+
+ There are some people who have trained successfully with 1min to 2min of data, but the success is not reproducible by others and is not very informative. <br>This requires that the training set has a very distinctive timbre (e.g. a high-frequency airy anime girl sound) and that the audio quality is high;
+ Data of less than 1min duration has not been successfully attempted so far. This is not recommended.<br>
+
+
+ ## Q11:What is the index rate for and how to adjust it?
+ If the tone quality of the pre-trained model and the inference source is higher than that of the training set, they can bring up the tone quality of the inference result, but at the cost of a possible tone bias towards the tone of the underlying model/inference source rather than the tone of the training set, which is generally referred to as "tone leakage".<br>
+
+ The index rate is used to reduce/resolve the timbre leakage problem. If the index rate is set to 1, theoretically there is no timbre leakage from the inference source and the timbre quality is more biased towards the training set. If the training set has a lower sound quality than the inference source, then a higher index rate may reduce the sound quality. If it is turned down to 0, retrieval blending is no longer used to protect the training set timbre.<br>
+
+ If the training set has good audio quality and long duration, you can turn up total_epoch; in that case the model itself is less likely to refer to the inference source or the pretrained underlying model, there is little "tone leakage", the index_rate is not important, and you can even skip creating/sharing the index file.<br>
+
+ ## Q12:How to choose the gpu when inferring?
+ In the config.py file, select the card number after "device cuda:".<br>
+
+ The mapping between card number and graphics card can be seen in the graphics card information section of the training tab.<br>
+
+ ## Q13:How to use the model saved in the middle of training?
+ Extract the small model via model extraction at the bottom of the ckpt processing tab.
+
+ ## Q14:File/memory error (when training)?
+ Too many processes were started and there is not enough memory. You may fix it by:
+
+ 1. Decreasing the value in the "Threads of CPU" field;
+
+ 2. Pre-cutting the training set into shorter audio files.
+
+
+
docs/training_tips_en.md ADDED
@@ -0,0 +1,65 @@
+ Instructions and tips for RVC training
+ ======================================
+ This TIPS explains how model training is done.
+
+ # Training flow
+ I will explain the process following the steps in the training tab of the GUI.
+
+ ## step1
+ Set the experiment name here.
+
+ You can also set here whether the model should take pitch into account.
+ If the model doesn't consider pitch, the model will be lighter, but not suitable for singing.
+
+ Data for each experiment is placed in `/logs/your-experiment-name/`.
+
+ ## step2a
+ Loads and preprocesses audio.
+
+ ### load audio
+ If you specify a folder with audio, the audio files in that folder will be read automatically.
+ For example, if you specify `C:\Users\hoge\voices`, `C:\Users\hoge\voices\voice.mp3` will be loaded, but `C:\Users\hoge\voices\dir\voice.mp3` will not be loaded.
+
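The non-recursive loading described above can be sketched as follows (a minimal illustration — the function name and extension list are assumptions, not the actual RVC code):

```python
from pathlib import Path

def list_top_level_audio(folder, exts=(".wav", ".mp3", ".flac")):
    """Return audio files directly inside `folder`; files in subdirectories
    are skipped, matching the non-recursive loading described above."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in exts
    )
```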
+ Since ffmpeg is used internally for reading audio, if the extension is supported by ffmpeg, it will be read automatically.
+ After converting to int16 with ffmpeg, the audio is converted to float32 and normalized between -1 and 1.
+
+ ### denoising
+ The audio is smoothed by scipy's filtfilt.
+
+ ### Audio Split
+ First, the input audio is divided by detecting silent parts that last longer than a certain period (max_sil_kept=5 seconds?). After splitting the audio on silence, it is split every 4 seconds with an overlap of 0.3 seconds. For each segment of up to 4 seconds, after normalizing the volume, the wav file is saved to `/logs/your-experiment-name/0_gt_wavs` and then resampled to 16k and saved to `/logs/your-experiment-name/1_16k_wavs` as a wav file.
+
32
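The int16 to float32 normalization described above can be sketched as follows. This is a minimal pure-Python stand-in for what the actual ffmpeg/numpy pipeline does; the helper name is illustrative:

```python
def int16_to_float32(samples):
    """Normalize int16 PCM samples (-32768..32767) to floats in [-1, 1]."""
    return [s / 32768.0 for s in samples]

print(int16_to_float32([0, 16384, -32768]))  # [0.0, 0.5, -1.0]
```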
+ ## step2b
+ ### Extract pitch
+ Extract pitch information from the wav files. Pitch information (=f0) is extracted with the method built into parselmouth or pyworld and saved in `/logs/your-experiment-name/2a_f0`. The pitch information is then logarithmically converted to an integer between 1 and 255 and saved in `/logs/your-experiment-name/2b-f0nsf`.
+
+ ### Extract feature_print
+ Convert the wav files to embeddings in advance using HuBERT. Read the wav files saved in `/logs/your-experiment-name/1_16k_wavs`, convert them to 256-dimensional features with HuBERT, and save them in npy format in `/logs/your-experiment-name/3_feature256`.
+
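The 1..255 logarithmic quantization of f0 can be sketched like this. The mel-style formula and the pitch range used here are illustrative assumptions, not the exact constants used by RVC:

```python
import math

F0_MIN, F0_MAX = 50.0, 1100.0  # assumed voiced pitch range in Hz


def f0_to_coarse(f0_hz):
    """Map a pitch value in Hz to an integer bucket in 1..255 on a log (mel-like)
    scale; unvoiced frames (f0 == 0) map to 0."""
    if f0_hz <= 0:
        return 0
    mel = lambda f: 1127.0 * math.log(1.0 + f / 700.0)
    lo, hi = mel(F0_MIN), mel(F0_MAX)
    coarse = round((mel(f0_hz) - lo) * 254.0 / (hi - lo)) + 1
    return max(1, min(255, coarse))
```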
+ ## step3
+ Train the model.
+ ### Glossary for Beginners
+ In deep learning, the dataset is divided and learning proceeds little by little. In one model update (step), batch_size items of data are retrieved, and prediction and error correction are performed. Doing this once over the whole dataset counts as one epoch.
+
+ Therefore, the learning time is the learning time per step x (the number of data items in the dataset ÷ batch size) x the number of epochs. In general, a larger batch size makes learning more stable (learning time per step ÷ batch size becomes smaller), but it uses more GPU memory. GPU RAM can be checked with the nvidia-smi command. Learning can be done in a shorter time by increasing the batch size as much as the machine in the execution environment allows.
+
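As a quick sanity check of the formula above, the total number of optimizer steps can be computed like this (the dataset size, batch size, and epoch count below are hypothetical numbers):

```python
import math


def total_steps(dataset_size, batch_size, epochs):
    """One epoch = one full pass over the dataset; each step consumes one batch."""
    steps_per_epoch = math.ceil(dataset_size / batch_size)
    return steps_per_epoch * epochs


print(total_steps(1000, 8, 20))  # 125 steps/epoch * 20 epochs = 2500
```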
+
+ ### Specify pretrained model
+ RVC starts training the model from pretrained weights instead of from 0, so it can be trained with a small dataset.
+
+ By default
+
+ - If you consider pitch, it loads `rvc-location/pretrained/f0G40k.pth` and `rvc-location/pretrained/f0D40k.pth`.
+ - If you don't consider pitch, it loads `rvc-location/pretrained/G40k.pth` and `rvc-location/pretrained/D40k.pth`.
+
+ During learning, the model parameters are saved in `logs/your-experiment-name/G_{}.pth` and `logs/your-experiment-name/D_{}.pth` every save_every_epoch. By specifying these paths, you can restart learning, or start training from model weights learned in a different experiment.
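The default-path selection described above amounts to a simple switch on the pitch flag; a sketch, assuming the `rvc-location/pretrained` layout listed in the bullets (the helper name is illustrative):

```python
def pretrained_paths(use_pitch, base="rvc-location/pretrained"):
    """Pick the default pretrained generator/discriminator weights: the f0-prefixed
    checkpoints when pitch is considered, the plain ones otherwise."""
    prefix = "f0" if use_pitch else ""
    return (f"{base}/{prefix}G40k.pth", f"{base}/{prefix}D40k.pth")

print(pretrained_paths(True))
# ('rvc-location/pretrained/f0G40k.pth', 'rvc-location/pretrained/f0D40k.pth')
```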
+
+ ### learning index
+ RVC saves the HuBERT feature values used during training, and during inference it searches for feature values similar to those used during training. To perform this search at high speed, the index is trained in advance.
+ For index training, the approximate nearest neighbor search library Faiss is used. The feature values in `logs/your-experiment-name/3_feature256` are read and used to train the index, which is saved as `logs/your-experiment-name/add_XXX.index`.
+
+ (As of the 20230428 update, total_fea.npy is read from the index, so saving/specifying it is no longer necessary.)
+
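Conceptually, the retrieval step replaces an inference-time feature with its nearest stored training feature. A brute-force sketch of the lookup that the Faiss index accelerates (illustrative only, not the library API):

```python
def nearest_feature(query, stored):
    """Return the stored feature vector closest to `query` by squared L2 distance
    (a brute-force stand-in for the Faiss index lookup)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(stored, key=lambda f: sq_dist(query, f))


bank = [[1.0, 1.0], [0.1, 0.0], [-0.5, 0.2]]
print(nearest_feature([0.0, 0.0], bank))  # [0.1, 0.0]
```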
+ ### Button description
+ - Train model: after executing step2b, press this button to train the model.
+ - Train feature index: after training the model, train the feature index.
+ - One-click training: step2b, model training, and feature index training all at once.
docs/training_tips_ja.md ADDED
@@ -0,0 +1,64 @@
+ RVCの訓練における説明、およびTIPS
+ ===============================
+ 本TIPSではどのようにデータの訓練が行われているかを説明します。
+
+ # 訓練の流れ
+ GUIの訓練タブのstepに沿って説明します。
+
+ ## step1
+ 実験名の設定を行います。
+
+ また、モデルに音高ガイド(ピッチ)を考慮させるかもここで設定できます。考慮させない場合はモデルは軽量になりますが、歌唱には向かなくなります。
+
+ 各実験のデータは`/logs/実験名/`に配置されます。
+
+ ## step2a
+ 音声の読み込みと前処理を行います。
+
+ ### load audio
+ 音声のあるフォルダを指定すると、そのフォルダ内にある音声ファイルを自動で読み込みます。
+ 例えば`C:\Users\hoge\voices`を指定した場合、`C:\Users\hoge\voices\voice.mp3`は読み込まれますが、`C:\Users\hoge\voices\dir\voice.mp3`は読み込まれません。
+
+ 音声の読み込みには内部でffmpegを利用しているので、ffmpegで対応している拡張子であれば自動的に読み込まれます。
+ ffmpegでint16に変換した後、float32に変換し、-1 ~ 1の間に正規化されます。
+
+ ### denoising
+ 音声についてscipyのfiltfiltによる平滑化を行います。
+
+ ### 音声の分割
+ 入力した音声はまず、一定期間(max_sil_kept=5秒?)より長く無音が続く部分を検知して音声を分割します。無音で音声を分割した後は、0.3秒のoverlapを含む4秒ごとに音声を分割します。4秒以内に区切られた音声は、音量の正規化を行った後wavファイルを`/logs/実験名/0_gt_wavs`に、そこから16kのサンプリングレートに変換して`/logs/実験名/1_16k_wavs`にwavファイルで保存します。
+
+ ## step2b
+ ### ピッチの抽出
+ wavファイルからピッチ(音の高低)の情報を抽出します。parselmouthやpyworldに内蔵されている手法でピッチ情報(=f0)を抽出し、`/logs/実験名/2a_f0`に保存します。その後、ピッチ情報を対数で変換して1~255の整数に変換し、`/logs/実験名/2b-f0nsf`に保存します。
+
+ ### feature_printの抽出
+ HuBERTを用いてwavファイルを事前にembeddingに変換します。`/logs/実験名/1_16k_wavs`に保存したwavファイルを読み込み、HuBERTでwavファイルを256次元の特徴量に変換し、npy形式で`/logs/実験名/3_feature256`に保存します。
+
+ ## step3
+ モデルのトレーニングを行います。
+ ### 初心者向け用語解説
+ 深層学習ではデータセットを分割し、少しずつ学習を進めていきます。一回のモデルの更新(step)では、batch_size個のデータを取り出し予測と誤差の修正を行います。これをデータセットに対して一通り行うと一epochと数えます。
+
+ そのため、学習時間は 1stepあたりの学習時間 x (データセット内のデータ数 ÷ バッチサイズ) x epoch数 かかります。一般にバッチサイズを大きくするほど学習は安定し、(1stepあたりの学習時間÷バッチサイズ)は小さくなりますが、その分GPUのメモリを多く使用します。GPUのRAMはnvidia-smiコマンド等で確認できます。実行環境のマシンに合わせてバッチサイズをできるだけ大きくするとより短時間で学習が可能です。
+
+ ### pretrained modelの指定
+ RVCではモデルの訓練を0からではなく、事前学習済みの重みから開始するため、少ないデータセットで学習を行えます。
+
+ デフォルトでは
+
+ - 音高ガイドを考慮する場合、`RVCのある場所/pretrained/f0G40k.pth`と`RVCのある場所/pretrained/f0D40k.pth`を読み込みます。
+ - 音高ガイドを考慮しない場合、`RVCのある場所/pretrained/G40k.pth`と`RVCのある場所/pretrained/D40k.pth`を読み込みます。
+
+ 学習時はsave_every_epochごとにモデルのパラメータが`logs/実験名/G_{}.pth`と`logs/実験名/D_{}.pth`に保存されますが、このパスを指定することで学習を再開したり、もしくは違う実験で学習したモデルの重みから学習を開始できます。
+
+ ### indexの学習
+ RVCでは学習時に使われたHuBERTの特徴量を保存し、推論時は学習時の特徴量から近い特徴量を探してきて推論を行います。この検索を高速に行うために事前にindexの学習を行います。
+ indexの学習には近似近傍探索ライブラリのfaissを用います。`/logs/実験名/3_feature256`の特徴量を読み込み、それを用いて学習したindexを`/logs/実験名/add_XXX.index`として保存します。
+ (20230428updateよりtotal_fea.npyはindexから読み込むので不要になりました。)
+
+ ### ボタンの説明
+ - モデルのトレーニング: step2bまでを実行した後、このボタンを押すとモデルの学習を行います。
+ - 特徴インデックスのトレーニング: モデルのトレーニング後、indexの学習を行います。
+ - ワンクリックトレーニング: step2bまでとモデルのトレーニング、特徴インデックスのトレーニングを一括で行います。
+
docs/training_tips_ko.md ADDED
@@ -0,0 +1,53 @@
+ RVC 훈련에 대한 설명과 팁들
+ ======================================
+ 본 팁에서는 어떻게 데이터 훈련이 이루어지고 있는지 설명합니다.
+
+ # 훈련의 흐름
+ GUI의 훈련 탭의 단계를 따라 설명합니다.
+
+ ## step1
+ 실험 이름을 지정합니다. 또한, 모델이 피치(소리의 높낮이)를 고려해야 하는지 여부를 여기에서 설정할 수도 있습니다.
+ 각 실험을 위한 데이터는 `/logs/experiment name/`에 배치됩니다.
+
+ ## step2a
+ 음성 파일을 불러오고 전처리합니다.
+
+ ### 음성 파일 불러오기
+ 음성 파일이 있는 폴더를 지정하면 해당 폴더에 있는 음성 파일이 자동으로 가져와집니다.
+ 예를 들어 `C:\Users\hoge\voices`를 지정하면 `C:\Users\hoge\voices\voice.mp3`가 읽히지만 `C:\Users\hoge\voices\dir\voice.mp3`는 읽히지 않습니다.
+
+ 음성 로드에는 내부적으로 ffmpeg를 이용하고 있으므로, ffmpeg로 대응하고 있는 확장자라면 자동적으로 읽힙니다.
+ ffmpeg에서 int16으로 변환한 후 float32로 변환하고 -1과 1 사이에 정규화됩니다.
+
+ ### 잡음 제거
+ 음성 파일에 대해 scipy의 filtfilt를 이용하여 잡음을 처리합니다.
+
+ ### 음성 분할
+ 입력한 음성 파일은 먼저 일정 기간(max_sil_kept=5초?)보다 길게 무음이 지속되는 부분을 감지하여 음성을 분할합니다. 무음으로 음성을 분할한 후에는 0.3초의 overlap을 포함하여 4초마다 음성을 분할합니다. 4초 이내에 구분된 음성은 음량의 정규화를 실시한 후 wav 파일을 `/logs/실험명/0_gt_wavs`로, 거기에서 16k의 샘플링 레이트로 변환해 `/logs/실험명/1_16k_wavs`에 wav 파일로 저장합니다.
+
+ ## step2b
+ ### 피치 추출
+ wav 파일에서 피치(소리의 높낮이) 정보를 추출합니다. parselmouth나 pyworld에 내장되어 있는 메서드로 피치 정보(=f0)를 추출해, `/logs/실험명/2a_f0`에 저장합니다. 그 후 피치 정보를 로그로 변환하여 1~255 정수로 변환하고 `/logs/실험명/2b-f0nsf`에 저장합니다.
+
+ ### feature_print 추출
+ HuBERT를 이용하여 wav 파일을 미리 embedding으로 변환합니다. `/logs/실험명/1_16k_wavs`에 저장한 wav 파일을 읽고 HuBERT에서 wav 파일을 256차원 feature들로 변환한 후 npy 형식으로 `/logs/실험명/3_feature256`에 저장합니다.
+
+ ## step3
+ 모델의 훈련을 진행합니다.
+
+ ### 초보자용 용어 해설
+ 심층학습(딥러닝)에서는 데이터셋을 분할하여 조금씩 학습을 진행합니다. 한 번의 모델 업데이트(step) 단계 당 batch_size개의 데이터를 탐색하여 예측과 오차를 수정합니다. 데이터셋 전부에 대해 이 작업을 한 번 수행하면 하나의 epoch로 계산합니다.
+
+ 따라서 학습 시간은 단계당 학습 시간 x (데이터셋 내 데이터의 수 / batch size) x epoch 수가 소요됩니다. 일반적으로 batch size가 클수록 학습이 안정적이게 됩니다. (step당 학습 시간 ÷ batch size)는 작아지지만 GPU 메모리를 더 많이 사용합니다. GPU RAM은 nvidia-smi 명령어를 통해 확인할 수 있습니다. 실행 환경에 따라 배치 크기를 최대한 늘리면 짧은 시간 내에 학습이 가능합니다.
+
+ ### 사전 학습된 모델 지정
+ RVC는 적은 데이터셋으로도 훈련이 가능하도록 사전 훈련된 가중치에서 모델 훈련을 시작합니다. 기본적으로 `rvc-location/pretrained/f0G40k.pth` 및 `rvc-location/pretrained/f0D40k.pth`를 불러옵니다. 학습을 할 시에, 모델 파라미터는 각 save_every_epoch별로 `logs/experiment name/G_{}.pth` 와 `logs/experiment name/D_{}.pth`로 저장이 되는데, 이 경로를 지정함으로써 학습을 재개하거나, 다른 실험에서 학습한 모델의 가중치에서 학습을 시작할 수 있습니다.
+
+ ### index의 학습
+ RVC에서는 학습시에 사용된 HuBERT의 feature값을 저장하고, 추론 시에는 학습 시 사용한 feature값과 유사한 feature 값을 탐색해 추론을 진행합니다. 이 탐색을 고속으로 수행하기 위해 사전에 index를 학습하게 됩니다.
+ Index 학습에는 근사 근접 탐색법 라이브러리인 Faiss를 사용하게 됩니다. `/logs/실험명/3_feature256`의 feature값을 불러와, 이를 모두 결합시킨 feature값을 `/logs/실험명/total_fea.npy`로서 저장, 그것을 사용해 학습한 index를 `/logs/실험명/add_XXX.index`로 저장합니다.
+
+ ### 버튼 설명
+ - 모델 학습: step2b까지 실행한 후, 이 버튼을 눌러 모델을 학습합니다.
+ - 특징 지수 훈련: 모델의 훈련 후, index를 학습합니다.
+ - 원클릭 트레이닝: step2b까지의 모델 훈련, feature index 훈련을 일괄로 실시합니다.
docs/ๅฐ็™ฝ็ฎ€ๆ˜“ๆ•™็จ‹.doc ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a6def6895e9f7a9bb9a852fbca05f001c77bb98338b687744142e45f014b9a17
+ size 602624