Upload folder using huggingface_hub
- .gitattributes +1 -0
- docs/README.en.md +108 -0
- docs/README.ja.md +104 -0
- docs/README.ko.han.md +100 -0
- docs/README.ko.md +112 -0
- docs/faiss_tips_en.md +102 -0
- docs/faiss_tips_ja.md +101 -0
- docs/faiss_tips_ko.md +132 -0
- docs/faq.md +89 -0
- docs/faq_en.md +95 -0
- docs/training_tips_en.md +65 -0
- docs/training_tips_ja.md +64 -0
- docs/training_tips_ko.md +53 -0
- docs/ๅฐ็ฝ็ฎๆๆ็จ.doc +3 -0
.gitattributes
CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+docs/ๅฐ็ฝ็ฎๆๆ็จ.doc filter=lfs diff=lfs merge=lfs -text
docs/README.en.md
ADDED
@@ -0,0 +1,108 @@
<div align="center">

<h1>Retrieval-based-Voice-Conversion-WebUI</h1>
An easy-to-use voice conversion framework based on VITS.<br><br>

[](https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI)

<img src="https://counter.seku.su/cmoe?name=rvc&theme=r34" /><br>

[](https://colab.research.google.com/github/liujing04/Retrieval-based-Voice-Conversion-WebUI/blob/main/Retrieval_based_Voice_Conversion_WebUI.ipynb)
[](https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI/blob/main/%E4%BD%BF%E7%94%A8%E9%9C%80%E9%81%B5%E5%AE%88%E7%9A%84%E5%8D%8F%E8%AE%AE-LICENSE.txt)
[](https://huggingface.co/lj1995/VoiceConversionWebUI/tree/main/)

[](https://discord.gg/HcsmBBGyVk)

</div>

------
[**Changelog**](https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI/blob/main/Changelog_CN.md) | [**FAQ (Frequently Asked Questions)**](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/wiki/FAQ-(Frequently-Asked-Questions))

[**English**](./README.en.md) | [**ไธญๆ็ฎไฝ**](../README.md) | [**ๆฅๆฌ่ช**](./README.ja.md) | [**ํ๊ตญ์ด**](./README.ko.md) ([**้ๅ่ช**](./README.ko.han.md))

:fire: An online demo that uses RVC to convert vocals into acoustic guitar audio :fire: : https://huggingface.co/spaces/lj1995/vocal2guitar

:fire: Vocal2Guitar demo video :fire: : https://www.bilibili.com/video/BV19W4y1D7tT/

> Check out our [Demo Video](https://www.bilibili.com/video/BV1pm4y1z7Gm/) here!

> Realtime voice conversion software using RVC: [w-okada/voice-changer](https://github.com/w-okada/voice-changer)

> The pre-trained model is trained on nearly 50 hours of the high-quality, open-source VCTK dataset.

> High-quality licensed song datasets will be added to the training set one after another for your use, so you need not worry about copyright infringement.

## Summary
This repository has the following features:
+ Reduce tone leakage by replacing the source feature with the training-set feature using top-1 retrieval;
+ Easy and fast training, even on relatively weak graphics cards;
+ Training with a small amount of data still yields relatively good results (>=10 min of low-noise speech is recommended);
+ Support for model fusion to change timbres (using the ckpt processing tab -> ckpt merge);
+ Easy-to-use WebUI;
+ The UVR5 model can be used to quickly separate vocals and instruments.

## Preparing the environment
We recommend installing the dependencies through poetry.

The following commands need to be executed in an environment with Python 3.8 or higher:
```bash
# Install PyTorch-related core dependencies; skip if already installed
# Reference: https://pytorch.org/get-started/locally/
pip install torch torchvision torchaudio

# For Windows + Nvidia Ampere architecture (RTX 30xx), you need to specify the CUDA version corresponding to PyTorch according to https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI/issues/21
#pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117

# Install the Poetry dependency management tool; skip if already installed
# Reference: https://python-poetry.org/docs/#installation
curl -sSL https://install.python-poetry.org | python3 -

# Install the project dependencies
poetry install
```
You can also use pip to install the dependencies:

```bash
pip install -r requirements.txt
```

## Preparing other pre-models
RVC requires other pre-models for inference and training.

You need to download them from our [Huggingface space](https://huggingface.co/lj1995/VoiceConversionWebUI/tree/main/).

Here is a list of the pre-models and other files that RVC needs:
```bash
hubert_base.pt

./pretrained

./uvr5_weights

# If you want to test the v2 model (which changes the input from the 256-dimensional feature of 9-layer HuBERT+final_proj to the 768-dimensional feature of 12-layer HuBERT, and adds 3 period discriminators), you will additionally need
./pretrained_v2

# If you are using Windows, you may also need this file; skip if FFmpeg is installed
ffmpeg.exe
```
Then use this command to start the WebUI:
```bash
python infer-web.py
```
If you are using Windows, you can download and extract `RVC-beta.7z` to use RVC directly, and use `go-web.bat` to start the WebUI.

There is also a tutorial on RVC in Chinese; check it out if needed.

## Credits
+ [ContentVec](https://github.com/auspicious3000/contentvec/)
+ [VITS](https://github.com/jaywalnut310/vits)
+ [HIFIGAN](https://github.com/jik876/hifi-gan)
+ [Gradio](https://github.com/gradio-app/gradio)
+ [FFmpeg](https://github.com/FFmpeg/FFmpeg)
+ [Ultimate Vocal Remover](https://github.com/Anjok07/ultimatevocalremovergui)
+ [audio-slicer](https://github.com/openvpi/audio-slicer)

## Thanks to all contributors for their efforts

<a href="https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI/graphs/contributors" target="_blank">
  <img src="https://contrib.rocks/image?repo=liujing04/Retrieval-based-Voice-Conversion-WebUI" />
</a>
docs/README.ja.md
ADDED
@@ -0,0 +1,104 @@
<div align="center">

<h1>Retrieval-based-Voice-Conversion-WebUI</h1>
An easy-to-use voice conversion (voice changer) framework based on VITS.<br><br>

[](https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI)

<img src="https://counter.seku.su/cmoe?name=rvc&theme=r34" /><br>

[](https://colab.research.google.com/github/liujing04/Retrieval-based-Voice-Conversion-WebUI/blob/main/Retrieval_based_Voice_Conversion_WebUI.ipynb)
[](https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI/blob/main/%E4%BD%BF%E7%94%A8%E9%9C%80%E9%81%B5%E5%AE%88%E7%9A%84%E5%8D%8F%E8%AE%AE-LICENSE.txt)
[](https://huggingface.co/lj1995/VoiceConversionWebUI/tree/main/)

[](https://discord.gg/HcsmBBGyVk)

</div>

------

[**Changelog**](https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI/blob/main/Changelog_CN.md)

[**English**](./README.en.md) | [**ไธญๆ็ฎไฝ**](../README.md) | [**ๆฅๆฌ่ช**](./README.ja.md) | [**ํ๊ตญ์ด**](./README.ko.md) ([**้ๅ่ช**](./README.ko.han.md))

> See the [demo video](https://www.bilibili.com/video/BV1pm4y1z7Gm/) here.

> Realtime voice conversion with RVC: [w-okada/voice-changer](https://github.com/w-okada/voice-changer)

> The base model is trained on roughly 50 hours of a high-quality open-source dataset, so it can be used without worrying about copyright infringement.

> Going forward, we plan to add more licensed high-quality singing data one after another and train base models on it.

## Introduction
This repository has the following features:
+ Reduces tone leakage by replacing the source features with training-set features using top-1 retrieval;
+ Fast and easy training, even on relatively weak GPUs;
+ Relatively good results even with small datasets (at least 10 minutes of low-noise speech is recommended);
+ Voices can be blended by fusing models (using the ckpt processing tab -> ckpt merge);
+ Easy-to-use WebUI;
+ Includes the UVR5 model, which quickly separates vocals from background music.

## Environment setup
We recommend installing the dependencies with Poetry.

The following commands must be run in an environment with Python 3.8 or higher:
```bash
# Install PyTorch-related core dependencies; skip if already installed
# Reference: https://pytorch.org/get-started/locally/
pip install torch torchvision torchaudio

# For Windows + Nvidia Ampere architecture (RTX 30xx), specify the CUDA version matching PyTorch as described in #21.
#pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117

# Install the Poetry dependency management tool; skip if already installed
# Reference: https://python-poetry.org/docs/#installation
curl -sSL https://install.python-poetry.org | python3 -

# Install the dependencies via Poetry
poetry install
```

The dependencies can also be installed with pip:

```bash
pip install -r requirements.txt
```

## Preparing the base models
RVC requires several pre-trained base models for inference and training.

They can be downloaded from the [Hugging Face space](https://huggingface.co/lj1995/VoiceConversionWebUI/tree/main/).

Below is a list of the base models and other files RVC needs:
```bash
hubert_base.pt

./pretrained

./uvr5_weights

# Skip if ffmpeg is already installed
./ffmpeg
```
Then start the WebUI with the following command:
```bash
python infer-web.py
```
On Windows, you can also download and extract `RVC-beta.7z` and click `go-web.bat` to start the WebUI (7zip is required).

The repository also contains [ๅฐ็ฝ็ฎๆๆ็จ.doc](./ๅฐ็ฝ็ฎๆๆ็จ.doc) for reference (Chinese only).

## Reference projects
+ [ContentVec](https://github.com/auspicious3000/contentvec/)
+ [VITS](https://github.com/jaywalnut310/vits)
+ [HIFIGAN](https://github.com/jik876/hifi-gan)
+ [Gradio](https://github.com/gradio-app/gradio)
+ [FFmpeg](https://github.com/FFmpeg/FFmpeg)
+ [Ultimate Vocal Remover](https://github.com/Anjok07/ultimatevocalremovergui)
+ [audio-slicer](https://github.com/openvpi/audio-slicer)

## Thanks to all contributors for their efforts
<a href="https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI/graphs/contributors" target="_blank">
  <img src="https://contrib.rocks/image?repo=liujing04/Retrieval-based-Voice-Conversion-WebUI" />
</a>
docs/README.ko.han.md
ADDED
@@ -0,0 +1,100 @@
<div align="center">

<h1>Retrieval-based-Voice-Conversion-WebUI</h1>
An easy-to-use voice conversion framework based on VITS.<br><br>

[](https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI)

<img src="https://counter.seku.su/cmoe?name=rvc&theme=r34" /><br>

[](https://colab.research.google.com/github/liujing04/Retrieval-based-Voice-Conversion-WebUI/blob/main/Retrieval_based_Voice_Conversion_WebUI.ipynb)
[](https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI/blob/main/%E4%BD%BF%E7%94%A8%E9%9C%80%E9%81%B5%E5%AE%88%E7%9A%84%E5%8D%8F%E8%AE%AE-LICENSE.txt)
[](https://huggingface.co/lj1995/VoiceConversionWebUI/tree/main/)

[](https://discord.gg/HcsmBBGyVk)

</div>

------
[**Changelog**](https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI/blob/main/Changelog_CN.md)

[**English**](./README.en.md) | [**ไธญๆ็ฎไฝ**](../README.md) | [**ๆฅๆฌ่ช**](./README.ja.md) | [**ํ๊ตญ์ด**](./README.ko.md) ([**้ๅ่ช**](./README.ko.han.md))

> Check out the [demo video](https://www.bilibili.com/video/BV1pm4y1z7Gm/)!

> Realtime voice conversion with RVC: [w-okada/voice-changer](https://github.com/w-okada/voice-changer)

> The base model uses nearly 50 hours of the high-quality open-source VCTK dataset, so there are no copyright concerns; please use it with peace of mind.

> We also plan to keep training on high-quality songs with no copyright issues in the future.

## Introduction
This repo has the following features:
+ Prevents tone leakage by replacing the input source's features with training-set features using top-1 retrieval;
+ Fast training even on relatively low-performance GPUs;
+ Good results even from small amounts of data (at least 10 minutes of low-noise audio is recommended);
+ Timbre can be altered by merging models (via the ckpt processing tab -> ckpt merge);
+ Easy-to-use WebUI;
+ Fast separation of vocals and background music using the UVR5 model.

## Preparing the environment
We recommend installing the dependencies through poetry.

The following commands must be run in an environment with Python 3.8 or higher:
```bash
# Install the main PyTorch-related dependencies; skip if already installed
# Reference: https://pytorch.org/get-started/locally/
pip install torch torchvision torchaudio

# For Windows + Nvidia Ampere architecture (RTX 30xx), specify the CUDA version matching PyTorch as noted in #21.
#pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117

# Install Poetry; skip if already installed
# Reference: https://python-poetry.org/docs/#installation
curl -sSL https://install.python-poetry.org | python3 -

# Install the dependencies
poetry install
```
You can also install the dependencies using pip.

```bash
pip install -r requirements.txt
```

## Preparing other pre-models
RVC needs other pre-models for inference and training.

They can be downloaded from the [Huggingface space](https://huggingface.co/lj1995/VoiceConversionWebUI/tree/main/).

Here is a list of the pre-models and other files that RVC needs:
```bash
hubert_base.pt

./pretrained

./uvr5_weights

# If you are using Windows, you may also need this file; skip if FFmpeg is installed.
ffmpeg.exe
```
After that, you can start the WebUI with the following command:
```bash
python infer-web.py
```
If you are using Windows, you can download and extract `RVC-beta.7z` to use RVC directly, or use `go-web.bat` to start the WebUI.

## References
+ [ContentVec](https://github.com/auspicious3000/contentvec/)
+ [VITS](https://github.com/jaywalnut310/vits)
+ [HIFIGAN](https://github.com/jik876/hifi-gan)
+ [Gradio](https://github.com/gradio-app/gradio)
+ [FFmpeg](https://github.com/FFmpeg/FFmpeg)
+ [Ultimate Vocal Remover](https://github.com/Anjok07/ultimatevocalremovergui)
+ [audio-slicer](https://github.com/openvpi/audio-slicer)

## Thanks to all contributors for their efforts

<a href="https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI/graphs/contributors" target="_blank">
  <img src="https://contrib.rocks/image?repo=liujing04/Retrieval-based-Voice-Conversion-WebUI" />
</a>
docs/README.ko.md
ADDED
@@ -0,0 +1,112 @@
<div align="center">

<h1>Retrieval-based-Voice-Conversion-WebUI</h1>
An easy-to-use voice conversion framework based on VITS.<br><br>

[](https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI)

<img src="https://counter.seku.su/cmoe?name=rvc&theme=r34" /><br>

[](https://colab.research.google.com/github/liujing04/Retrieval-based-Voice-Conversion-WebUI/blob/main/Retrieval_based_Voice_Conversion_WebUI.ipynb)
[](https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI/blob/main/%E4%BD%BF%E7%94%A8%E9%9C%80%E9%81%B5%E5%AE%88%E7%9A%84%E5%8D%8F%E8%AE%AE-LICENSE.txt)
[](https://huggingface.co/lj1995/VoiceConversionWebUI/tree/main/)

[](https://discord.gg/HcsmBBGyVk)

</div>

---

[**Changelog**](https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI/blob/main/Changelog_KO.md)

[**English**](./README.en.md) | [**ไธญๆ็ฎไฝ**](../README.md) | [**ๆฅๆฌ่ช**](./README.ja.md) | [**ํ๊ตญ์ด**](./README.ko.md) ([**้ๅ่ช**](./README.ko.han.md))

> Check out the [demo video](https://www.bilibili.com/video/BV1pm4y1z7Gm/)!

> Realtime voice conversion with RVC: [w-okada/voice-changer](https://github.com/w-okada/voice-changer)

> The base model uses nearly 50 hours of the high-quality open-source VCTK dataset, so there are no copyright concerns; please use it with peace of mind.

> We also plan to keep training on high-quality songs with no copyright issues in the future.

## Introduction

This repo has the following features:

- Prevents tone leakage by replacing the input source's features with training-set features using top-1 retrieval;
- Fast training even on relatively low-performance GPUs;
- Good results even from small amounts of data (at least 10 minutes of low-noise speech data is recommended);
- Timbre can be altered by merging models (via the ckpt processing tab -> ckpt merge);
- Easy-to-use WebUI;
- Fast separation of vocals and background music using the UVR5 model.

## Preparing the environment

We recommend installing the dependencies through poetry.

The following commands must be run in an environment with Python 3.8 or higher:

```bash
# Install the main PyTorch-related dependencies; skip if already installed
# Reference: https://pytorch.org/get-started/locally/
pip install torch torchvision torchaudio

# For Windows + Nvidia Ampere architecture (RTX 30xx), specify the CUDA version matching PyTorch as described in https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI/issues/21.
#pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117

# Install Poetry; skip if already installed
# Reference: https://python-poetry.org/docs/#installation
curl -sSL https://install.python-poetry.org | python3 -

# Install the dependencies
poetry install
```

You can also install the dependencies using pip.

```bash
pip install -r requirements.txt
```

## Preparing other pre-models

RVC needs other pre-models for inference and training.

They can be downloaded from the [Huggingface space](https://huggingface.co/lj1995/VoiceConversionWebUI/tree/main/).

Here is a list of the pre-models and other files that RVC needs:

```bash
hubert_base.pt

./pretrained

./uvr5_weights

# If you are using Windows, you may also need this file; skip if FFmpeg is installed.
ffmpeg.exe
```

After that, you can start the WebUI with the following command:

```bash
python infer-web.py
```

If you are using Windows, you can download and extract `RVC-beta.7z` to use RVC directly, or use `go-web.bat` to start the WebUI.

## References

- [ContentVec](https://github.com/auspicious3000/contentvec/)
- [VITS](https://github.com/jaywalnut310/vits)
- [HIFIGAN](https://github.com/jik876/hifi-gan)
- [Gradio](https://github.com/gradio-app/gradio)
- [FFmpeg](https://github.com/FFmpeg/FFmpeg)
- [Ultimate Vocal Remover](https://github.com/Anjok07/ultimatevocalremovergui)
- [audio-slicer](https://github.com/openvpi/audio-slicer)

## Thanks to all contributors for their efforts.

<a href="https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI/graphs/contributors" target="_blank">
  <img src="https://contrib.rocks/image?repo=liujing04/Retrieval-based-Voice-Conversion-WebUI" />
</a>
docs/faiss_tips_en.md
ADDED
@@ -0,0 +1,102 @@
faiss tuning TIPS
==================
# About faiss
faiss is a library for nearest-neighbor search over dense vectors, developed by Facebook Research, which efficiently implements many approximate nearest-neighbor search methods.
Approximate nearest-neighbor search finds similar vectors quickly while sacrificing some accuracy.

## faiss in RVC
In RVC, for the embedding of features converted by HuBERT, we search for embeddings similar to the embedding generated from the training data and mix them in to achieve a conversion that is closer to the original speech. However, since this search takes time if performed naively, high-speed conversion is achieved by using approximate nearest-neighbor search.

# Implementation overview
In '/logs/your-experiment/3_feature256', where the model is located, the features extracted by HuBERT from each voice file are stored.
From here we read the npy files in filename-sorted order and concatenate the vectors to create big_npy. (This vector has shape [N, 256].)
After saving big_npy as /logs/your-experiment/total_fea.npy, we train faiss on it.

In this article, I will explain the meaning of these parameters.

# Explanation of the method
## index factory
An index factory is a faiss-specific notation that expresses, as a string, a pipeline connecting multiple approximate nearest-neighbor search methods.
This allows you to try various approximate nearest-neighbor search methods simply by changing the index factory string.
In RVC it is used like this:

```python
index = faiss.index_factory(256, "IVF%s,Flat" % n_ivf)
```
Among the arguments of index_factory, the first is the number of dimensions of the vector, the second is the index factory string, and the third is the distance metric to use.

For more detailed notation, see
https://github.com/facebookresearch/faiss/wiki/The-index-factory

## Index for distance
There are two typical metrics used for embedding similarity:

- Euclidean distance (METRIC_L2)
- inner product (METRIC_INNER_PRODUCT)

Euclidean distance takes the squared difference in each dimension, sums the differences over all dimensions, and then takes the square root. This is the same as the distance in 2D and 3D that we use on a daily basis.
The inner product is not used as a similarity metric as-is; generally, the cosine similarity, i.e. the inner product taken after L2 normalization, is used.

Which is better depends on the case, but cosine similarity is often used for embeddings obtained by word2vec and for similar-image retrieval models trained with ArcFace. If you want to do L2 normalization on a vector X with numpy, the following code will do it, with eps small enough to avoid division by zero.

```python
X_normed = X / np.maximum(eps, np.linalg.norm(X, ord=2, axis=-1, keepdims=True))
```

Also, for the index factory, you can change the distance metric used for the calculation by choosing the value to pass as the third argument.

```python
index = faiss.index_factory(dimension, text, faiss.METRIC_INNER_PRODUCT)
```

## IVF
IVF (inverted file indexes) is an algorithm similar to the inverted index in full-text search.
During training, the search targets are clustered with k-means, and Voronoi partitioning is performed using the cluster centers. Each data point is assigned one cluster, so we create a dictionary that looks up the data points from the clusters.

For example, if clusters are assigned as follows

|index|Cluster|
|-----|-------|
|1|A|
|2|B|
|3|A|
|4|C|
|5|B|

the resulting inverted index looks like this:

|cluster|index|
|-------|-----|
|A|1, 3|
|B|2, 5|
|C|4|

When searching, we first search n_probe clusters among the clusters, and then calculate the distances for the data points belonging to each cluster.

# Recommended parameters
There are official guidelines on how to choose an index, so I will explain accordingly.
https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index

For datasets below 1M, 4bit-PQ is the most efficient method available in faiss as of April 2023.
Combining this with IVF, narrowing down the candidates with 4bit-PQ, and finally recalculating the distances with an accurate index can be written with the following index factory:

```python
index = faiss.index_factory(256, "IVF1024,PQ128x4fs,RFlat")
```

## Recommended parameters for IVF
Consider the case of too many IVF cells. For example, if coarse quantization by IVF is performed with as many cells as there are data points, this is the same as a naive exhaustive search and is inefficient.
For 1M or fewer points, IVF values between 4*sqrt(N) and 16*sqrt(N) are recommended for N data points.

Since the calculation time increases in proportion to the number of n_probes, please consult with the accuracy and choose appropriately. Personally, I don't think RVC needs that much accuracy, so n_probe = 1 is fine.
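To make the guideline concrete, here is a minimal sketch (not taken from the repository) that builds such an IVF index over the `total_fea.npy` feature matrix described above and queries it. The file path, the choice of 8*sqrt(N) from the recommended range, and k=8 are assumptions for the example.

```python
import numpy as np
import faiss

# [N, 256] HuBERT feature matrix saved by the training pipeline (path assumed).
big_npy = np.load("logs/your-experiment/total_fea.npy").astype("float32")
N = big_npy.shape[0]

# Pick the number of IVF lists from the 4*sqrt(N)..16*sqrt(N) guideline.
n_ivf = int(8 * np.sqrt(N))

index = faiss.index_factory(256, "IVF%s,Flat" % n_ivf)
index.train(big_npy)   # k-means clustering that defines the inverted lists
index.add(big_npy)     # fill the inverted lists with the feature vectors
index.nprobe = 1       # number of lists visited per query, as discussed above

# Query with one (1, 256) feature vector; returns distances and neighbor ids.
distances, ids = index.search(big_npy[:1], 8)
```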

## FastScan
FastScan is a method that enables high-speed approximation of distances by product quantization, by performing the computation in registers.
Product quantization performs clustering independently for each d dimensions (usually d = 2) during training, calculates the distances between clusters in advance, and creates a lookup table. At prediction time, the distance of each dimension can be calculated in O(1) by looking at the lookup table.
So the number you specify after PQ usually specifies half the dimension of the vector.

For a more detailed description of FastScan, please refer to the official documentation.
https://github.com/facebookresearch/faiss/wiki/Fast-accumulation-of-PQ-and-AQ-codes-(FastScan)

## RFlat
RFlat is an instruction to recalculate the rough distances computed by FastScan with the exact distance specified by the third argument of the index factory.
When getting k neighbors, k*k_factor points are recalculated.
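For completeness, a small sketch of how the two knobs discussed above (nprobe on the IVF stage and k_factor on the refinement stage) can be set through the standard faiss Python bindings; the synthetic data and the chosen values are placeholders, and attribute access may differ across faiss versions.

```python
import numpy as np
import faiss

# Placeholder data standing in for the [N, 256] feature matrix.
big_npy = np.random.rand(20000, 256).astype("float32")

index = faiss.index_factory(256, "IVF1024,PQ128x4fs,RFlat")
index.train(big_npy)
index.add(big_npy)

# nprobe lives on the wrapped IVF index; k_factor on the refinement wrapper.
faiss.downcast_index(index.base_index).nprobe = 1
index.k_factor = 3  # k * k_factor FastScan candidates are re-ranked exactly

distances, ids = index.search(big_npy[:5], 8)  # 8 re-ranked neighbors per query
```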
docs/faiss_tips_ja.md
ADDED
@@ -0,0 +1,101 @@
faiss tuning TIPS
==================
# About faiss
faiss is a library for nearest-neighbor search over dense vectors developed by Facebook Research; it efficiently implements many approximate nearest-neighbor search methods.
Approximate nearest-neighbor search finds similar vectors quickly at the cost of some accuracy.

## faiss in RVC
In RVC, for the embedding of features converted by HuBERT, we search for embeddings similar to those generated from the training data and mix them in, achieving a conversion closer to the original speech. However, since this search is slow if done naively, fast conversion is achieved by using approximate nearest-neighbor search.

# Implementation overview
'/logs/your-experiment/3_feature256', where the model is located, contains the features extracted by HuBERT from each audio file.
From there the npy files are read in filename-sorted order, the vectors are concatenated to create big_npy, and faiss is trained on it. (The shape of this vector is [N, 256].)

In this TIPS, I will first explain the meaning of these parameters.

# Explanation of the method
## index factory
An index factory is a faiss-specific notation that expresses, as a string, a pipeline connecting multiple approximate nearest-neighbor search methods.
This lets you try various approximate nearest-neighbor search methods simply by changing the index factory string.
In RVC it is used like this:

```python
index = faiss.index_factory(256, "IVF%s,Flat" % n_ivf)
```
Among the arguments of index_factory, the first is the number of dimensions of the vector, the second is the index factory string, and the third lets you specify the distance to use.

For more detailed notation, see
https://github.com/facebookresearch/faiss/wiki/The-index-factory

## Distance metrics
There are two typical metrics used for embedding similarity:

- Euclidean distance (METRIC_L2)
- inner product (METRIC_INNER_PRODUCT)

Euclidean distance takes the squared difference in each dimension, sums the differences over all dimensions, and then takes the square root. This is the same as the 2D/3D distance we use in daily life.
The inner product is not used as a similarity metric as-is; generally, cosine similarity, i.e. the inner product after L2 normalization, is used.

Which is better depends on the case, but cosine similarity is often used for embeddings obtained with word2vec and for similar-image retrieval models trained with ArcFace. If you want to L2-normalize a vector X with numpy, the following code will do it, with eps small enough to avoid division by zero.

```python
X_normed = X / np.maximum(eps, np.linalg.norm(X, ord=2, axis=-1, keepdims=True))
```

Also, for the index factory you can change the distance metric used for the computation by choosing the value passed as the third argument.

```python
index = faiss.index_factory(dimention, text, faiss.METRIC_INNER_PRODUCT)
```

## IVF
IVF (inverted file indexes) is an algorithm similar to the inverted index used in full-text search.
During training, the search targets are clustered with k-means and Voronoi partitioning is performed with the cluster centers. Each data point is assigned exactly one cluster, so a dictionary is created that looks up data points from clusters.

For example, if clusters are assigned as follows

|index|Cluster|
|-----|-------|
|1|A|
|2|B|
|3|A|
|4|C|
|5|B|

the resulting inverted index looks like this:

|Cluster|index|
|-------|-----|
|A|1, 3|
|B|2, 5|
|C|4|

When searching, we first search n_probe clusters among the clusters, and then compute the distances to the data points belonging to each of those clusters.

# Recommended parameters
There are official guidelines on how to choose an index, so I will explain accordingly.
https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index

For datasets below 1M, 4bit-PQ is the most efficient method available in faiss as of April 2023.
Combining this with IVF, narrowing down the candidates with 4bit-PQ, and finally recomputing the distances with an exact metric can be written with the following index factory:

```python
index = faiss.index_factory(256, "IVF1024,PQ128x4fs,RFlat")
```

## Recommended parameters for IVF
Consider the case of too many IVF cells. For example, if coarse quantization by IVF is performed with as many cells as there are data points, this is the same as a naive exhaustive search and is inefficient.
For 1M or fewer points, IVF values between 4*sqrt(N) and 16*sqrt(N) are recommended for N data points.

Since the computation time increases in proportion to n_probe, choose it by weighing it against the accuracy. Personally, I don't think RVC needs that much accuracy, so n_probe = 1 is fine.

## FastScan
FastScan is a method that makes the rough distance approximation by product quantization fast by performing it in registers.
Product quantization clusters each d dimensions independently (usually d = 2) during training, computes the distances between clusters in advance, and creates a lookup table. At prediction time the distance for each dimension can be computed in O(1) by looking at the lookup table.
Therefore the number you specify after PQ usually specifies half the dimension of the vector.

For a more detailed description of FastScan, please refer to the official documentation.
https://github.com/facebookresearch/faiss/wiki/Fast-accumulation-of-PQ-and-AQ-codes-(FastScan)

## RFlat
RFlat is an instruction to recompute the rough distances calculated by FastScan with the exact distances specified by the third argument of the index factory.
When retrieving k neighbors, k*k_factor points are recomputed.
docs/faiss_tips_ko.md
ADDED
@@ -0,0 +1,132 @@
Facebook AI Similarity Search (Faiss) tips
==================
# About Faiss
Faiss is a library for nearest-neighbor search over dense vectors developed by Facebook Research. Approximate nearest-neighbor search finds similar vectors quickly at the cost of a little accuracy.

## Faiss in RVC
In RVC, for the embedding of features converted by HuBERT, we search for embeddings similar to those generated from the training data and mix them in, achieving a conversion closer to the original speech. However, doing this search naively takes a lot of time, so fast conversion is achieved through approximate nearest-neighbor search.

# Implementation overview
`/logs/your-experiment/3_feature256`, where the model is located, contains the features extracted by HuBERT from each audio file. From here the npy files are read in filename-sorted order and the vectors are concatenated to create big_npy (a vector of shape [N, 256]). After saving big_npy as `/logs/your-experiment/total_fea.npy`, it is trained with Faiss.

As of 2023/04/18, an IVF based on L2 distance is used via Faiss's index factory feature. The number of IVF partitions (n_ivf) is N//39, and n_probe is int(np.power(n_ivf, 0.3)). (Look for the train_index comment in infer-web.py.)

This tip first explains the meaning of these parameters, and then gives advice so that developers can build better indexes later.

# Explanation of the method
## Index factory
An index factory is Faiss's own notation that expresses, as a string, a pipeline connecting multiple approximate nearest-neighbor search methods. This lets you try various approximate nearest-neighbor searches just by changing the index factory string. In RVC it is used as follows:

```python
index = faiss.index_factory(256, "IVF%s,Flat" % n_ivf)
```
Among the arguments of `index_factory`, the first is the number of dimensions of the vector, the second is the index factory string, and the third lets you specify the distance to use.

For a more detailed description of the notation, see https://github.com/facebookresearch/Faiss/wiki/The-index-factory.

## Indexes for distance
There are two typical metrics used for embedding similarity:

- Euclidean distance (METRIC_L2)
- inner product (METRIC_INNER_PRODUCT)

Euclidean distance takes the squared difference in each dimension, sums the differences over all dimensions, and then takes the square root. This is the same as the distance computation in 2D and 3D that we use every day. The inner product is not used as a similarity metric as-is; cosine similarity, i.e. the inner product taken after L2 normalization, is used instead.

Which is better depends on the case, but cosine similarity is often used for embeddings obtained with word2vec and for similar-image retrieval models using ArcFace. If you want to L2-normalize a vector X with numpy, the following code will do it, with eps small enough to avoid division by zero.

```python
X_normed = X / np.maximum(eps, np.linalg.norm(X, ord=2, axis=-1, keepdims=True))
```

Also, you can change the distance metric used for the computation by choosing the value passed as the third argument of the `index factory`.

```python
index = faiss.index_factory(dimention, text, faiss.METRIC_INNER_PRODUCT)
```

## IVF
IVF (inverted file indexes) is an algorithm similar to inverted-index search. During training, k-means clustering is performed on the search targets and Voronoi partitioning is carried out with the cluster centers. Each data point is assigned a cluster, so a dictionary is created that looks up data points from clusters.

For example, if clusters are assigned as follows

|index|Cluster|
|-----|-------|
|1|A|
|2|B|
|3|A|
|4|C|
|5|B|

the resulting inverted index is:

|cluster|index|
|-------|-----|
|A|1, 3|
|B|2, 5|
|C|4|

When searching, we first search `n_probe` clusters among the clusters, and then compute the distances to the data points belonging to each cluster.

# Recommended parameters
There are official guidelines on how to choose an index, so I will explain accordingly.
https://github.com/facebookresearch/Faiss/wiki/Guidelines-to-choose-an-index

For datasets below 1M, 4bit-PQ is the most efficient method available in Faiss as of April 2023. Combining this with IVF, narrowing down the candidates with 4bit-PQ, and finally recomputing the distances with an exact metric can be done with the following index factory.

```python
index = faiss.index_factory(256, "IVF1024,PQ128x4fs,RFlat")
```

## Recommended parameters for IVF
If there are too many IVF cells, for example if coarse quantization by IVF is performed with as many cells as there are data points, this becomes the same as a naive exhaustive search and is inefficient. For 1M or fewer points, IVF values between 4*sqrt(N) and 16*sqrt(N) are recommended for N data points.

Since the computation time increases in proportion to n_probe, balance it appropriately against the accuracy. Personally, I don't think RVC needs that much accuracy, so n_probe = 1 is fine.

## FastScan
FastScan is a method that enables fast approximation of distances by product quantization by performing it in registers. Product quantization clusters each d dimensions independently (usually d = 2) during training, computes the distances between clusters in advance, and creates a lookup table. At prediction time the distance of each dimension can be computed in O(1) by looking at the lookup table. Therefore the number specified after PQ usually specifies half the dimension of the vector.

For a more detailed description of FastScan, please refer to the official documentation.
https://github.com/facebookresearch/Faiss/wiki/Fast-accumulation-of-PQ-and-AQ-codes-(FastScan)

## RFlat
RFlat is an instruction to recompute the rough distances calculated by FastScan with the exact distances specified by the third argument of the index factory. When retrieving k nearest neighbors, k*k_factor points are recomputed.

# Embedding techniques
## Alpha query expansion
Query expansion is a technique used in search; for example, in full-text search, adding a few words to the entered query can improve search accuracy. Several methods have also been proposed for vector search; among them, ฮฑ-query expansion is known as a very effective method that requires no extra training. It is introduced in [Attention-Based Query Expansion Learning](https://arxiv.org/abs/2007.08019) and in the [2nd place solution of kaggle shopee competition](https://www.kaggle.com/code/lyakaap/2nd-place-solution/notebook).

ฮฑ-query expansion adds a vector's neighboring vectors to it, weighted by the similarity raised to the power ฮฑ. Here is a code example that replaces big_npy with its ฮฑ-query-expanded version.

```python
import numpy as np
import faiss

# num_expand and batch_size are user-chosen parameters.
alpha = 3.
index = faiss.index_factory(256, "IVF512,PQ128x4fs,RFlat")
original_norm = np.maximum(np.linalg.norm(big_npy, ord=2, axis=1, keepdims=True), 1e-9)
big_npy /= original_norm
index.train(big_npy)
index.add(big_npy)
dist, neighbor = index.search(big_npy, num_expand)

expand_arrays = []
ixs = np.arange(big_npy.shape[0])
for i in range(-(-big_npy.shape[0]//batch_size)):
    ix = ixs[i*batch_size:(i+1)*batch_size]
    weight = np.power(np.einsum("nd,nmd->nm", big_npy[ix], big_npy[neighbor[ix]]), alpha)
    expand_arrays.append(np.sum(big_npy[neighbor[ix]] * np.expand_dims(weight, axis=2), axis=1))
big_npy = np.concatenate(expand_arrays, axis=0)

# re-normalize before building the index
big_npy = big_npy / np.maximum(np.linalg.norm(big_npy, ord=2, axis=1, keepdims=True), 1e-9)
```

This technique can be applied both to the query that performs the search and to the database being searched.

## Compressing the embedding with MiniBatch KMeans

If total_fea.npy is too large, it is possible to shrink the vectors using k-means. The following code compresses the embedding: set n_clusters to the size you want to compress to, and set batch_size to 256 * the number of CPU cores to take full advantage of CPU parallelism.

```python
import multiprocessing
from sklearn.cluster import MiniBatchKMeans
kmeans = MiniBatchKMeans(n_clusters=10000, batch_size=256 * multiprocessing.cpu_count(), init="random")
kmeans.fit(big_npy)
sample_npy = kmeans.cluster_centers_
```
docs/faq.md
ADDED
@@ -0,0 +1,89 @@
## Q1: ffmpeg error / utf8 error
It is most likely not an ffmpeg problem but an audio path problem;<br>
ffmpeg may throw an ffmpeg error when reading paths containing spaces, (), or other special characters; and if the training-set audio contains Chinese paths, writing filelist.txt may produce a utf8 error.<br>

## Q2: No index file after one-click training
If it shows "Training is done. The program is closed.", the model has been trained successfully and the subsequent errors are spurious.<br>

If there is no index file beginning with "added" after one-click training, it may be because the training set is too large and the add-index step got stuck; this has been resolved by adding the index in batches, which solves the excessive memory demand of adding the index. As a temporary workaround, try clicking the "Train Index" button again.<br>

## Q3: The training-set timbre does not appear in inference after training
Click "Refresh timbre list" and check again; if it is still not there, check whether training produced errors, and send screenshots of the console, the WebUI, and logs/experiment_name/*.log to the developers for analysis.<br>

## Q4: How to share a model
The pth files stored under rvc_root/logs/experiment_name are not for sharing or inference; they store the experiment state for reproducibility and further training. The model to share is the 60+ MB pth file in the weights folder;<br>
In the future, weights/exp_name.pth and logs/exp_name/added_xxx.index will be merged into a single weights/exp_name.zip, removing the need to fill in the index manually; share the zip file, not the pth file, unless you want to continue training on another machine;<br>
If you copy/share the several-hundred-MB pth files from the logs folder into the weights folder and force inference with them, you may get errors about missing keys such as f0 or tgt_sr. You need to use the ckpt tab at the bottom to extract a small model, manually or automatically (if the information can be found under logs/exp_name it is filled in automatically), choosing whether to include pitch and the target audio sample rate (for the input path, use the file starting with G). After extraction there will be a 60+ MB pth file in the weights folder, and you can select it after refreshing the timbres.<br>

## Q5: Connection Error
You may have closed the console (the black command-line window).<br>

## Q6: The WebUI pops up "Expecting value: line 1 column 1 (char 0)"
Please disable the system LAN proxy/global proxy.<br>

This includes not only the client-side proxy but also the server-side proxy (for example, if you set http_proxy and https_proxy for academic acceleration on autodl, you also need to unset them when using the WebUI).<br>

## Q7: How to train and infer from the command line without the WebUI
Training script:<br>
You can run training in the WebUI first; the command lines for dataset preprocessing and training will be shown in the message window.<br>

Inference script:<br>
https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/myinfer.py<br>

Example:<br>

runtime\python.exe myinfer.py 0 "E:\codes\py39\RVC-beta\todo-songs\1111.wav" "E:\codes\py39\logs\mi-test\added_IVF677_Flat_nprobe_7.index" harvest "test.wav" "weights/mi-test.pth" 0.6 cuda:0 True<br>

f0up_key=sys.argv[1]<br>
input_path=sys.argv[2]<br>
index_path=sys.argv[3]<br>
f0method=sys.argv[4]#harvest or pm<br>
opt_path=sys.argv[5]<br>
model_path=sys.argv[6]<br>
index_rate=float(sys.argv[7])<br>
device=sys.argv[8]<br>
is_half=bool(sys.argv[9])<br>

## Q8: Cuda error / Cuda out of memory
There is a small chance it is a CUDA configuration problem or an unsupported device; more likely it is out of memory.<br>

For training, reduce the batch size (if reducing it to 1 is still not enough, change to another graphics card); for inference, reduce x_pad, x_query, x_center, and x_max at the end of config.py as needed. Cards with 4 GB of memory or less (e.g. the 1060 3GB and various 2 GB cards) can be given up on; 4 GB cards still have a chance.<br>

## Q9: How many total_epoch are best?
If the training set's audio quality is poor and the noise floor is high, 20-30 is enough; setting it too high will not improve the audio quality of a low-quality training set.<br>
If the training set has high audio quality, a low noise floor, and enough duration, you can raise it; 200 is fine (training is fast, and since you were able to prepare a high-quality training set your GPU is probably decent too, so a bit more training time is no problem).<br>

## Q10: How much training-set duration is needed?
10 to 50 minutes is recommended.<br>
With guaranteed high audio quality and a low noise floor, more can be added if the timbre is uniform.<br>
A high-grade training set (trimmed + distinctive timbre) of 5 to 10 minutes is also fine; the repository author himself often works this way.<br>
Some people have trained successfully with 1-2 minutes of data, but that success is not reproducible by others and not very informative. It requires the training set to have a very distinctive timbre (e.g. a high-frequency airy anime-girl voice) and high audio quality;<br>
No one has so far been seen to attempt (successfully) data under 1 minute. This kind of stunt is not recommended.<br>

## Q11: What is the index rate for, and how to set it? (optional reading)
If the base model and the inference source have higher audio quality than the training set, they can raise the audio quality of the inference result, but at the cost of the timbre drifting toward the base model/inference source rather than the training set; this phenomenon is called "timbre leakage";<br>
The index rate is used to reduce/resolve the timbre-leakage problem. Set to 1, there is theoretically no timbre leakage from the inference source and the timbre is more biased toward the training set. If the training set has lower audio quality than the inference source, a higher index rate may lower the audio quality. Set to 0, it loses the effect of using retrieval blending to protect the training-set timbre;<br>
If the training set has good quality and long duration, you can raise total_epoch; then the model itself is less likely to refer to the inference source or the base model's timbre, there is little "timbre leakage", the index rate is unimportant, and you can even skip creating/sharing the index file.<br>

## Q12: How to choose the GPU for inference?
In config.py, choose the card number after "device cuda:";<br>
The mapping between card numbers and graphics cards can be seen in the graphics-card information section of the training tab.<br>

## Q13: How to use a model saved in the middle of training?
Extract the small model via the ckpt processing tab at the bottom.<br>

## Q14: How to interrupt and resume training?
At this stage you can only close the WebUI console and double-click go-web.bat to restart the program. The web-page parameters also need to be refreshed and filled in again;<br>
To continue training: with the same web-page parameters, click "Train model", and it will pick up from the last checkpoint and continue training.<br>

## Q15: File/memory error during training
Too many processes and not enough memory. You may fix it by:<br>
1. appropriately lowering "Threads of CPU used to extract pitch and process data";<br>
2. manually pre-cutting the training-set audio into shorter clips.<br>
docs/faq_en.md
ADDED
@@ -0,0 +1,95 @@
## Q1:ffmpeg error/utf8 error.
It is most likely not an FFmpeg issue, but rather an audio path issue;

FFmpeg may encounter an error when reading paths containing special characters like spaces and (), which may cause an FFmpeg error; and when the training set's audio contains Chinese paths, writing it into filelist.txt may cause a utf8 error.<br>

## Q2:Cannot find index file after "One-click Training".
If it displays "Training is done. The program is closed," then the model has been trained successfully, and the subsequent errors are fake;

The lack of an 'added' index file after One-click Training may be due to the training set being too large, causing the addition of the index to get stuck; this has been resolved by using batch processing to add the index, which solves the problem of memory overload when adding the index. As a temporary solution, try clicking the "Train Index" button again.<br>

## Q3:Cannot find the model in "Inferencing timbre" after training
Click "Refresh timbre list" and check again; if it is still not visible, check whether there were any errors during training and send screenshots of the console, web UI, and logs/experiment_name/*.log to the developers for further analysis.<br>

## Q4:How to share a model/How to use others' models?
The pth files stored in rvc_root/logs/experiment_name are not meant for sharing or inference, but for storing the experiment checkpoints for reproducibility and further training. The model to be shared should be the 60+ MB pth file in the weights folder;

In the future, weights/exp_name.pth and logs/exp_name/added_xxx.index will be merged into a single weights/exp_name.zip file to eliminate the need for manual index input; so share the zip file, not the pth file, unless you want to continue training on a different machine;

Copying/sharing the several-hundred-MB pth files from the logs folder to the weights folder for forced inference may result in errors such as missing f0, tgt_sr, or other keys. You need to use the ckpt tab at the bottom to manually or automatically (if the information is found in logs/exp_name) select whether to include pitch information and the target audio sampling rate, and then extract the smaller model. After extraction, there will be a 60+ MB pth file in the weights folder, and you can refresh the voices to use it.<br>

## Q5:Connection Error.
You may have closed the console (black command-line window).<br>

## Q6:WebUI popup 'Expecting value: line 1 column 1 (char 0)'.
Please disable the system LAN proxy/global proxy and then refresh.<br>

## Q7:How to train and infer without the WebUI?
Training script:<br>
You can run training in the WebUI first, and the command-line versions of dataset preprocessing and training will be displayed in the message window.<br>

Inference script:<br>
https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/myinfer.py<br>

e.g.<br>

runtime\python.exe myinfer.py 0 "E:\codes\py39\RVC-beta\todo-songs\1111.wav" "E:\codes\py39\logs\mi-test\added_IVF677_Flat_nprobe_7.index" harvest "test.wav" "weights/mi-test.pth" 0.6 cuda:0 True<br>

f0up_key=sys.argv[1]<br>
input_path=sys.argv[2]<br>
index_path=sys.argv[3]<br>
f0method=sys.argv[4]#harvest or pm<br>
opt_path=sys.argv[5]<br>
model_path=sys.argv[6]<br>
index_rate=float(sys.argv[7])<br>
device=sys.argv[8]<br>
is_half=bool(sys.argv[9])<br>
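As a hedged illustration (not part of the repository), the positional arguments listed above could be driven from a small Python wrapper to batch-convert a folder of WAV files; the folder names, model path, and settings below are placeholders borrowed from the example command.

```python
import subprocess
from pathlib import Path

MODEL = "weights/mi-test.pth"
INDEX = "logs/mi-test/added_IVF677_Flat_nprobe_7.index"
Path("out").mkdir(exist_ok=True)

for wav in Path("todo-songs").glob("*.wav"):
    out_path = f"out/{wav.stem}_converted.wav"
    subprocess.run([
        "python", "myinfer.py",
        "0",          # f0up_key (transpose in semitones)
        str(wav),     # input_path
        INDEX,        # index_path
        "harvest",    # f0method (harvest or pm)
        out_path,     # opt_path
        MODEL,        # model_path
        "0.6",        # index_rate
        "cuda:0",     # device
        "True",       # is_half
    ], check=True)
```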
|
49 |
+
|
50 |
+
## Q8:Cuda error/Cuda out of memory.
|
51 |
+
There is a small chance that there is a problem with the CUDA configuration or the device is not supported; more likely, there is not enough memory (out of memory).<br>
|
52 |
+
|
53 |
+
For training, reduce the batch size (if reducing to 1 is still not enough, you may need to change the graphics card); for inference, adjust the x_pad, x_query, x_center, and x_max settings in the config.py file as needed. 4G or lower memory cards (e.g. 1060(3G) and various 2G cards) can be abandoned, while 4G memory cards still have a chance.<br>
|
54 |
+
|
55 |
+
## Q9:How many total_epoch are optimal?
|
56 |
+
If the training dataset's audio quality is poor and the noise floor is high, 20-30 epochs are sufficient. Setting it too high won't improve the audio quality of your low-quality training set.<br>
|
57 |
+
|
58 |
+
If the training set audio quality is high, the noise floor is low, and there is sufficient duration, you can increase it. 200 is acceptable (since training is fast, and if you're able to prepare a high-quality training set, your GPU likely can handle a longer training duration without issue).<br>
|
59 |
+
|
60 |
+
## Q10:How much training set duration is needed?
|
61 |
+
|
62 |
+
A dataset of around 10min to 50min is recommended.<br>
|
63 |
+
|
64 |
+
With guaranteed high sound quality and low bottom noise, more can be added if the dataset's timbre is uniform.<br>
|
65 |
+
|
66 |
+
For a high-level training set (lean + distinctive tone), 5min to 10min is fine.<br>
|
67 |
+
|
68 |
+
There are some people who have trained successfully with 1min to 2min data, but the success is not reproducible by others and is not very informative. <br>This requires that the training set has a very distinctive timbre (e.g. a high-frequency airy anime girl sound) and the quality of the audio is high;
|
69 |
+
Data of less than 1min duration has not been successfully attempted so far. This is not recommended.<br>
|
70 |
+
|
71 |
+
|
72 |
+
## Q11:What is the index rate for and how to adjust it?
|
73 |
+
If the tone quality of the pre-trained model and inference source is higher than that of the training set, they can bring up the tone quality of the inference result, but at the cost of a possible tone bias towards the tone of the underlying model/inference source rather than the tone of the training set, which is generally referred to as "tone leakage".<br>
|
74 |
+
|
75 |
+
The index rate is used to reduce/resolve the timbre leakage problem. If the index rate is set to 1, theoretically there is no timbre leakage from the inference source and the timbre quality is more biased towards the training set. If the training set has a lower sound quality than the inference source, then a higher index rate may reduce the sound quality. Turning it down to 0 does not have the effect of using retrieval blending to protect the training set tones.<br>
|
76 |
+
|
77 |
+
If the training set has good audio quality and long duration, turn up total_epoch; the model will then rely less on the inference source and the pretrained base model, so there is little "timbre leakage", the index_rate matters less, and you can even skip creating/sharing the index file.<br>
|
78 |
+
|
79 |
+
## Q12: How do I choose the GPU for inference?
|
80 |
+
In the config.py file, select the card number after "device cuda:".<br>
|
81 |
+
|
82 |
+
The mapping between card number and graphics card can be seen in the graphics card information section of the training tab.<br>
|
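If you prefer the command line to the WebUI, the same card-number-to-GPU mapping can be printed with PyTorch:

```python
import torch

# Card numbers here correspond to the "cuda:N" device strings used in config.py.
for i in range(torch.cuda.device_count()):
    print(f"cuda:{i} -> {torch.cuda.get_device_name(i)}")
```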
83 |
+
|
84 |
+
## Q13: How do I use a model saved in the middle of training?
|
85 |
+
Save via model extraction at the bottom of the ckpt processing tab.
|
86 |
+
|
87 |
+
## Q14: File/memory error (when training)?
|
88 |
+
Too many processes are running and there is not enough memory. You may fix it by:
|
89 |
+
|
90 |
+
1. Decreasing the value in the "Threads of CPU" field.
|
91 |
+
|
92 |
+
2. Pre-cutting the training set into shorter audio files.
|
93 |
+
|
94 |
+
|
95 |
+
|
docs/training_tips_en.md
ADDED
@@ -0,0 +1,65 @@
1 |
+
Instructions and tips for RVC training
|
2 |
+
======================================
|
3 |
+
These tips explain how data training is done.
|
4 |
+
|
5 |
+
# Training flow
|
6 |
+
The explanation follows the steps in the training tab of the GUI.
|
7 |
+
|
8 |
+
## step1
|
9 |
+
Set the experiment name here.
|
10 |
+
|
11 |
+
You can also set here whether the model should take pitch into account.
|
12 |
+
If the model does not consider pitch, it will be lighter, but it will not be suitable for singing.
|
13 |
+
|
14 |
+
Data for each experiment is placed in `/logs/your-experiment-name/`.
|
15 |
+
|
16 |
+
## step2a
|
17 |
+
Loads and preprocesses audio.
|
18 |
+
|
19 |
+
### load audio
|
20 |
+
If you specify a folder with audio, the audio files in that folder will be read automatically.
|
21 |
+
For example, if you specify `C:\Users\hoge\voices`, `C:\Users\hoge\voices\voice.mp3` will be loaded, but `C:\Users\hoge\voices\dir\voice.mp3` will not be loaded.
|
22 |
+
|
23 |
+
Since ffmpeg is used internally for reading audio, if the extension is supported by ffmpeg, it will be read automatically.
|
24 |
+
After decoding to int16 with ffmpeg, the audio is converted to float32 and normalized to between -1 and 1.
|
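A minimal sketch of that normalization step (assuming the decoded int16 samples are already in a NumPy array; the actual loader in RVC goes through ffmpeg first):

```python
import numpy as np

def normalize_int16(samples: np.ndarray) -> np.ndarray:
    # int16 ranges from -32768 to 32767; dividing by 32768 maps it into [-1, 1)
    return samples.astype(np.float32) / 32768.0
```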
25 |
+
|
26 |
+
### denoising
|
27 |
+
The audio is smoothed by scipy's filtfilt.
|
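As an illustration of what such smoothing can look like (the exact filter order and cutoff used by RVC may differ between versions), a high-pass Butterworth filter applied with filtfilt is shown below:

```python
import numpy as np
from scipy import signal

def smooth(audio: np.ndarray, sr: int) -> np.ndarray:
    # Zero-phase high-pass filtering; 5th order and a 48 Hz cutoff are
    # illustrative values, not necessarily the ones used in the repository.
    b, a = signal.butter(N=5, Wn=48, btype="high", fs=sr)
    return signal.filtfilt(b, a, audio)
```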
28 |
+
|
29 |
+
### Audio Split
|
30 |
+
First, the input audio is split by detecting silent sections that last longer than a certain length (max_sil_kept=5 seconds?). After splitting on silence, the audio is cut into 4-second segments with a 0.3-second overlap. Each segment of up to 4 seconds is volume-normalized and written as a wav file to `/logs/your-experiment-name/0_gt_wavs`, then resampled to 16 kHz and written as a wav file to `/logs/your-experiment-name/1_16k_wavs`.
|
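The fixed-length part of that split can be pictured with the sketch below (silence detection, per-segment normalization, and handling of the trailing remainder are omitted; segment and overlap lengths are taken from the description above):

```python
import numpy as np

def slice_segments(audio: np.ndarray, sr: int, seg_sec: float = 4.0, overlap_sec: float = 0.3):
    # Cut the audio into 4-second windows whose starts are (4.0 - 0.3) s apart,
    # so consecutive segments share a 0.3-second overlap.
    seg = int(seg_sec * sr)
    hop = int((seg_sec - overlap_sec) * sr)
    return [audio[start:start + seg] for start in range(0, max(len(audio) - seg, 1), hop)]
```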
31 |
+
|
32 |
+
## step2b
|
33 |
+
### Extract pitch
|
34 |
+
Extract pitch information from wav files. Extract the pitch information (=f0) using the method built into parselmouth or pyworld and save it in `/logs/your-experiment-name/2a_f0`. Then logarithmically convert the pitch information to an integer between 1 and 255 and save it in `/logs/your-experiment-name/2b-f0nsf`.
|
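As a rough sketch of that step using the pyworld backend (the exact estimator settings and the 1-255 quantization formula in RVC may differ; the log compression below is only illustrative):

```python
import numpy as np
import pyworld

def extract_f0(wav: np.ndarray, sr: int = 16000):
    x = wav.astype(np.float64)
    f0, t = pyworld.harvest(x, sr)          # raw pitch contour
    f0 = pyworld.stonemask(x, f0, t, sr)    # refine the estimate

    # Illustrative logarithmic compression of voiced f0 into integers 1..255
    # (unvoiced frames stay at 1).
    coarse = np.ones_like(f0, dtype=np.int64)
    voiced = f0 > 0
    if voiced.any():
        log_f0 = np.log(f0[voiced])
        lo, hi = log_f0.min(), log_f0.max()
        coarse[voiced] = np.rint((log_f0 - lo) / max(hi - lo, 1e-8) * 254 + 1).astype(np.int64)
    return f0, coarse
```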
35 |
+
|
36 |
+
### Extract feature_print
|
37 |
+
Convert the wav file to embedding in advance using HuBERT. Read the wav file saved in `/logs/your-experiment-name/1_16k_wavs`, convert the wav file to 256-dimensional features with HuBERT, and save in npy format in `/logs/your-experiment-name/3_feature256`.
|
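To picture the "wav in, frame-level feature matrix out, saved as .npy" flow, here is a sketch using the Hugging Face HuBERT base model; note that RVC itself loads its checkpoint (hubert_base.pt) through fairseq and keeps a 256-dimensional projection, so this is an analogy, not the shipped code:

```python
import numpy as np
import torch
from transformers import HubertModel

model = HubertModel.from_pretrained("facebook/hubert-base-ls960").eval()

def wav_to_features(wav_16k: np.ndarray) -> np.ndarray:
    # wav_16k: mono float32 audio at 16 kHz, already normalized to [-1, 1]
    input_values = torch.from_numpy(wav_16k).float().unsqueeze(0)   # (1, samples)
    with torch.no_grad():
        hidden = model(input_values).last_hidden_state              # (1, frames, 768)
    return hidden.squeeze(0).cpu().numpy()

# np.save("logs/your-experiment-name/3_feature256/example.npy", wav_to_features(wav))
```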
38 |
+
|
39 |
+
## step3
|
40 |
+
Train the model.
|
41 |
+
### Glossary for Beginners
|
42 |
+
In deep learning, the dataset is split up and learning proceeds little by little. In one model update (step), batch_size samples are fetched, predictions are made, and the error is corrected. Doing this once over the whole dataset counts as one epoch.
|
43 |
+
|
44 |
+
Therefore, the training time is (time per step) x (number of samples in the dataset / batch size) x (number of epochs). In general, a larger batch size makes learning more stable and reduces the effective time per sample (time per step รท batch size), but it uses more GPU memory. GPU RAM can be checked with the nvidia-smi command. Training finishes sooner if you raise the batch size as far as the machine you are running on allows.
|
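To make the formula concrete, here is a small back-of-the-envelope calculation with made-up numbers:

```python
# Hypothetical numbers, purely to illustrate the formula above.
n_samples = 1000        # audio segments in the dataset
batch_size = 20
sec_per_step = 0.5      # measure this on your own GPU
epochs = 200

steps_per_epoch = n_samples / batch_size                      # 50 steps
total_hours = steps_per_epoch * sec_per_step * epochs / 3600
print(f"estimated training time: {total_hours:.1f} h")        # ~1.4 h
```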
45 |
+
|
46 |
+
### Specify pretrained model
|
47 |
+
RVC starts training the model from pretrained weights instead of from scratch, so it can be trained with a small dataset.
|
48 |
+
|
49 |
+
By default
|
50 |
+
|
51 |
+
- If you consider pitch, it loads `rvc-location/pretrained/f0G40k.pth` and `rvc-location/pretrained/f0D40k.pth`.
|
52 |
+
- If you don't consider pitch, it loads `rvc-location/pretrained/G40k.pth` and `rvc-location/pretrained/D40k.pth`.
|
53 |
+
|
54 |
+
During training, model parameters are saved to `logs/your-experiment-name/G_{}.pth` and `logs/your-experiment-name/D_{}.pth` every save_every_epoch epochs. By pointing the pretrained-model path at these files, you can resume training, or start training from model weights learned in a different experiment.
|
55 |
+
|
56 |
+
### learning index
|
57 |
+
RVC saves the HuBERT feature values used during training, and during inference it searches for feature values similar to those used during training. To make this search fast, the index is trained in advance.
|
58 |
+
Index learning uses the approximate nearest-neighbor search library faiss. The feature values in `logs/your-experiment-name/3_feature256` are read, used to train the index, and the result is saved as `logs/your-experiment-name/add_XXX.index`.
|
59 |
+
|
60 |
+
(Since the 20230428 update, total_fea.npy is read from the index, so it no longer needs to be saved or specified.)
|
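As a rough picture of what "training an index with faiss" means (the file name and the IVF parameters below are placeholders; RVC chooses the index type and its parameters based on the dataset size):

```python
import numpy as np
import faiss

# Assume feats holds 256-dimensional HuBERT features from
# logs/your-experiment-name/3_feature256 (placeholder file name).
feats = np.load("logs/your-experiment-name/3_feature256/example.npy").astype(np.float32)

nlist = 256                                   # number of IVF clusters (placeholder)
quantizer = faiss.IndexFlatL2(feats.shape[1])
index = faiss.IndexIVFFlat(quantizer, feats.shape[1], nlist)
index.train(feats)                            # learn the cluster centroids
index.add(feats)                              # store the training features
faiss.write_index(index, "logs/your-experiment-name/added_example.index")

# At inference time, similar features are retrieved with index.search(query, k).
```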
61 |
+
|
62 |
+
### Button description
|
63 |
+
- Train model: After executing step2b, press this button to train the model.
|
64 |
+
- Train feature index: After training the model, perform index learning.
|
65 |
+
- One-click training: step2b, model training and feature index training all at once.
|
docs/training_tips_ja.md
ADDED
@@ -0,0 +1,64 @@
1 |
+
Instructions and tips for RVC training
|
2 |
+
===============================
|
3 |
+
These tips explain how data training is carried out.
|
4 |
+
|
5 |
+
# Training flow
|
6 |
+
The explanation follows the steps in the training tab of the GUI.
|
7 |
+
|
8 |
+
## step1
|
9 |
+
Set the experiment name here.
|
10 |
+
|
11 |
+
You can also set here whether the model should take the pitch guide (pitch) into account. If it does not, the model will be lighter, but it will no longer be suitable for singing.
|
12 |
+
|
13 |
+
Data for each experiment is placed in `/logs/your-experiment-name/`.
|
14 |
+
|
15 |
+
## step2a
|
16 |
+
Loads and preprocesses the audio.
|
17 |
+
|
18 |
+
### load audio
|
19 |
+
If you specify a folder containing audio, the audio files in that folder are read automatically.
|
20 |
+
For example, if you specify `C:\Users\hoge\voices`, `C:\Users\hoge\voices\voice.mp3` will be loaded, but `C:\Users\hoge\voices\dir\voice.mp3` will not be loaded.
|
21 |
+
|
22 |
+
Since ffmpeg is used internally for reading audio, any extension supported by ffmpeg is read automatically.
|
23 |
+
After decoding to int16 with ffmpeg, the audio is converted to float32 and normalized to between -1 and 1.
|
24 |
+
|
25 |
+
### denoising
|
26 |
+
The audio is smoothed with scipy's filtfilt.
|
27 |
+
|
28 |
+
### Audio split
|
29 |
+
The input audio is first split by detecting silent sections that last longer than a certain length (max_sil_kept=5 seconds?). After splitting on silence, the audio is cut into 4-second segments with a 0.3-second overlap. Each segment of up to 4 seconds is volume-normalized and saved as a wav file to `/logs/your-experiment-name/0_gt_wavs`, then resampled to 16 kHz and saved as a wav file to `/logs/your-experiment-name/1_16k_wavs`.
|
30 |
+
|
31 |
+
## step2b
|
32 |
+
### Pitch extraction
|
33 |
+
Pitch information (the high/low of the sound) is extracted from the wav files. The pitch information (=f0) is extracted with the method built into parselmouth or pyworld and saved in `/logs/your-experiment-name/2a_f0`. The pitch information is then converted logarithmically to an integer between 1 and 255 and saved in `/logs/your-experiment-name/2b-f0nsf`.
|
34 |
+
|
35 |
+
### feature_print extraction
|
36 |
+
HuBERT is used to convert the wav files to embeddings in advance. The wav files saved in `/logs/your-experiment-name/1_16k_wavs` are read, converted to 256-dimensional features with HuBERT, and saved in npy format in `/logs/your-experiment-name/3_feature256`.
|
37 |
+
|
38 |
+
## step3
|
39 |
+
Train the model.
|
40 |
+
### Glossary for beginners
|
41 |
+
In deep learning, the dataset is split up and learning proceeds little by little. In one model update (step), batch_size samples are fetched, predictions are made, and the error is corrected. Doing this once over the whole dataset counts as one epoch.
|
42 |
+
|
43 |
+
Therefore, the training time is (time per step) x (number of samples in the dataset รท batch size) x (number of epochs). In general, a larger batch size makes learning more stable and reduces the effective time per sample (time per step รท batch size), but it uses more GPU memory. GPU RAM can be checked with the nvidia-smi command. Training can be finished in a short time by raising the batch size as far as the machine you are running on allows.
|
44 |
+
|
45 |
+
### Specifying the pretrained model
|
46 |
+
RVC starts model training from pretrained weights rather than from scratch, so it can be trained with a small dataset.
|
47 |
+
|
48 |
+
By default,
|
49 |
+
|
50 |
+
- If the pitch guide is taken into account, it loads `rvc-location/pretrained/f0G40k.pth` and `rvc-location/pretrained/f0D40k.pth`.
|
51 |
+
- If the pitch guide is not taken into account, it loads `rvc-location/pretrained/G40k.pth` and `rvc-location/pretrained/D40k.pth`.
|
52 |
+
|
53 |
+
During training, model parameters are saved to `logs/your-experiment-name/G_{}.pth` and `logs/your-experiment-name/D_{}.pth` every save_every_epoch epochs; by specifying this path, you can resume training or start training from model weights learned in a different experiment.
|
54 |
+
|
55 |
+
### Index learning
|
56 |
+
RVC saves the HuBERT feature values used during training, and at inference time it searches for feature values close to those used during training. To make this search fast, the index is trained in advance.
|
57 |
+
Index learning uses faiss, an approximate nearest-neighbor search library. The feature values in `/logs/your-experiment-name/3_feature256` are read, used to train the index, and the result is saved as `/logs/your-experiment-name/add_XXX.index`.
|
58 |
+
(Since the 20230428 update, total_fea.npy is read from the index, so it is no longer needed.)
|
59 |
+
|
60 |
+
### Button description
|
61 |
+
- Train model: after running through step2b, press this button to train the model.
|
62 |
+
- Train feature index: after training the model, train the index.
|
63 |
+
- One-click training: runs step2b, model training, and feature index training in one go.
|
64 |
+
|
docs/training_tips_ko.md
ADDED
@@ -0,0 +1,53 @@
1 |
+
Instructions and tips for RVC training
|
2 |
+
======================================
|
3 |
+
These tips explain how data training is carried out.
|
4 |
+
|
5 |
+
# Training flow
|
6 |
+
The explanation follows the steps in the training tab of the GUI.
|
7 |
+
|
8 |
+
## step1
|
9 |
+
Set the experiment name. You can also set here whether the model should take pitch (the high/low of the voice) into account.
|
10 |
+
Data for each experiment is placed in `/logs/experiment name/`.
|
11 |
+
|
12 |
+
## step2a
|
13 |
+
Loads and preprocesses the audio files.
|
14 |
+
|
15 |
+
### Loading audio files
|
16 |
+
If you specify a folder containing audio files, the audio files in that folder are loaded automatically.
|
17 |
+
For example, if you specify `C:\Users\hoge\voices`, `C:\Users\hoge\voices\voice.mp3` will be read, but `C:\Users\hoge\voices\dir\voice.mp3` will not.
|
18 |
+
|
19 |
+
Since ffmpeg is used internally for loading audio, any extension supported by ffmpeg is read automatically.
|
20 |
+
After decoding to int16 with ffmpeg, the audio is converted to float32 and normalized to between -1 and 1.
|
21 |
+
|
22 |
+
### Noise removal
|
23 |
+
The audio files are smoothed (denoised) with scipy's filtfilt.
|
24 |
+
|
25 |
+
### Audio split
|
26 |
+
The input audio is first split by detecting silent sections that last longer than a certain length (max_sil_kept=5 seconds?). After splitting on silence, the audio is cut into 4-second segments with a 0.3-second overlap. Each segment of up to 4 seconds is volume-normalized and saved as a wav file to `/logs/experiment name/0_gt_wavs`, then resampled to 16 kHz and saved as a wav file to `/logs/experiment name/1_16k_wavs`.
|
27 |
+
|
28 |
+
## step2b
|
29 |
+
### Pitch extraction
|
30 |
+
Pitch information (the high/low of the voice) is extracted from the wav files. The pitch information (=f0) is extracted with the method built into parselmouth or pyworld and saved in `/logs/experiment name/2a_f0`. The pitch information is then converted logarithmically to an integer between 1 and 255 and saved in `/logs/experiment name/2b-f0nsf`.
|
31 |
+
|
32 |
+
### feature_print extraction
|
33 |
+
HuBERT is used to convert the wav files to embeddings in advance. The wav files saved in `/logs/experiment name/1_16k_wavs` are read, converted to 256-dimensional features with HuBERT, and saved in npy format in `/logs/experiment name/3_feature256`.
|
34 |
+
|
35 |
+
## step3
|
36 |
+
Train the model.
|
37 |
+
|
38 |
+
### Glossary for beginners
|
39 |
+
In deep learning, the dataset is split up and learning proceeds little by little. In one model update (step), batch_size samples are fetched, predictions are made, and the error is corrected. Doing this once over the whole dataset counts as one epoch.
|
40 |
+
|
41 |
+
Therefore, the training time is (time per step) x (number of samples in the dataset / batch size) x (number of epochs). In general, a larger batch size makes learning more stable (time per step รท batch size becomes smaller), but it uses more GPU memory. GPU RAM can be checked with the nvidia-smi command. Training can be finished in a short time by raising the batch size as far as the execution environment allows.
|
42 |
+
|
43 |
+
### Specifying the pretrained model
|
44 |
+
RVC starts model training from pretrained weights rather than from scratch, so it can be trained even with a small dataset. By default it loads `rvc-location/pretrained/f0G40k.pth` and `rvc-location/pretrained/f0D40k.pth`. During training, model parameters are saved to `logs/experiment name/G_{}.pth` and `logs/experiment name/D_{}.pth` every save_every_epoch epochs; by specifying this path, you can resume training or start training from weights learned in a different experiment.
|
45 |
+
|
46 |
+
### Index learning
|
47 |
+
RVC saves the HuBERT feature values used during training, and at inference time it searches for feature values similar to those used during training. To make this search fast, the index is trained in advance.
|
48 |
+
Index learning uses Faiss, an approximate nearest-neighbor search library. The feature values in `/logs/experiment name/3_feature256` are loaded, the combined feature values are saved as `/logs/experiment name/total_fea.npy`, and the index trained from them is saved as `/logs/experiment name/add_XXX.index`.
|
49 |
+
|
50 |
+
### Button description
|
51 |
+
- Train model: after running through step2b, press this button to train the model.
|
52 |
+
- Train feature index: after training the model, train the index.
|
53 |
+
- One-click training: runs step2b, model training, and feature index training in one go.
|
docs/ๅฐ็ฝ็ฎๆๆ็จ.doc
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:a6def6895e9f7a9bb9a852fbca05f001c77bb98338b687744142e45f014b9a17
|
3 |
+
size 602624
|