PhoenixStormJr committed
Commit 602c5ad · verified · 1 Parent(s): 5f56116

Upload infer-web.py with huggingface_hub

Files changed (1)
  1. infer-web.py +2471 -0
infer-web.py ADDED
@@ -0,0 +1,2471 @@
+ import os
+ import shutil
+ import sys
+
+ import json  # Mangio fork using json for preset saving
+
+ now_dir = os.getcwd()
+ sys.path.append(now_dir)
+ import traceback, pdb
+ import warnings
+
+ import numpy as np
+ import torch
+
+ os.environ["no_proxy"] = "localhost, 127.0.0.1, ::1"
+ import logging
+ import threading
+ from random import shuffle
+ from subprocess import Popen
+ from time import sleep
+
+ import faiss
+ import ffmpeg
+ import gradio as gr
+ import soundfile as sf
+ from config import Config
+ from fairseq import checkpoint_utils
+ from i18n import I18nAuto
+ from infer_pack.models import (
+     SynthesizerTrnMs256NSFsid,
+     SynthesizerTrnMs256NSFsid_nono,
+     SynthesizerTrnMs768NSFsid,
+     SynthesizerTrnMs768NSFsid_nono,
+ )
+ from infer_pack.models_onnx import SynthesizerTrnMsNSFsidM
+ from infer_uvr5 import _audio_pre_, _audio_pre_new
+ from MDXNet import MDXNetDereverb
+ from my_utils import load_audio
+ from train.process_ckpt import change_info, extract_small_model, merge, show_info
+ from vc_infer_pipeline import VC
+ from sklearn.cluster import MiniBatchKMeans
+
+ logging.getLogger("numba").setLevel(logging.WARNING)
+
+
+ tmp = os.path.join(now_dir, "TEMP")
+ shutil.rmtree(tmp, ignore_errors=True)
+ shutil.rmtree("%s/runtime/Lib/site-packages/infer_pack" % (now_dir), ignore_errors=True)
+ shutil.rmtree("%s/runtime/Lib/site-packages/uvr5_pack" % (now_dir), ignore_errors=True)
+ os.makedirs(tmp, exist_ok=True)
+ os.makedirs(os.path.join(now_dir, "logs"), exist_ok=True)
+ os.makedirs(os.path.join(now_dir, "weights"), exist_ok=True)
+ os.environ["TEMP"] = tmp
+ warnings.filterwarnings("ignore")
+ torch.manual_seed(114514)
+
+
+ config = Config()
+ i18n = I18nAuto()
+ i18n.print()
+ # Check whether an NVIDIA GPU is available for training and accelerated inference
+ ngpu = torch.cuda.device_count()
+ gpu_infos = []
+ mem = []
+ if_gpu_ok = False
+
+ if torch.cuda.is_available() or ngpu != 0:
+     for i in range(ngpu):
+         gpu_name = torch.cuda.get_device_name(i)
+         if any(
+             value in gpu_name.upper()
+             for value in [
+                 "10",
+                 "16",
+                 "20",
+                 "30",
+                 "40",
+                 "A2",
+                 "A3",
+                 "A4",
+                 "P4",
+                 "A50",
+                 "500",
+                 "A60",
+                 "70",
+                 "80",
+                 "90",
+                 "M4",
+                 "T4",
+                 "TITAN",
+             ]
+         ):
+             # A10#A100#V100#A40#P40#M40#K80#A4500
+             if_gpu_ok = True  # at least one usable NVIDIA GPU
+             gpu_infos.append("%s\t%s" % (i, gpu_name))
+             mem.append(
+                 int(
+                     torch.cuda.get_device_properties(i).total_memory
+                     / 1024
+                     / 1024
+                     / 1024
+                     + 0.4
+                 )
+             )
+ if if_gpu_ok and len(gpu_infos) > 0:
+     gpu_info = "\n".join(gpu_infos)
+     default_batch_size = min(mem) // 2
+ else:
+     # i18n key, kept verbatim: "Unfortunately, no usable GPU is available to support training"
+     gpu_info = i18n("很遗憾您这没有能用的显卡来支持您训练")
+     default_batch_size = 1
+ # Take the full index before the tab, so indices of 10 and above are not truncated
+ gpus = "-".join([i.split("\t")[0] for i in gpu_infos])
+
+
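The GPU screening above matches marker substrings against the upper-cased device name and derives the default batch size from the smallest qualifying card. A minimal standalone sketch of that logic, runnable without torch, might look like the following; `is_supported_gpu`, `memory_gib`, and `default_batch_size` are hypothetical helper names that do not exist in infer-web.py.

```python
# Hypothetical helpers mirroring the GPU screening logic; not part of infer-web.py.
SUPPORTED_MARKERS = [
    "10", "16", "20", "30", "40", "A2", "A3", "A4", "P4",
    "A50", "500", "A60", "70", "80", "90", "M4", "T4", "TITAN",
]


def is_supported_gpu(gpu_name: str) -> bool:
    """Return True if any known marker occurs in the upper-cased device name."""
    upper = gpu_name.upper()
    return any(marker in upper for marker in SUPPORTED_MARKERS)


def memory_gib(total_memory_bytes: int) -> int:
    """Convert bytes to GiB, adding 0.4 before truncating, as the script does."""
    return int(total_memory_bytes / 1024 / 1024 / 1024 + 0.4)


def default_batch_size(memories_gib: list) -> int:
    """Half of the smallest card's memory in GiB, or 1 when no card qualifies."""
    return min(memories_gib) // 2 if memories_gib else 1
```

Because the match is a plain substring test, short markers such as "10" or "40" accept any device whose name happens to contain those digits, so the list acts as a loose allow-list rather than an exact model check.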
+ class ToolButton(gr.Button, gr.components.FormComponent):
+     """Small button with single emoji as text, fits inside gradio forms"""
+
+     def __init__(self, **kwargs):
+         super().__init__(variant="tool", **kwargs)
+
+     def get_block_name(self):
+         return "button"
+
+
+ hubert_model = None
+
+
+ def load_hubert():
+     global hubert_model
+     models, _, _ = checkpoint_utils.load_model_ensemble_and_task(
+         ["hubert_base.pt"],
+         suffix="",
+     )
+     hubert_model = models[0]
+     hubert_model = hubert_model.to(config.device)
+     if config.is_half:
+         hubert_model = hubert_model.half()
+     else:
+         hubert_model = hubert_model.float()
+     hubert_model.eval()
+
+
+ weight_root = "weights"
+ weight_uvr5_root = "uvr5_weights"
+ index_root = "logs"
+ names = []
+ for name in os.listdir(weight_root):
+     if name.endswith(".pth"):
+         names.append(name)
+ index_paths = []
+ for root, dirs, files in os.walk(index_root, topdown=False):
+     for name in files:
+         if name.endswith(".index") and "trained" not in name:
+             index_paths.append("%s/%s" % (root, name))
+ uvr5_names = []
+ for name in os.listdir(weight_uvr5_root):
+     if name.endswith(".pth") or "onnx" in name:
+         uvr5_names.append(name.replace(".pth", ""))
+
+
+ def vc_single(
+     sid,
+     input_audio_path,
+     f0_up_key,
+     f0_file,
+     f0_method,
+     file_index,
+     file_index2,
+     # file_big_npy,
+     index_rate,
+     filter_radius,
+     resample_sr,
+     rms_mix_rate,
+     protect,
+     crepe_hop_length,
+ ):  # spk_item, input_audio0, vc_transform0,f0_file,f0method0
+     global tgt_sr, net_g, vc, hubert_model, version
+     if input_audio_path is None:
+         return "You need to upload an audio", None
+     f0_up_key = int(f0_up_key)
+     try:
+         audio = load_audio(input_audio_path, 16000)
+         # Scale down only when the peak exceeds 0.95 of full scale
+         audio_max = np.abs(audio).max() / 0.95
+         if audio_max > 1:
+             audio /= audio_max
+         times = [0, 0, 0]
+         if not hubert_model:
+             load_hubert()
+         if_f0 = cpt.get("f0", 1)
+         file_index = (
+             (
+                 file_index.strip(" ")
+                 .strip('"')
+                 .strip("\n")
+                 .strip('"')
+                 .strip(" ")
+                 .replace("trained", "added")
+             )
+             if file_index != ""
+             else file_index2
+         )  # guard against user mistakes: a "trained" index path is swapped for the "added" one
+         # file_big_npy = (
+         #     file_big_npy.strip(" ").strip('"').strip("\n").strip('"').strip(" ")
+         # )
+         audio_opt = vc.pipeline(
+             hubert_model,
+             net_g,
+             sid,
+             audio,
+             input_audio_path,
+             times,
+             f0_up_key,
+             f0_method,
+             file_index,
+             # file_big_npy,
+             index_rate,
+             if_f0,
+             filter_radius,
+             tgt_sr,
+             resample_sr,
+             rms_mix_rate,
+             version,
+             protect,
+             crepe_hop_length,
+             f0_file=f0_file,
+         )
+         # Use a local output rate so the global tgt_sr of the loaded model is not overwritten
+         out_sr = resample_sr if tgt_sr != resample_sr >= 16000 else tgt_sr
+         index_info = (
+             "Using index:%s." % file_index
+             if os.path.exists(file_index)
+             else "Index not used."
+         )
+         return "Success.\n %s\nTime:\n npy:%ss, f0:%ss, infer:%ss" % (
+             index_info,
+             times[0],
+             times[1],
+             times[2],
+         ), (out_sr, audio_opt)
+     except:
+         info = traceback.format_exc()
+         print(info)
+         return info, (None, None)
+
+
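Two small steps in `vc_single` lend themselves to isolated illustration: the index-path cleanup and the peak normalization. The sketch below restates both in pure Python under stated assumptions: `clean_index_path` and `peak_normalize` are hypothetical names, and the normalization works on a plain, non-empty list of floats rather than the NumPy array the script uses.

```python
def clean_index_path(file_index: str, fallback: str) -> str:
    """Strip pasted spaces, quotes and newlines; prefer the 'added' index file."""
    if file_index == "":
        return fallback
    cleaned = file_index.strip(" ").strip('"').strip("\n").strip('"').strip(" ")
    return cleaned.replace("trained", "added")


def peak_normalize(samples, headroom=0.95):
    """Scale samples so the peak sits at `headroom`, only if it would exceed it."""
    peak = max(abs(s) for s in samples) / headroom
    if peak > 1:
        return [s / peak for s in samples]
    return list(samples)
```

Audio that already peaks below the headroom is returned unchanged, which matches the `if audio_max > 1` guard in the function above.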
+ def vc_multi(
+     sid,
+     dir_path,
+     opt_root,
+     paths,
+     f0_up_key,
+     f0_method,
+     file_index,
+     file_index2,
+     # file_big_npy,
+     index_rate,
+     filter_radius,
+     resample_sr,
+     rms_mix_rate,
+     protect,
+     format1,
+     crepe_hop_length,
+ ):
+     try:
+         dir_path = (
+             dir_path.strip(" ").strip('"').strip("\n").strip('"').strip(" ")
+         )  # strip spaces, quotes and newlines that users may paste around the path
+         opt_root = opt_root.strip(" ").strip('"').strip("\n").strip('"').strip(" ")
+         os.makedirs(opt_root, exist_ok=True)
+         try:
+             if dir_path != "":
+                 paths = [os.path.join(dir_path, name) for name in os.listdir(dir_path)]
+             else:
+                 paths = [path.name for path in paths]
+         except:
+             traceback.print_exc()
+             paths = [path.name for path in paths]
+         infos = []
+         for path in paths:
+             info, opt = vc_single(
+                 sid,
+                 path,
+                 f0_up_key,
+                 None,
+                 f0_method,
+                 file_index,
+                 file_index2,
+                 # file_big_npy,
+                 index_rate,
+                 filter_radius,
+                 resample_sr,
+                 rms_mix_rate,
+                 protect,
+                 crepe_hop_length,
+             )
+             if "Success" in info:
+                 try:
+                     tgt_sr, audio_opt = opt
+                     if format1 in ["wav", "flac"]:
+                         sf.write(
+                             "%s/%s.%s" % (opt_root, os.path.basename(path), format1),
+                             audio_opt,
+                             tgt_sr,
+                         )
+                     else:
+                         # Write a temporary wav first, then convert it with ffmpeg
+                         path = "%s/%s.wav" % (opt_root, os.path.basename(path))
+                         sf.write(
+                             path,
+                             audio_opt,
+                             tgt_sr,
+                         )
+                         if os.path.exists(path):
+                             # Quote both paths so names containing spaces survive the shell
+                             os.system(
+                                 'ffmpeg -i "%s" -vn "%s" -q:a 2 -y'
+                                 % (path, path[:-4] + ".%s" % format1)
+                             )
+                 except:
+                     info += traceback.format_exc()
+             infos.append("%s->%s" % (os.path.basename(path), info))
+             yield "\n".join(infos)
+         yield "\n".join(infos)
+     except:
+         yield traceback.format_exc()
+
+
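The conversion step in `vc_multi` builds a shell string for ffmpeg. One alternative, sketched here as an assumption rather than as what the commit does, is to build an argument list and hand it to `subprocess.run`, which sidesteps quoting altogether. `build_ffmpeg_convert_cmd` is a hypothetical helper; it also places `-q:a 2` before the output file, on the understanding that ffmpeg applies per-file options to the next file named on the command line.

```python
import os


def build_ffmpeg_convert_cmd(wav_path: str, target_format: str) -> list:
    """Return an argv list converting `wav_path` to a sibling file in `target_format`."""
    out_path = os.path.splitext(wav_path)[0] + "." + target_format
    # -y overwrites existing output, -vn drops any video stream,
    # -q:a 2 requests a high VBR audio quality for codecs that support it.
    return ["ffmpeg", "-y", "-i", wav_path, "-vn", "-q:a", "2", out_path]
```

A call such as `subprocess.run(build_ffmpeg_convert_cmd(path, format1), check=True)` would then raise on a failed conversion instead of silently continuing, assuming `ffmpeg` is on the `PATH`.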
+ def uvr(model_name, inp_root, save_root_vocal, paths, save_root_ins, agg, format0):
+     infos = []
+     try:
+         inp_root = inp_root.strip(" ").strip('"').strip("\n").strip('"').strip(" ")
+         save_root_vocal = (
+             save_root_vocal.strip(" ").strip('"').strip("\n").strip('"').strip(" ")
+         )
+         save_root_ins = (
+             save_root_ins.strip(" ").strip('"').strip("\n").strip('"').strip(" ")
+         )
+         if model_name == "onnx_dereverb_By_FoxJoy":
+             pre_fun = MDXNetDereverb(15)
+         else:
+             func = _audio_pre_ if "DeEcho" not in model_name else _audio_pre_new
+             pre_fun = func(
+                 agg=int(agg),
+                 model_path=os.path.join(weight_uvr5_root, model_name + ".pth"),
+                 device=config.device,
+                 is_half=config.is_half,
+             )
+         if inp_root != "":
+             paths = [os.path.join(inp_root, name) for name in os.listdir(inp_root)]
+         else:
+             paths = [path.name for path in paths]
+         for path in paths:
+             # paths already include inp_root; joining again would double a relative root
+             inp_path = path
+             need_reformat = 1
+             done = 0
+             try:
+                 info = ffmpeg.probe(inp_path, cmd="ffprobe")
+                 # Stereo 44.1 kHz input can be processed directly
+                 if (
+                     info["streams"][0]["channels"] == 2
+                     and info["streams"][0]["sample_rate"] == "44100"
+                 ):
+                     need_reformat = 0
+                     pre_fun._path_audio_(
+                         inp_path, save_root_ins, save_root_vocal, format0
+                     )
+                     done = 1
+             except:
+                 need_reformat = 1
+                 traceback.print_exc()
+             if need_reformat == 1:
+                 tmp_path = "%s/%s.reformatted.wav" % (tmp, os.path.basename(inp_path))
+                 # Quote both paths so names containing spaces survive the shell
+                 os.system(
+                     'ffmpeg -i "%s" -vn -acodec pcm_s16le -ac 2 -ar 44100 "%s" -y'
+                     % (inp_path, tmp_path)
+                 )
+                 inp_path = tmp_path
+             try:
+                 if done == 0:
+                     pre_fun._path_audio_(
+                         inp_path, save_root_ins, save_root_vocal, format0
+                     )
+                 infos.append("%s->Success" % (os.path.basename(inp_path)))
+                 yield "\n".join(infos)
+             except:
+                 infos.append(
+                     "%s->%s" % (os.path.basename(inp_path), traceback.format_exc())
+                 )
+                 yield "\n".join(infos)
+     except:
+         infos.append(traceback.format_exc())
+         yield "\n".join(infos)
+     finally:
+         try:
+             if model_name == "onnx_dereverb_By_FoxJoy":
+                 del pre_fun.pred.model
+                 del pre_fun.pred.model_
+             else:
+                 del pre_fun.model
+                 del pre_fun
+         except:
+             traceback.print_exc()
+         print("clean_empty_cache")
+         if torch.cuda.is_available():
+             torch.cuda.empty_cache()
+     yield "\n".join(infos)
+
+
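The decision in `uvr` about whether a file must be re-encoded depends only on the probe result, so it can be isolated and checked without ffmpeg. The sketch below assumes the dictionary shape that the function above reads (`streams[0]["channels"]` as an integer and `streams[0]["sample_rate"]` as a string); `needs_reformat` is a hypothetical name.

```python
def needs_reformat(probe_info: dict) -> bool:
    """Return False only for input whose first stream is stereo at 44100 Hz."""
    try:
        stream = probe_info["streams"][0]
        return not (stream["channels"] == 2 and stream["sample_rate"] == "44100")
    except (KeyError, IndexError):
        # Missing or malformed probe data: fall back to re-encoding, as uvr() does.
        return True
```

Note that this checks only the first stream, exactly like the original condition, so a file whose first stream is video would always be re-encoded.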
+ # Only one voice model can be loaded globally per tab
+ def get_vc(sid, to_return_protect0, to_return_protect1):
+     global n_spk, tgt_sr, net_g, vc, cpt, version
+     if sid == "" or sid == []:
+         global hubert_model
+         if hubert_model is not None:  # because of polling, check whether sid switched from a loaded model to none
+             print("clean_empty_cache")
+             del net_g, n_spk, vc, hubert_model, tgt_sr  # ,cpt
+             hubert_model = net_g = n_spk = vc = tgt_sr = None
+             if torch.cuda.is_available():
+                 torch.cuda.empty_cache()
+             # Without the rebuild-and-delete steps below, memory is not fully released
+             if_f0 = cpt.get("f0", 1)
+             version = cpt.get("version", "v1")
+             if version == "v1":
+                 if if_f0 == 1:
+                     net_g = SynthesizerTrnMs256NSFsid(
+                         *cpt["config"], is_half=config.is_half
+                     )
+                 else:
+                     net_g = SynthesizerTrnMs256NSFsid_nono(*cpt["config"])
+             elif version == "v2":
+                 if if_f0 == 1:
+                     net_g = SynthesizerTrnMs768NSFsid(
+                         *cpt["config"], is_half=config.is_half
+                     )
+                 else:
+                     net_g = SynthesizerTrnMs768NSFsid_nono(*cpt["config"])
+             del net_g, cpt
+             if torch.cuda.is_available():
+                 torch.cuda.empty_cache()
+             cpt = None
+         return {"visible": False, "__type__": "update"}
+     person = "%s/%s" % (weight_root, sid)
+     print("loading %s" % person)
+     cpt = torch.load(person, map_location="cpu")
+     tgt_sr = cpt["config"][-1]
+     cpt["config"][-3] = cpt["weight"]["emb_g.weight"].shape[0]  # n_spk
+     if_f0 = cpt.get("f0", 1)
+     if if_f0 == 0:
+         to_return_protect0 = to_return_protect1 = {
+             "visible": False,
+             "value": 0.5,
+             "__type__": "update",
+         }
+     else:
+         to_return_protect0 = {
+             "visible": True,
+             "value": to_return_protect0,
+             "__type__": "update",
+         }
+         to_return_protect1 = {
+             "visible": True,
+             "value": to_return_protect1,
+             "__type__": "update",
+         }
+     version = cpt.get("version", "v1")
+     if version == "v1":
+         if if_f0 == 1:
+             net_g = SynthesizerTrnMs256NSFsid(*cpt["config"], is_half=config.is_half)
+         else:
+             net_g = SynthesizerTrnMs256NSFsid_nono(*cpt["config"])
+     elif version == "v2":
+         if if_f0 == 1:
+             net_g = SynthesizerTrnMs768NSFsid(*cpt["config"], is_half=config.is_half)
+         else:
+             net_g = SynthesizerTrnMs768NSFsid_nono(*cpt["config"])
+     del net_g.enc_q
+     print(net_g.load_state_dict(cpt["weight"], strict=False))
+     net_g.eval().to(config.device)
+     if config.is_half:
+         net_g = net_g.half()
+     else:
+         net_g = net_g.float()
+     vc = VC(tgt_sr, config)
+     n_spk = cpt["config"][-3]
+     return (
+         {"visible": True, "maximum": n_spk, "__type__": "update"},
+         to_return_protect0,
+         to_return_protect1,
+     )
+
+
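`get_vc` picks one of four synthesizer classes from two checkpoint fields, `version` and `f0`, and the same branching appears twice in the function. A lookup table is one way to express that choice once. The sketch below is illustrative only: it maps to class names as strings so it runs without the `infer_pack` package, and `SYNTHESIZER_BY_VARIANT` and `synthesizer_name` are hypothetical names.

```python
# Hypothetical lookup table; the values are the class names imported by infer-web.py.
SYNTHESIZER_BY_VARIANT = {
    ("v1", 1): "SynthesizerTrnMs256NSFsid",
    ("v1", 0): "SynthesizerTrnMs256NSFsid_nono",
    ("v2", 1): "SynthesizerTrnMs768NSFsid",
    ("v2", 0): "SynthesizerTrnMs768NSFsid_nono",
}


def synthesizer_name(cpt: dict) -> str:
    """Pick the synthesizer class name using the same defaults as get_vc()."""
    version = cpt.get("version", "v1")
    # Anything other than f0 == 1 falls into the no-pitch branch, as in the script.
    if_f0 = 1 if cpt.get("f0", 1) == 1 else 0
    return SYNTHESIZER_BY_VARIANT[(version, if_f0)]
```

Unlike the `if`/`elif` chain, this raises `KeyError` for an unknown version string instead of leaving `net_g` unassigned, which may or may not be the desired behaviour.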
+ def change_choices():
+     names = []
+     for name in os.listdir(weight_root):
+         if name.endswith(".pth"):
+             names.append(name)
+     index_paths = []
+     for root, dirs, files in os.walk(index_root, topdown=False):
+         for name in files:
+             if name.endswith(".index") and "trained" not in name:
+                 index_paths.append("%s/%s" % (root, name))
+     return {"choices": sorted(names), "__type__": "update"}, {
+         "choices": sorted(index_paths),
+         "__type__": "update",
+     }
+
+
+ def clean():
+     return {"value": "", "__type__": "update"}
+
+
+ sr_dict = {
+     "32k": 32000,
+     "40k": 40000,
+     "48k": 48000,
+ }
+
+
+ def if_done(done, p):
+     while 1:
+         if p.poll() is None:
+             sleep(0.5)
+         else:
+             break
+     done[0] = True
+
+
+ def if_done_multi(done, ps):
+     while 1:
+         # poll() returning None means the process has not finished
+         # keep waiting as long as any process is still running
+         flag = 1
+         for p in ps:
+             if p.poll() is None:
+                 flag = 0
+                 sleep(0.5)
+                 break
+         if flag == 1:
+             break
+     done[0] = True
+
+
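`if_done` and `if_done_multi` poll child processes from a background thread and flip a shared one-element list when they finish. The same pattern can be written with `threading.Event`, shown here as a hedged sketch rather than a drop-in replacement; `wait_all` is a hypothetical name, and it accepts any objects exposing a `poll()` method that returns `None` while running, as `subprocess.Popen` does.

```python
import threading
from time import sleep


def wait_all(procs, done: threading.Event, interval: float = 0.5) -> None:
    """Set `done` once every process reports an exit code from poll()."""
    while any(p.poll() is None for p in procs):
        sleep(interval)
    done.set()
```

A caller would start it with `threading.Thread(target=wait_all, args=(ps, done)).start()` and test `done.is_set()` inside its log-reading loop, in the same place the script checks `done[0]`.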
+ def preprocess_dataset(trainset_dir, exp_dir, sr, n_p):
+     sr = sr_dict[sr]
+     os.makedirs("%s/logs/%s" % (now_dir, exp_dir), exist_ok=True)
+     f = open("%s/logs/%s/preprocess.log" % (now_dir, exp_dir), "w")
+     f.close()
+     cmd = (
+         config.python_cmd
+         + " trainset_preprocess_pipeline_print.py %s %s %s %s/logs/%s "
+         % (trainset_dir, sr, n_p, now_dir, exp_dir)
+         + str(config.noparallel)
+     )
+     print(cmd)
+     p = Popen(cmd, shell=True)  # , stdin=PIPE, stdout=PIPE,stderr=PIPE,cwd=now_dir
+     # Under gradio, Popen output only arrives after the process has finished
+     # (outside gradio it streams line by line), so the log file is polled on a timer instead
+     done = [False]
+     threading.Thread(
+         target=if_done,
+         args=(
+             done,
+             p,
+         ),
+     ).start()
+     while 1:
+         with open("%s/logs/%s/preprocess.log" % (now_dir, exp_dir), "r") as f:
+             yield (f.read())
+         sleep(1)
+         if done[0]:
+             break
+     with open("%s/logs/%s/preprocess.log" % (now_dir, exp_dir), "r") as f:
+         log = f.read()
+     print(log)
+     yield log
+
+
+ # but2.click(extract_f0,[gpus6,np7,f0method8,if_f0_3,trainset_dir4],[info2])
+ def extract_f0_feature(gpus, n_p, f0method, if_f0, exp_dir, version19, echl):
+     gpus = gpus.split("-")
+     os.makedirs("%s/logs/%s" % (now_dir, exp_dir), exist_ok=True)
+     f = open("%s/logs/%s/extract_f0_feature.log" % (now_dir, exp_dir), "w")
+     f.close()
+     if if_f0:
+         cmd = config.python_cmd + " extract_f0_print.py %s/logs/%s %s %s %s" % (
+             now_dir,
+             exp_dir,
+             n_p,
+             f0method,
+             echl,
+         )
+         print(cmd)
+         p = Popen(cmd, shell=True, cwd=now_dir)  # , stdin=PIPE, stdout=PIPE,stderr=PIPE
+         # Under gradio, Popen output only arrives after the process has finished
+         # (outside gradio it streams line by line), so the log file is polled on a timer instead
+         done = [False]
+         threading.Thread(
+             target=if_done,
+             args=(
+                 done,
+                 p,
+             ),
+         ).start()
+         while 1:
+             with open(
+                 "%s/logs/%s/extract_f0_feature.log" % (now_dir, exp_dir), "r"
+             ) as f:
+                 yield (f.read())
+             sleep(1)
+             if done[0]:
+                 break
+         with open("%s/logs/%s/extract_f0_feature.log" % (now_dir, exp_dir), "r") as f:
+             log = f.read()
+         print(log)
+         yield log
+     # Spawn one feature-extraction process per part (one part per GPU)
+     """
+     n_part=int(sys.argv[1])
+     i_part=int(sys.argv[2])
+     i_gpu=sys.argv[3]
+     exp_dir=sys.argv[4]
+     os.environ["CUDA_VISIBLE_DEVICES"]=str(i_gpu)
+     """
+     leng = len(gpus)
+     ps = []
+     for idx, n_g in enumerate(gpus):
+         cmd = (
+             config.python_cmd
+             + " extract_feature_print.py %s %s %s %s %s/logs/%s %s"
+             % (
+                 config.device,
+                 leng,
+                 idx,
+                 n_g,
+                 now_dir,
+                 exp_dir,
+                 version19,
+             )
+         )
+         print(cmd)
+         p = Popen(
+             cmd, shell=True, cwd=now_dir
+         )  # , shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE, cwd=now_dir
+         ps.append(p)
+     # Same workaround as above: poll the log file on a timer while the processes run
+     done = [False]
+     threading.Thread(
+         target=if_done_multi,
+         args=(
+             done,
+             ps,
+         ),
+     ).start()
+     while 1:
+         with open("%s/logs/%s/extract_f0_feature.log" % (now_dir, exp_dir), "r") as f:
+             yield (f.read())
+         sleep(1)
+         if done[0]:
+             break
+     with open("%s/logs/%s/extract_f0_feature.log" % (now_dir, exp_dir), "r") as f:
+         log = f.read()
+     print(log)
+     yield log
+
+
+ def change_sr2(sr2, if_f0_3, version19):
+     path_str = "" if version19 == "v1" else "_v2"
+     f0_str = "f0" if if_f0_3 else ""
+     if_pretrained_generator_exist = os.access(
+         "pretrained%s/%sG%s.pth" % (path_str, f0_str, sr2), os.F_OK
+     )
+     if_pretrained_discriminator_exist = os.access(
+         "pretrained%s/%sD%s.pth" % (path_str, f0_str, sr2), os.F_OK
+     )
+     if not if_pretrained_generator_exist:
+         print(
+             "pretrained%s/%sG%s.pth" % (path_str, f0_str, sr2),
+             "not exist, will not use pretrained model",
+         )
+     if not if_pretrained_discriminator_exist:
+         print(
+             "pretrained%s/%sD%s.pth" % (path_str, f0_str, sr2),
+             "not exist, will not use pretrained model",
+         )
+     return (
+         "pretrained%s/%sG%s.pth" % (path_str, f0_str, sr2)
+         if if_pretrained_generator_exist
+         else "",
+         "pretrained%s/%sD%s.pth" % (path_str, f0_str, sr2)
+         if if_pretrained_discriminator_exist
+         else "",
+     )
+
+
+ def change_version19(sr2, if_f0_3, version19):
+     path_str = "" if version19 == "v1" else "_v2"
+     if sr2 == "32k" and version19 == "v1":
+         sr2 = "40k"
+     to_return_sr2 = (
+         {"choices": ["40k", "48k"], "__type__": "update", "value": sr2}
+         if version19 == "v1"
+         else {"choices": ["40k", "48k", "32k"], "__type__": "update", "value": sr2}
+     )
+     f0_str = "f0" if if_f0_3 else ""
+     if_pretrained_generator_exist = os.access(
+         "pretrained%s/%sG%s.pth" % (path_str, f0_str, sr2), os.F_OK
+     )
+     if_pretrained_discriminator_exist = os.access(
+         "pretrained%s/%sD%s.pth" % (path_str, f0_str, sr2), os.F_OK
+     )
+     if not if_pretrained_generator_exist:
+         print(
+             "pretrained%s/%sG%s.pth" % (path_str, f0_str, sr2),
+             "not exist, will not use pretrained model",
+         )
+     if not if_pretrained_discriminator_exist:
+         print(
+             "pretrained%s/%sD%s.pth" % (path_str, f0_str, sr2),
+             "not exist, will not use pretrained model",
+         )
+     return (
+         "pretrained%s/%sG%s.pth" % (path_str, f0_str, sr2)
+         if if_pretrained_generator_exist
+         else "",
+         "pretrained%s/%sD%s.pth" % (path_str, f0_str, sr2)
+         if if_pretrained_discriminator_exist
+         else "",
+         to_return_sr2,
+     )
+
+
+ def change_f0(if_f0_3, sr2, version19):  # f0method8,pretrained_G14,pretrained_D15
+     path_str = "" if version19 == "v1" else "_v2"
+     # Check the files that are actually returned: f0G/f0D with pitch guidance, G/D without
+     f0_str = "f0" if if_f0_3 else ""
+     if_pretrained_generator_exist = os.access(
+         "pretrained%s/%sG%s.pth" % (path_str, f0_str, sr2), os.F_OK
+     )
+     if_pretrained_discriminator_exist = os.access(
+         "pretrained%s/%sD%s.pth" % (path_str, f0_str, sr2), os.F_OK
+     )
+     if not if_pretrained_generator_exist:
+         print(
+             "pretrained%s/%sG%s.pth" % (path_str, f0_str, sr2),
+             "not exist, will not use pretrained model",
+         )
+     if not if_pretrained_discriminator_exist:
+         print(
+             "pretrained%s/%sD%s.pth" % (path_str, f0_str, sr2),
+             "not exist, will not use pretrained model",
+         )
+     if if_f0_3:
+         return (
+             {"visible": True, "__type__": "update"},
+             "pretrained%s/f0G%s.pth" % (path_str, sr2)
+             if if_pretrained_generator_exist
+             else "",
+             "pretrained%s/f0D%s.pth" % (path_str, sr2)
+             if if_pretrained_discriminator_exist
+             else "",
+         )
+     return (
+         {"visible": False, "__type__": "update"},
+         ("pretrained%s/G%s.pth" % (path_str, sr2))
+         if if_pretrained_generator_exist
+         else "",
+         ("pretrained%s/D%s.pth" % (path_str, sr2))
+         if if_pretrained_discriminator_exist
+         else "",
+     )
+
+
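`change_sr2`, `change_version19`, and `change_f0` all derive the same pair of pretrained checkpoint paths and blank out whichever file is missing. A single helper could capture that rule. The sketch below is an assumption about how such a helper might look, not code from the commit: `pretrained_paths` and its `root` parameter are hypothetical, with `root` added so the check can be pointed at a directory other than the working directory.

```python
import os


def pretrained_paths(sr2: str, if_f0: bool, version: str, root: str = "."):
    """Return (generator_path, discriminator_path); a missing file yields ""."""
    path_str = "" if version == "v1" else "_v2"
    f0_str = "f0" if if_f0 else ""
    result = []
    for kind in ("G", "D"):
        rel = "pretrained%s/%s%s%s.pth" % (path_str, f0_str, kind, sr2)
        result.append(rel if os.path.exists(os.path.join(root, rel)) else "")
    return tuple(result)
```

The returned paths stay relative, matching the strings the three callbacks place into the pretrained-model text boxes.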
+ # but3.click(click_train,[exp_dir1,sr2,if_f0_3,save_epoch10,total_epoch11,batch_size12,if_save_latest13,pretrained_G14,pretrained_D15,gpus16])
+ def click_train(
+     exp_dir1,
+     sr2,
+     if_f0_3,
+     spk_id5,
+     save_epoch10,
+     total_epoch11,
+     batch_size12,
+     if_save_latest13,
+     pretrained_G14,
+     pretrained_D15,
+     gpus16,
+     if_cache_gpu17,
+     if_save_every_weights18,
+     version19,
+ ):
+     # Generate the filelist
+     exp_dir = "%s/logs/%s" % (now_dir, exp_dir1)
+     os.makedirs(exp_dir, exist_ok=True)
+     gt_wavs_dir = "%s/0_gt_wavs" % (exp_dir)
+     feature_dir = (
+         "%s/3_feature256" % (exp_dir)
+         if version19 == "v1"
+         else "%s/3_feature768" % (exp_dir)
+     )
+     # Keep only samples present in every required directory
+     if if_f0_3:
+         f0_dir = "%s/2a_f0" % (exp_dir)
+         f0nsf_dir = "%s/2b-f0nsf" % (exp_dir)
+         names = (
+             set([name.split(".")[0] for name in os.listdir(gt_wavs_dir)])
+             & set([name.split(".")[0] for name in os.listdir(feature_dir)])
+             & set([name.split(".")[0] for name in os.listdir(f0_dir)])
+             & set([name.split(".")[0] for name in os.listdir(f0nsf_dir)])
+         )
+     else:
+         names = set([name.split(".")[0] for name in os.listdir(gt_wavs_dir)]) & set(
+             [name.split(".")[0] for name in os.listdir(feature_dir)]
+         )
+     opt = []
+     for name in names:
+         if if_f0_3:
+             opt.append(
+                 "%s/%s.wav|%s/%s.npy|%s/%s.wav.npy|%s/%s.wav.npy|%s"
+                 % (
+                     gt_wavs_dir.replace("\\", "\\\\"),
+                     name,
+                     feature_dir.replace("\\", "\\\\"),
+                     name,
+                     f0_dir.replace("\\", "\\\\"),
+                     name,
+                     f0nsf_dir.replace("\\", "\\\\"),
+                     name,
+                     spk_id5,
+                 )
+             )
+         else:
+             opt.append(
+                 "%s/%s.wav|%s/%s.npy|%s"
+                 % (
+                     gt_wavs_dir.replace("\\", "\\\\"),
+                     name,
+                     feature_dir.replace("\\", "\\\\"),
+                     name,
+                     spk_id5,
+                 )
+             )
+     fea_dim = 256 if version19 == "v1" else 768
+     # Add two entries pointing at the bundled "mute" sample
+     if if_f0_3:
+         for _ in range(2):
+             opt.append(
+                 "%s/logs/mute/0_gt_wavs/mute%s.wav|%s/logs/mute/3_feature%s/mute.npy|%s/logs/mute/2a_f0/mute.wav.npy|%s/logs/mute/2b-f0nsf/mute.wav.npy|%s"
+                 % (now_dir, sr2, now_dir, fea_dim, now_dir, now_dir, spk_id5)
+             )
+     else:
+         for _ in range(2):
+             opt.append(
+                 "%s/logs/mute/0_gt_wavs/mute%s.wav|%s/logs/mute/3_feature%s/mute.npy|%s"
+                 % (now_dir, sr2, now_dir, fea_dim, spk_id5)
+             )
+     shuffle(opt)
+     with open("%s/filelist.txt" % exp_dir, "w") as f:
+         f.write("\n".join(opt))
+     print("write filelist done")
+     # Generate config (no longer necessary)
+     # cmd = python_cmd + " train_nsf_sim_cache_sid_load_pretrain.py -e mi-test -sr 40k -f0 1 -bs 4 -g 0 -te 10 -se 5 -pg pretrained/f0G40k.pth -pd pretrained/f0D40k.pth -l 1 -c 0"
+     print("use gpus:", gpus16)
+     if pretrained_G14 == "":
+         print("no pretrained Generator")
+     if pretrained_D15 == "":
+         print("no pretrained Discriminator")
+     # i18n("是") is the translation key for "Yes" and is kept verbatim
+     if gpus16:
+         cmd = (
+             config.python_cmd
+             + " train_nsf_sim_cache_sid_load_pretrain.py -e %s -sr %s -f0 %s -bs %s -g %s -te %s -se %s %s %s -l %s -c %s -sw %s -v %s"
+             % (
+                 exp_dir1,
+                 sr2,
+                 1 if if_f0_3 else 0,
+                 batch_size12,
+                 gpus16,
+                 total_epoch11,
+                 save_epoch10,
+                 "-pg %s" % pretrained_G14 if pretrained_G14 != "" else "",
+                 "-pd %s" % pretrained_D15 if pretrained_D15 != "" else "",
+                 1 if if_save_latest13 == i18n("是") else 0,
+                 1 if if_cache_gpu17 == i18n("是") else 0,
+                 1 if if_save_every_weights18 == i18n("是") else 0,
+                 version19,
+             )
+         )
+     else:
+         cmd = (
+             config.python_cmd
+             + " train_nsf_sim_cache_sid_load_pretrain.py -e %s -sr %s -f0 %s -bs %s -te %s -se %s %s %s -l %s -c %s -sw %s -v %s"
+             % (
+                 exp_dir1,
+                 sr2,
+                 1 if if_f0_3 else 0,
+                 batch_size12,
+                 total_epoch11,
+                 save_epoch10,
+                 # An empty string is used instead of "\b"; the shell ignores the extra space
+                 "-pg %s" % pretrained_G14 if pretrained_G14 != "" else "",
+                 "-pd %s" % pretrained_D15 if pretrained_D15 != "" else "",
+                 1 if if_save_latest13 == i18n("是") else 0,
+                 1 if if_cache_gpu17 == i18n("是") else 0,
+                 1 if if_save_every_weights18 == i18n("是") else 0,
+                 version19,
+             )
+         )
+     print(cmd)
+     p = Popen(cmd, shell=True, cwd=now_dir)
+     p.wait()
+     return "Training finished. Check the console training log or train.log in the experiment folder."
+
+
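The filelist step in `click_train` does two things: it intersects the file stems found in each data directory, then formats one pipe-separated line per sample. Both can be shown on plain lists without touching the filesystem. In this sketch `common_stems` and `filelist_line` are hypothetical names, and the Windows backslash escaping from the original is left out for brevity.

```python
def common_stems(*listings):
    """Intersect the stems (text before the first dot) of several directory listings."""
    stem_sets = [{name.split(".")[0] for name in listing} for listing in listings]
    return set.intersection(*stem_sets)


def filelist_line(gt_wavs_dir, feature_dir, name, spk_id, f0_dir=None, f0nsf_dir=None):
    """Format one training entry; the two f0 columns appear only with pitch guidance."""
    if f0_dir is not None and f0nsf_dir is not None:
        return "%s/%s.wav|%s/%s.npy|%s/%s.wav.npy|%s/%s.wav.npy|%s" % (
            gt_wavs_dir, name, feature_dir, name, f0_dir, name, f0nsf_dir, name, spk_id,
        )
    return "%s/%s.wav|%s/%s.npy|%s" % (gt_wavs_dir, name, feature_dir, name, spk_id)
```

Because the stem is everything before the first dot, `a.wav` and `a.wav.npy` both reduce to `a`, which is what lets the f0 directories take part in the same intersection.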
901
+ # but4.click(train_index, [exp_dir1], info3)
902
+ def train_index(exp_dir1, version19):
903
+ exp_dir = "%s/logs/%s" % (now_dir, exp_dir1)
904
+ os.makedirs(exp_dir, exist_ok=True)
905
+ feature_dir = (
906
+ "%s/3_feature256" % (exp_dir)
907
+ if version19 == "v1"
908
+ else "%s/3_feature768" % (exp_dir)
909
+ )
910
+ if not os.path.exists(feature_dir):
911
+ return "请先进行特征提取!"
912
+ listdir_res = list(os.listdir(feature_dir))
913
+ if len(listdir_res) == 0:
914
+ return "请先进行特征提取!"
915
+ infos = []
916
+ npys = []
917
+ for name in sorted(listdir_res):
918
+ phone = np.load("%s/%s" % (feature_dir, name))
919
+ npys.append(phone)
920
+ big_npy = np.concatenate(npys, 0)
921
+ big_npy_idx = np.arange(big_npy.shape[0])
922
+ np.random.shuffle(big_npy_idx)
923
+ big_npy = big_npy[big_npy_idx]
924
+ if big_npy.shape[0] > 2e5:
925
+ # if(1):
926
+ infos.append("Trying doing kmeans %s shape to 10k centers." % big_npy.shape[0])
927
+ yield "\n".join(infos)
928
+ try:
929
+ big_npy = (
930
+ MiniBatchKMeans(
931
+ n_clusters=10000,
932
+ verbose=True,
933
+ batch_size=256 * config.n_cpu,
934
+ compute_labels=False,
935
+ init="random",
936
+ )
937
+ .fit(big_npy)
938
+ .cluster_centers_
939
+ )
940
+ except Exception:
941
+ info = traceback.format_exc()
942
+ print(info)
943
+ infos.append(info)
944
+ yield "\n".join(infos)
945
+
946
+ np.save("%s/total_fea.npy" % exp_dir, big_npy)
947
+ n_ivf = min(int(16 * np.sqrt(big_npy.shape[0])), big_npy.shape[0] // 39)
948
+ infos.append("%s,%s" % (big_npy.shape, n_ivf))
949
+ yield "\n".join(infos)
950
+ index = faiss.index_factory(256 if version19 == "v1" else 768, "IVF%s,Flat" % n_ivf)
951
+ # index = faiss.index_factory(256if version19=="v1"else 768, "IVF%s,PQ128x4fs,RFlat"%n_ivf)
952
+ infos.append("training")
953
+ yield "\n".join(infos)
954
+ index_ivf = faiss.extract_index_ivf(index) #
955
+ index_ivf.nprobe = 1
956
+ index.train(big_npy)
957
+ faiss.write_index(
958
+ index,
959
+ "%s/trained_IVF%s_Flat_nprobe_%s_%s_%s.index"
960
+ % (exp_dir, n_ivf, index_ivf.nprobe, exp_dir1, version19),
961
+ )
962
+ # faiss.write_index(index, '%s/trained_IVF%s_Flat_FastScan_%s.index'%(exp_dir,n_ivf,version19))
963
+ infos.append("adding")
964
+ yield "\n".join(infos)
965
+ batch_size_add = 8192
966
+ for i in range(0, big_npy.shape[0], batch_size_add):
967
+ index.add(big_npy[i : i + batch_size_add])
968
+ faiss.write_index(
969
+ index,
970
+ "%s/added_IVF%s_Flat_nprobe_%s_%s_%s.index"
971
+ % (exp_dir, n_ivf, index_ivf.nprobe, exp_dir1, version19),
972
+ )
973
+ infos.append(
974
+ "成功构建索引,added_IVF%s_Flat_nprobe_%s_%s_%s.index"
975
+ % (n_ivf, index_ivf.nprobe, exp_dir1, version19)
976
+ )
977
+ # faiss.write_index(index, '%s/added_IVF%s_Flat_FastScan_%s.index'%(exp_dir,n_ivf,version19))
978
+ # infos.append("成功构建索引,added_IVF%s_Flat_FastScan_%s.index"%(n_ivf,version19))
979
+ yield "\n".join(infos)
980
+
981
+
982
+ # but5.click(train1key, [exp_dir1, sr2, if_f0_3, trainset_dir4, spk_id5, gpus6, np7, f0method8, save_epoch10, total_epoch11, batch_size12, if_save_latest13, pretrained_G14, pretrained_D15, gpus16, if_cache_gpu17], info3)
983
+ def train1key(
984
+ exp_dir1,
985
+ sr2,
986
+ if_f0_3,
987
+ trainset_dir4,
988
+ spk_id5,
989
+ np7,
990
+ f0method8,
991
+ save_epoch10,
992
+ total_epoch11,
993
+ batch_size12,
994
+ if_save_latest13,
995
+ pretrained_G14,
996
+ pretrained_D15,
997
+ gpus16,
998
+ if_cache_gpu17,
999
+ if_save_every_weights18,
1000
+ version19,
1001
+ echl
1002
+ ):
1003
+ infos = []
1004
+
1005
+ def get_info_str(strr):
1006
+ infos.append(strr)
1007
+ return "\n".join(infos)
1008
+
1009
+ model_log_dir = "%s/logs/%s" % (now_dir, exp_dir1)
1010
+ preprocess_log_path = "%s/preprocess.log" % model_log_dir
1011
+ extract_f0_feature_log_path = "%s/extract_f0_feature.log" % model_log_dir
1012
+ gt_wavs_dir = "%s/0_gt_wavs" % model_log_dir
1013
+ feature_dir = (
1014
+ "%s/3_feature256" % model_log_dir
1015
+ if version19 == "v1"
1016
+ else "%s/3_feature768" % model_log_dir
1017
+ )
1018
+
1019
+ os.makedirs(model_log_dir, exist_ok=True)
1020
+ #########step1: process the data
1021
+ open(preprocess_log_path, "w").close()
1022
+ cmd = (
1023
+ config.python_cmd
1024
+ + " trainset_preprocess_pipeline_print.py %s %s %s %s "
1025
+ % (trainset_dir4, sr_dict[sr2], np7, model_log_dir)
1026
+ + str(config.noparallel)
1027
+ )
1028
+ yield get_info_str(i18n("step1:正在处理数据"))
1029
+ yield get_info_str(cmd)
1030
+ p = Popen(cmd, shell=True)
1031
+ p.wait()
1032
+ with open(preprocess_log_path, "r") as f:
1033
+ print(f.read())
1034
+ #########step2a: extract pitch
1034
+ open(extract_f0_feature_log_path, "w").close()
1036
+ if if_f0_3:
1037
+ yield get_info_str("step2a:正在提取音高")
1038
+ cmd = config.python_cmd + " extract_f0_print.py %s %s %s %s" % (
1039
+ model_log_dir,
1040
+ np7,
1041
+ f0method8,
1042
+ echl
1043
+ )
1044
+ yield get_info_str(cmd)
1045
+ p = Popen(cmd, shell=True, cwd=now_dir)
1046
+ p.wait()
1047
+ with open(extract_f0_feature_log_path, "r") as f:
1048
+ print(f.read())
1049
+ else:
1050
+ yield get_info_str(i18n("step2a:无需提取音高"))
1051
+ #######step2b: extract features
1052
+ yield get_info_str(i18n("step2b:正在提取特征"))
1053
+ gpus = gpus16.split("-")
1054
+ leng = len(gpus)
1055
+ ps = []
1056
+ for idx, n_g in enumerate(gpus):
1057
+ cmd = config.python_cmd + " extract_feature_print.py %s %s %s %s %s %s" % (
1058
+ config.device,
1059
+ leng,
1060
+ idx,
1061
+ n_g,
1062
+ model_log_dir,
1063
+ version19,
1064
+ )
1065
+ yield get_info_str(cmd)
1066
+ p = Popen(
1067
+ cmd, shell=True, cwd=now_dir
1068
+ ) # , shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE, cwd=now_dir
1069
+ ps.append(p)
1070
+ for p in ps:
1071
+ p.wait()
1072
+ with open(extract_f0_feature_log_path, "r") as f:
1073
+ print(f.read())
1074
+ #######step3a: train the model
1075
+ yield get_info_str(i18n("step3a:正在训练模型"))
1076
+ # generate the filelist
1077
+ if if_f0_3:
1078
+ f0_dir = "%s/2a_f0" % model_log_dir
1079
+ f0nsf_dir = "%s/2b-f0nsf" % model_log_dir
1080
+ names = (
1081
+ set([name.split(".")[0] for name in os.listdir(gt_wavs_dir)])
1082
+ & set([name.split(".")[0] for name in os.listdir(feature_dir)])
1083
+ & set([name.split(".")[0] for name in os.listdir(f0_dir)])
1084
+ & set([name.split(".")[0] for name in os.listdir(f0nsf_dir)])
1085
+ )
1086
+ else:
1087
+ names = set([name.split(".")[0] for name in os.listdir(gt_wavs_dir)]) & set(
1088
+ [name.split(".")[0] for name in os.listdir(feature_dir)]
1089
+ )
1090
+ opt = []
1091
+ for name in names:
1092
+ if if_f0_3:
1093
+ opt.append(
1094
+ "%s/%s.wav|%s/%s.npy|%s/%s.wav.npy|%s/%s.wav.npy|%s"
1095
+ % (
1096
+ gt_wavs_dir.replace("\\", "\\\\"),
1097
+ name,
1098
+ feature_dir.replace("\\", "\\\\"),
1099
+ name,
1100
+ f0_dir.replace("\\", "\\\\"),
1101
+ name,
1102
+ f0nsf_dir.replace("\\", "\\\\"),
1103
+ name,
1104
+ spk_id5,
1105
+ )
1106
+ )
1107
+ else:
1108
+ opt.append(
1109
+ "%s/%s.wav|%s/%s.npy|%s"
1110
+ % (
1111
+ gt_wavs_dir.replace("\\", "\\\\"),
1112
+ name,
1113
+ feature_dir.replace("\\", "\\\\"),
1114
+ name,
1115
+ spk_id5,
1116
+ )
1117
+ )
1118
+ fea_dim = 256 if version19 == "v1" else 768
1119
+ if if_f0_3:
1120
+ for _ in range(2):
1121
+ opt.append(
1122
+ "%s/logs/mute/0_gt_wavs/mute%s.wav|%s/logs/mute/3_feature%s/mute.npy|%s/logs/mute/2a_f0/mute.wav.npy|%s/logs/mute/2b-f0nsf/mute.wav.npy|%s"
1123
+ % (now_dir, sr2, now_dir, fea_dim, now_dir, now_dir, spk_id5)
1124
+ )
1125
+ else:
1126
+ for _ in range(2):
1127
+ opt.append(
1128
+ "%s/logs/mute/0_gt_wavs/mute%s.wav|%s/logs/mute/3_feature%s/mute.npy|%s"
1129
+ % (now_dir, sr2, now_dir, fea_dim, spk_id5)
1130
+ )
1131
+ shuffle(opt)
1132
+ with open("%s/filelist.txt" % model_log_dir, "w") as f:
1133
+ f.write("\n".join(opt))
1134
+ yield get_info_str("write filelist done")
1135
+ if gpus16:
1136
+ cmd = (
1137
+ config.python_cmd
1138
+ + " train_nsf_sim_cache_sid_load_pretrain.py -e %s -sr %s -f0 %s -bs %s -g %s -te %s -se %s %s %s -l %s -c %s -sw %s -v %s"
1139
+ % (
1140
+ exp_dir1,
1141
+ sr2,
1142
+ 1 if if_f0_3 else 0,
1143
+ batch_size12,
1144
+ gpus16,
1145
+ total_epoch11,
1146
+ save_epoch10,
1147
+ "-pg %s" % pretrained_G14 if pretrained_G14 != "" else "",
1148
+ "-pd %s" % pretrained_D15 if pretrained_D15 != "" else "",
1149
+ 1 if if_save_latest13 == i18n("是") else 0,
1150
+ 1 if if_cache_gpu17 == i18n("是") else 0,
1151
+ 1 if if_save_every_weights18 == i18n("是") else 0,
1152
+ version19,
1153
+ )
1154
+ )
1155
+ else:
1156
+ cmd = (
1157
+ config.python_cmd
1158
+ + " train_nsf_sim_cache_sid_load_pretrain.py -e %s -sr %s -f0 %s -bs %s -te %s -se %s %s %s -l %s -c %s -sw %s -v %s"
1159
+ % (
1160
+ exp_dir1,
1161
+ sr2,
1162
+ 1 if if_f0_3 else 0,
1163
+ batch_size12,
1164
+ total_epoch11,
1165
+ save_epoch10,
1166
+ "-pg %s" % pretrained_G14 if pretrained_G14 != "" else "",
1167
+ "-pd %s" % pretrained_D15 if pretrained_D15 != "" else "",
1168
+ 1 if if_save_latest13 == i18n("是") else 0,
1169
+ 1 if if_cache_gpu17 == i18n("是") else 0,
1170
+ 1 if if_save_every_weights18 == i18n("是") else 0,
1171
+ version19,
1172
+ )
1173
+ )
1174
+ yield get_info_str(cmd)
1175
+ p = Popen(cmd, shell=True, cwd=now_dir)
1176
+ p.wait()
1177
+ yield get_info_str(i18n("训练结束, 您可查看控制台训练日志或实验文件夹下的train.log"))
1178
+ #######step3b: train the index
1179
+ npys = []
1180
+ listdir_res = list(os.listdir(feature_dir))
1181
+ for name in sorted(listdir_res):
1182
+ phone = np.load("%s/%s" % (feature_dir, name))
1183
+ npys.append(phone)
1184
+ big_npy = np.concatenate(npys, 0)
1185
+
1186
+ big_npy_idx = np.arange(big_npy.shape[0])
1187
+ np.random.shuffle(big_npy_idx)
1188
+ big_npy = big_npy[big_npy_idx]
1189
+
1190
+ if big_npy.shape[0] > 2e5:
1191
+ # if(1):
1192
+ info = "Trying doing kmeans %s shape to 10k centers." % big_npy.shape[0]
1193
+ print(info)
1194
+ yield get_info_str(info)
1195
+ try:
1196
+ big_npy = (
1197
+ MiniBatchKMeans(
1198
+ n_clusters=10000,
1199
+ verbose=True,
1200
+ batch_size=256 * config.n_cpu,
1201
+ compute_labels=False,
1202
+ init="random",
1203
+ )
1204
+ .fit(big_npy)
1205
+ .cluster_centers_
1206
+ )
1207
+ except Exception:
1208
+ info = traceback.format_exc()
1209
+ print(info)
1210
+ yield get_info_str(info)
1211
+
1212
+ np.save("%s/total_fea.npy" % model_log_dir, big_npy)
1213
+
1214
+ # n_ivf = big_npy.shape[0] // 39
1215
+ n_ivf = min(int(16 * np.sqrt(big_npy.shape[0])), big_npy.shape[0] // 39)
1216
+ yield get_info_str("%s,%s" % (big_npy.shape, n_ivf))
1217
+ index = faiss.index_factory(256 if version19 == "v1" else 768, "IVF%s,Flat" % n_ivf)
1218
+ yield get_info_str("training index")
1219
+ index_ivf = faiss.extract_index_ivf(index) #
1220
+ index_ivf.nprobe = 1
1221
+ index.train(big_npy)
1222
+ faiss.write_index(
1223
+ index,
1224
+ "%s/trained_IVF%s_Flat_nprobe_%s_%s_%s.index"
1225
+ % (model_log_dir, n_ivf, index_ivf.nprobe, exp_dir1, version19),
1226
+ )
1227
+ yield get_info_str("adding index")
1228
+ batch_size_add = 8192
1229
+ for i in range(0, big_npy.shape[0], batch_size_add):
1230
+ index.add(big_npy[i : i + batch_size_add])
1231
+ faiss.write_index(
1232
+ index,
1233
+ "%s/added_IVF%s_Flat_nprobe_%s_%s_%s.index"
1234
+ % (model_log_dir, n_ivf, index_ivf.nprobe, exp_dir1, version19),
1235
+ )
1236
+ yield get_info_str(
1237
+ "成功构建索引, added_IVF%s_Flat_nprobe_%s_%s_%s.index"
1238
+ % (n_ivf, index_ivf.nprobe, exp_dir1, version19)
1239
+ )
1240
+ yield get_info_str(i18n("全流程结束!"))
1241
+
1242
+
1243
+ # ckpt_path2.change(change_info_,[ckpt_path2],[sr__,if_f0__])
1244
+ def change_info_(ckpt_path):
1245
+ if not os.path.exists(ckpt_path.replace(os.path.basename(ckpt_path), "train.log")):
1246
+ return {"__type__": "update"}, {"__type__": "update"}, {"__type__": "update"}
1247
+ try:
1248
+ with open(
1249
+ ckpt_path.replace(os.path.basename(ckpt_path), "train.log"), "r"
1250
+ ) as f:
1251
+ info = eval(f.read().strip("\n").split("\n")[0].split("\t")[-1])
1252
+ sr, f0 = info["sample_rate"], info["if_f0"]
1253
+ version = "v2" if ("version" in info and info["version"] == "v2") else "v1"
1254
+ return sr, str(f0), version
1255
+ except Exception:
1256
+ traceback.print_exc()
1257
+ return {"__type__": "update"}, {"__type__": "update"}, {"__type__": "update"}
1258
+
1259
+
1260
+ def export_onnx(ModelPath, ExportedPath):
1261
+ cpt = torch.load(ModelPath, map_location="cpu")
1262
+ cpt["config"][-3] = cpt["weight"]["emb_g.weight"].shape[0]
1263
+ vec_channels = 256 if cpt.get("version", "v1") == "v1" else 768
1264
+
1265
+ test_phone = torch.rand(1, 200, vec_channels) # hidden unit
1266
+ test_phone_lengths = torch.tensor([200]).long() # hidden unit length (apparently unused)
1266
+ test_pitch = torch.randint(size=(1, 200), low=5, high=255) # fundamental frequency (in Hz)
1267
+ test_pitchf = torch.rand(1, 200) # NSF fundamental frequency
1268
+ test_ds = torch.LongTensor([0]) # speaker ID
1269
+ test_rnd = torch.rand(1, 192, 200) # noise (adds a random factor)
1270
+
1271
+ device = "cpu" # device used during export (does not affect how the model is used)
1273
+
1274
+
1275
+ net_g = SynthesizerTrnMsNSFsidM(
1276
+ *cpt["config"], is_half=False, version=cpt.get("version", "v1")
1277
+ ) # export in fp32 (supporting fp16 in C++ requires manually rearranging memory, so fp16 is not used for now)
1278
+ net_g.load_state_dict(cpt["weight"], strict=False)
1279
+ input_names = ["phone", "phone_lengths", "pitch", "pitchf", "ds", "rnd"]
1280
+ output_names = [
1281
+ "audio",
1282
+ ]
1283
+ # net_g.construct_spkmixmap(n_speaker) # export with a multi-speaker mix track
1284
+ torch.onnx.export(
1285
+ net_g,
1286
+ (
1287
+ test_phone.to(device),
1288
+ test_phone_lengths.to(device),
1289
+ test_pitch.to(device),
1290
+ test_pitchf.to(device),
1291
+ test_ds.to(device),
1292
+ test_rnd.to(device),
1293
+ ),
1294
+ ExportedPath,
1295
+ dynamic_axes={
1296
+ "phone": [1],
1297
+ "pitch": [1],
1298
+ "pitchf": [1],
1299
+ "rnd": [2],
1300
+ },
1301
+ do_constant_folding=False,
1302
+ opset_version=13,
1303
+ verbose=False,
1304
+ input_names=input_names,
1305
+ output_names=output_names,
1306
+ )
1307
+ return "Finished"
1308
+
1309
+
1310
+ #region Mangio-RVC-Fork CLI App
1311
+ import re as regex
1312
+ import scipy.io.wavfile as wavfile
1313
+
1314
+ cli_current_page = "HOME"
1315
+
1316
+ def cli_split_command(com):
1317
+ exp = r'(?:(?<=\s)|^)"(.*?)"(?=\s|$)|(\S+)'
1318
+ split_array = regex.findall(exp, com)
1319
+ split_array = [group[0] if group[0] else group[1] for group in split_array]
1320
+ return split_array
1321
+
1322
+ def execute_generator_function(genObject):
1323
+ for _ in genObject: pass
1324
+
1325
+ def cli_infer(com):
1326
+ # get VC first
1327
+ com = cli_split_command(com)
1328
+ model_name = com[0]
1329
+ source_audio_path = com[1]
1330
+ output_file_name = com[2]
1331
+ feature_index_path = com[3]
1332
+ f0_file = None # Not Implemented Yet
1333
+
1334
+ # Get parameters for inference
1335
+ speaker_id = int(com[4])
1336
+ transposition = float(com[5])
1337
+ f0_method = com[6]
1338
+ crepe_hop_length = int(com[7])
1339
+ harvest_median_filter = int(com[8])
1340
+ resample = int(com[9])
1341
+ mix = float(com[10])
1342
+ feature_ratio = float(com[11])
1343
+ protection_amnt = float(com[12])
1344
+
1345
+ print("Mangio-RVC-Fork Infer-CLI: Starting the inference...")
1346
+ vc_data = get_vc(model_name)
1347
+ print(vc_data)
1348
+ print("Mangio-RVC-Fork Infer-CLI: Performing inference...")
1349
+ conversion_data = vc_single(
1350
+ speaker_id,
1351
+ source_audio_path,
1352
+ transposition,
1353
+ f0_file,
1354
+ f0_method,
1355
+ feature_index_path,
1356
+ feature_index_path,
1357
+ feature_ratio,
1358
+ harvest_median_filter,
1359
+ resample,
1360
+ mix,
1361
+ protection_amnt,
1362
+ crepe_hop_length,
1363
+ )
1364
+ if "Success." in conversion_data[0]:
1365
+ print("Mangio-RVC-Fork Infer-CLI: Inference succeeded. Writing to %s/%s..." % ('audio-outputs', output_file_name))
1366
+ wavfile.write('%s/%s' % ('audio-outputs', output_file_name), conversion_data[1][0], conversion_data[1][1])
1367
+ print("Mangio-RVC-Fork Infer-CLI: Finished! Saved output to %s/%s" % ('audio-outputs', output_file_name))
1368
+ else:
1369
+ print("Mangio-RVC-Fork Infer-CLI: Inference failed. Here's the traceback: ")
1370
+ print(conversion_data[0])
1371
+
1372
+ def cli_pre_process(com):
1373
+ com = cli_split_command(com)
1374
+ model_name = com[0]
1375
+ trainset_directory = com[1]
1376
+ sample_rate = com[2]
1377
+ num_processes = int(com[3])
1378
+
1379
+ print("Mangio-RVC-Fork Pre-process: Starting...")
1380
+ generator = preprocess_dataset(
1381
+ trainset_directory,
1382
+ model_name,
1383
+ sample_rate,
1384
+ num_processes
1385
+ )
1386
+ execute_generator_function(generator)
1387
+ print("Mangio-RVC-Fork Pre-process: Finished")
1388
+
1389
+ def cli_extract_feature(com):
1390
+ com = cli_split_command(com)
1391
+ model_name = com[0]
1392
+ gpus = com[1]
1393
+ num_processes = int(com[2])
1394
+ has_pitch_guidance = int(com[3]) == 1
1395
+ f0_method = com[4]
1396
+ crepe_hop_length = int(com[5])
1397
+ version = com[6] # v1 or v2
1398
+
1399
+ print("Mangio-RVC-CLI: Extract Feature Has Pitch: " + str(has_pitch_guidance))
1400
+ print("Mangio-RVC-CLI: Extract Feature Version: " + str(version))
1401
+ print("Mangio-RVC-Fork Feature Extraction: Starting...")
1402
+ generator = extract_f0_feature(
1403
+ gpus,
1404
+ num_processes,
1405
+ f0_method,
1406
+ has_pitch_guidance,
1407
+ model_name,
1408
+ version,
1409
+ crepe_hop_length
1410
+ )
1411
+ execute_generator_function(generator)
1412
+ print("Mangio-RVC-Fork Feature Extraction: Finished")
1413
+
1414
+ def cli_train(com):
1415
+ com = cli_split_command(com)
1416
+ model_name = com[0]
1417
+ sample_rate = com[1]
1418
+ has_pitch_guidance = int(com[2]) == 1
1419
+ speaker_id = int(com[3])
1420
+ save_epoch_iteration = int(com[4])
1421
+ total_epoch = int(com[5]) # 10000
1422
+ batch_size = int(com[6])
1423
+ gpu_card_slot_numbers = com[7]
1424
+ if_save_latest = i18n("是") if (int(com[8]) == 1) else i18n("否")
1425
+ if_cache_gpu = i18n("是") if (int(com[9]) == 1) else i18n("否")
1426
+ if_save_every_weight = i18n("是") if (int(com[10]) == 1) else i18n("否")
1427
+ version = com[11]
1428
+
1429
+ pretrained_base = "pretrained/" if version == "v1" else "pretrained_v2/"
1430
+
1431
+ g_pretrained_path = "%sf0G%s.pth" % (pretrained_base, sample_rate)
1432
+ d_pretrained_path = "%sf0D%s.pth" % (pretrained_base, sample_rate)
1433
+
1434
+ print("Mangio-RVC-Fork Train-CLI: Training...")
1435
+ click_train(
1436
+ model_name,
1437
+ sample_rate,
1438
+ has_pitch_guidance,
1439
+ speaker_id,
1440
+ save_epoch_iteration,
1441
+ total_epoch,
1442
+ batch_size,
1443
+ if_save_latest,
1444
+ g_pretrained_path,
1445
+ d_pretrained_path,
1446
+ gpu_card_slot_numbers,
1447
+ if_cache_gpu,
1448
+ if_save_every_weight,
1449
+ version
1450
+ )
1451
+
1452
+ def cli_train_feature(com):
1453
+ com = cli_split_command(com)
1454
+ model_name = com[0]
1455
+ version = com[1]
1456
+ print("Mangio-RVC-Fork Train Feature Index-CLI: Training... Please wait")
1457
+ generator = train_index(
1458
+ model_name,
1459
+ version
1460
+ )
1461
+ execute_generator_function(generator)
1462
+ print("Mangio-RVC-Fork Train Feature Index-CLI: Done!")
1463
+
1464
+ def cli_extract_model(com):
1465
+ com = cli_split_command(com)
1466
+ model_path = com[0]
1467
+ save_name = com[1]
1468
+ sample_rate = com[2]
1469
+ has_pitch_guidance = com[3]
1470
+ info = com[4]
1471
+ version = com[5]
1472
+ extract_small_model_process = extract_small_model(
1473
+ model_path,
1474
+ save_name,
1475
+ sample_rate,
1476
+ has_pitch_guidance,
1477
+ info,
1478
+ version
1479
+ )
1480
+ if extract_small_model_process == "Success.":
1481
+ print("Mangio-RVC-Fork Extract Small Model: Success!")
1482
+ else:
1483
+ print(str(extract_small_model_process))
1484
+ print("Mangio-RVC-Fork Extract Small Model: Failed!")
1485
+
1486
+ def print_page_details():
1487
+ if cli_current_page == "HOME":
1488
+ print(" go home : Takes you back to home with a navigation list.")
1489
+ print(" go infer : Takes you to inference command execution.\n")
1490
+ print(" go pre-process : Takes you to training step.1) pre-process command execution.")
1491
+ print(" go extract-feature : Takes you to training step.2) extract-feature command execution.")
1492
+ print(" go train : Takes you to training step.3) begin or continue training command execution.")
1493
+ print(" go train-feature : Takes you to the train feature index command execution.\n")
1494
+ print(" go extract-model : Takes you to the extract small model command execution.")
1495
+ elif cli_current_page == "INFER":
1496
+ print(" arg 1) model name with .pth in ./weights: mi-test.pth")
1497
+ print(" arg 2) source audio path: myFolder\\MySource.wav")
1498
+ print(" arg 3) output file name to be placed in './audio-outputs': MyTest.wav")
1499
+ print(" arg 4) feature index file path: logs/mi-test/added_IVF3042_Flat_nprobe_1.index")
1500
+ print(" arg 5) speaker id: 0")
1501
+ print(" arg 6) transposition: 0")
1502
+ print(" arg 7) f0 method: harvest (pm, harvest, crepe, crepe-tiny, hybrid[x,x,x,x], mangio-crepe, mangio-crepe-tiny)")
1503
+ print(" arg 8) crepe hop length: 160")
1504
+ print(" arg 9) harvest median filter radius: 3 (0-7)")
1505
+ print(" arg 10) post resample rate: 0")
1506
+ print(" arg 11) mix volume envelope: 1")
1507
+ print(" arg 12) feature index ratio: 0.78 (0-1)")
1508
+ print(" arg 13) Voiceless Consonant Protection (Less Artifact): 0.33 (Smaller number = more protection. 0.50 means don't use.) \n")
1509
+ print("Example: mi-test.pth saudio/Sidney.wav myTest.wav logs/mi-test/added_index.index 0 -2 harvest 160 3 0 1 0.95 0.33")
1510
+ elif cli_current_page == "PRE-PROCESS":
1511
+ print(" arg 1) Model folder name in ./logs: mi-test")
1512
+ print(" arg 2) Trainset directory: mydataset (or) E:\\my-data-set")
1513
+ print(" arg 3) Sample rate: 40k (32k, 40k, 48k)")
1514
+ print(" arg 4) Number of CPU threads to use: 8 \n")
1515
+ print("Example: mi-test mydataset 40k 24")
1516
+ elif cli_current_page == "EXTRACT-FEATURE":
1517
+ print(" arg 1) Model folder name in ./logs: mi-test")
1518
+ print(" arg 2) Gpu card slot: 0 (0-1-2 if using 3 GPUs)")
1519
+ print(" arg 3) Number of CPU threads to use: 8")
1520
+ print(" arg 4) Has Pitch Guidance?: 1 (0 for no, 1 for yes)")
1521
+ print(" arg 5) f0 Method: harvest (pm, harvest, dio, crepe)")
1522
+ print(" arg 6) Crepe hop length: 128")
1523
+ print(" arg 7) Version for pre-trained models: v2 (use either v1 or v2)\n")
1524
+ print("Example: mi-test 0 24 1 harvest 128 v2")
1525
+ elif cli_current_page == "TRAIN":
1526
+ print(" arg 1) Model folder name in ./logs: mi-test")
1527
+ print(" arg 2) Sample rate: 40k (32k, 40k, 48k)")
1528
+ print(" arg 3) Has Pitch Guidance?: 1 (0 for no, 1 for yes)")
1529
+ print(" arg 4) speaker id: 0")
1530
+ print(" arg 5) Save epoch iteration: 50")
1531
+ print(" arg 6) Total epochs: 10000")
1532
+ print(" arg 7) Batch size: 8")
1533
+ print(" arg 8) Gpu card slot: 0 (0-1-2 if using 3 GPUs)")
1534
+ print(" arg 9) Save only the latest checkpoint: 0 (0 for no, 1 for yes)")
1535
+ print(" arg 10) Whether to cache training set to vram: 0 (0 for no, 1 for yes)")
1536
+ print(" arg 11) Save extracted small model every generation?: 0 (0 for no, 1 for yes)")
1537
+ print(" arg 12) Model architecture version: v2 (use either v1 or v2)\n")
1538
+ print("Example: mi-test 40k 1 0 50 10000 8 0 0 0 0 v2")
1539
+ elif cli_current_page == "TRAIN-FEATURE":
1540
+ print(" arg 1) Model folder name in ./logs: mi-test")
1541
+ print(" arg 2) Model architecture version: v2 (use either v1 or v2)\n")
1542
+ print("Example: mi-test v2")
1543
+ elif cli_current_page == "EXTRACT-MODEL":
1544
+ print(" arg 1) Model Path: logs/mi-test/G_168000.pth")
1545
+ print(" arg 2) Model save name: MyModel")
1546
+ print(" arg 3) Sample rate: 40k (32k, 40k, 48k)")
1547
+ print(" arg 4) Has Pitch Guidance?: 1 (0 for no, 1 for yes)")
1548
+ print(' arg 5) Model information: "My Model"')
1549
+ print(" arg 6) Model architecture version: v2 (use either v1 or v2)\n")
1550
+ print('Example: logs/mi-test/G_168000.pth MyModel 40k 1 "Created by Cole Mangio" v2')
1551
+ print("")
1552
+
1553
+ def change_page(page):
1554
+ global cli_current_page
1555
+ cli_current_page = page
1556
+ return 0
1557
+
1558
+ def execute_command(com):
1559
+ if com == "go home":
1560
+ return change_page("HOME")
1561
+ elif com == "go infer":
1562
+ return change_page("INFER")
1563
+ elif com == "go pre-process":
1564
+ return change_page("PRE-PROCESS")
1565
+ elif com == "go extract-feature":
1566
+ return change_page("EXTRACT-FEATURE")
1567
+ elif com == "go train":
1568
+ return change_page("TRAIN")
1569
+ elif com == "go train-feature":
1570
+ return change_page("TRAIN-FEATURE")
1571
+ elif com == "go extract-model":
1572
+ return change_page("EXTRACT-MODEL")
1573
+ else:
1574
+ if com[:3] == "go ":
1575
+ print("page '%s' does not exist!" % com[3:])
1576
+ return 0
1577
+
1578
+ if cli_current_page == "INFER":
1579
+ cli_infer(com)
1580
+ elif cli_current_page == "PRE-PROCESS":
1581
+ cli_pre_process(com)
1582
+ elif cli_current_page == "EXTRACT-FEATURE":
1583
+ cli_extract_feature(com)
1584
+ elif cli_current_page == "TRAIN":
1585
+ cli_train(com)
1586
+ elif cli_current_page == "TRAIN-FEATURE":
1587
+ cli_train_feature(com)
1588
+ elif cli_current_page == "EXTRACT-MODEL":
1589
+ cli_extract_model(com)
1590
+
1591
+ def cli_navigation_loop():
1592
+ while True:
1593
+ print("You are currently in '%s':" % cli_current_page)
1594
+ print_page_details()
1595
+ command = input("%s: " % cli_current_page)
1596
+ try:
1597
+ execute_command(command)
1598
+ except Exception:
1599
+ print(traceback.format_exc())
1600
+
1601
+ if config.is_cli:
1602
+ print("\n\nMangio-RVC-Fork v2 CLI App!\n")
1603
+ print("Welcome to the CLI version of RVC. Please read the documentation on https://github.com/Mangio621/Mangio-RVC-Fork (README.MD) to understand how to use this app.\n")
1604
+ cli_navigation_loop()
1605
+
1606
+ #endregion
1607
+
1608
+ #region RVC WebUI App
1609
+
1610
+ def get_presets():
1611
+ data = None
1612
+ with open('../inference-presets.json', 'r') as file:
1613
+ data = json.load(file)
1614
+ preset_names = []
1615
+ for preset in data['presets']:
1616
+ preset_names.append(preset['name'])
1617
+
1618
+ return preset_names
1619
+
1620
+ with gr.Blocks(theme=gr.themes.Soft()) as app:
1621
+ gr.HTML("<h1> The Mangio-RVC-Fork 💻 </h1>")
1622
+ gr.Markdown(
1623
+ value=i18n(
1624
+ "本软件以MIT协议开源, 作者不对软件具备任何控制力, 使用软件者、传播软件导出的声音者自负全责. <br>如不认可该条款, 则不能使用或引用软件包内任何代码和文件. 详见根目录<b>使用需遵守的协议-LICENSE.txt</b>."
1625
+ )
1626
+ )
1627
+ with gr.Tabs():
1628
+ with gr.TabItem(i18n("模型推理")):
1629
+ # Inference Preset Row
1630
+ # with gr.Row():
1631
+ # mangio_preset = gr.Dropdown(label="Inference Preset", choices=sorted(get_presets()))
1632
+ # mangio_preset_name_save = gr.Textbox(
1633
+ # label="Your preset name"
1634
+ # )
1635
+ # mangio_preset_save_btn = gr.Button('Save Preset', variant="primary")
1636
+
1637
+ # Other RVC stuff
1638
+ with gr.Row():
1639
+ sid0 = gr.Dropdown(label=i18n("推理音色"), choices=sorted(names))
1640
+ refresh_button = gr.Button(i18n("刷新音色列表和索引路径"), variant="primary")
1641
+ clean_button = gr.Button(i18n("卸载音色省显存"), variant="primary")
1642
+ spk_item = gr.Slider(
1643
+ minimum=0,
1644
+ maximum=2333,
1645
+ step=1,
1646
+ label=i18n("请选择说话人id"),
1647
+ value=0,
1648
+ visible=False,
1649
+ interactive=True,
1650
+ )
1651
+ clean_button.click(fn=clean, inputs=[], outputs=[sid0])
1652
+ with gr.Group():
1653
+ gr.Markdown(
1654
+ value=i18n("男转女推荐+12key, 女转男推荐-12key, 如果音域爆炸导致音色失真也可以自己调整到合适音域. ")
1655
+ )
1656
+ with gr.Row():
1657
+ with gr.Column():
1658
+ vc_transform0 = gr.Number(
1659
+ label=i18n("变调(整数, 半音数量, 升八度12降八度-12)"), value=0
1660
+ )
1661
+ input_audio0 = gr.Textbox(
1662
+ label=i18n("输入待处理音频文件路径(默认是正确格式示例)"),
1663
+ value="E:\\codes\\py39\\test-20230416b\\todo-songs\\冬之花clip1.wav",
1664
+ )
1665
+ f0method0 = gr.Radio(
1666
+ label=i18n(
1667
+ "选择音高提取算法,输入歌声可用pm提速,harvest低音好但巨慢无比,crepe效果好但吃GPU"
1668
+ ),
1669
+ choices=["pm", "harvest", "dio", "crepe", "crepe-tiny", "mangio-crepe", "mangio-crepe-tiny"], # Fork Feature. Add Crepe-Tiny
1670
+ value="pm",
1671
+ interactive=True,
1672
+ )
1673
+ crepe_hop_length = gr.Slider(
1674
+ minimum=1,
1675
+ maximum=512,
1676
+ step=1,
1677
+ label=i18n("crepe_hop_length"),
1678
+ value=160,
1679
+ interactive=True
1680
+ )
1681
+ filter_radius0 = gr.Slider(
1682
+ minimum=0,
1683
+ maximum=7,
1684
+ label=i18n(">=3则使用对harvest音高识别的结果使用中值滤波,数值为滤波半径,使用可以削弱哑音"),
1685
+ value=3,
1686
+ step=1,
1687
+ interactive=True,
1688
+ )
1689
+ with gr.Column():
1690
+ file_index1 = gr.Textbox(
1691
+ label=i18n("特征检索库文件路径,为空则使用下拉的选择结果"),
1692
+ value="",
1693
+ interactive=True,
1694
+ )
1695
+ file_index2 = gr.Dropdown(
1696
+ label=i18n("自动检测index路径,下拉式选择(dropdown)"),
1697
+ choices=sorted(index_paths),
1698
+ interactive=True,
1699
+ )
1700
+ refresh_button.click(
1701
+ fn=change_choices, inputs=[], outputs=[sid0, file_index2]
1702
+ )
1703
+ # file_big_npy1 = gr.Textbox(
1704
+ # label=i18n("特征文件路径"),
1705
+ # value="E:\\codes\py39\\vits_vc_gpu_train\\logs\\mi-test-1key\\total_fea.npy",
1706
+ # interactive=True,
1707
+ # )
1708
+ index_rate1 = gr.Slider(
1709
+ minimum=0,
1710
+ maximum=1,
1711
+ label=i18n("检索特征占比"),
1712
+ value=0.88,
1713
+ interactive=True,
1714
+ )
1715
+ with gr.Column():
1716
+ resample_sr0 = gr.Slider(
1717
+ minimum=0,
1718
+ maximum=48000,
1719
+ label=i18n("后处理重采样至最终采样率,0为不进行重采样"),
1720
+ value=0,
1721
+ step=1,
1722
+ interactive=True,
1723
+ )
1724
+ rms_mix_rate0 = gr.Slider(
1725
+ minimum=0,
1726
+ maximum=1,
1727
+ label=i18n("输入源音量包络替换输出音量包络融合比例,越靠近1越使用输出包络"),
1728
+ value=1,
1729
+ interactive=True,
1730
+ )
1731
+ protect0 = gr.Slider(
1732
+ minimum=0,
1733
+ maximum=0.5,
1734
+ label=i18n(
1735
+ "保护清辅音和呼吸声,防止电音撕裂等artifact,拉满0.5不开启,调低加大保护力度但可能降低索引效果"
1736
+ ),
1737
+ value=0.33,
1738
+ step=0.01,
1739
+ interactive=True,
1740
+ )
1741
+ f0_file = gr.File(label=i18n("F0曲线文件, 可选, 一行一个音高, 代替默认F0及升降调"))
1742
+ but0 = gr.Button(i18n("转换"), variant="primary")
1743
+ with gr.Row():
1744
+ vc_output1 = gr.Textbox(label=i18n("输出信息"))
1745
+ vc_output2 = gr.Audio(label=i18n("输出音频(右下角三个点,点了可以下载)"))
1746
+ but0.click(
1747
+ vc_single,
1748
+ [
1749
+ spk_item,
1750
+ input_audio0,
1751
+ vc_transform0,
1752
+ f0_file,
1753
+ f0method0,
1754
+ file_index1,
1755
+ file_index2,
1756
+ # file_big_npy1,
1757
+ index_rate1,
1758
+ filter_radius0,
1759
+ resample_sr0,
1760
+ rms_mix_rate0,
1761
+ protect0,
1762
+ crepe_hop_length
1763
+ ],
1764
+ [vc_output1, vc_output2],
1765
+ )
1766
+ with gr.Group():
1767
+ gr.Markdown(
1768
+ value=i18n("批量转换, 输入待转换音频文件夹, 或上传多个音频文件, 在指定文件夹(默认opt)下输出转换的音频. ")
1769
+ )
1770
+ with gr.Row():
1771
+ with gr.Column():
1772
+ vc_transform1 = gr.Number(
1773
+ label=i18n("变调(整数, 半音数量, 升八度12降八度-12)"), value=0
1774
+ )
1775
+ opt_input = gr.Textbox(label=i18n("指定输出文件夹"), value="opt")
1776
+ f0method1 = gr.Radio(
1777
+ label=i18n(
1778
+ "选择音高提取算法,输入歌声可用pm提速,harvest低音好但巨慢无比,crepe效果好但吃GPU"
1779
+ ),
1780
+ choices=["pm", "harvest", "crepe"],
1781
+ value="pm",
1782
+ interactive=True,
1783
+ )
1784
+ filter_radius1 = gr.Slider(
1785
+ minimum=0,
1786
+ maximum=7,
1787
+ label=i18n(">=3则使用对harvest音高识别的结果使用中值滤波,数值为滤波半径,使用可以削弱哑音"),
1788
+ value=3,
1789
+ step=1,
1790
+ interactive=True,
1791
+ )
1792
+ with gr.Column():
1793
+ file_index3 = gr.Textbox(
1794
+ label=i18n("特征检索库文件路径,为空则使用下拉的选择结果"),
1795
+ value="",
1796
+ interactive=True,
1797
+ )
1798
+ file_index4 = gr.Dropdown(
1799
+ label=i18n("自动检测index路径,下拉式选择(dropdown)"),
1800
+ choices=sorted(index_paths),
1801
+ interactive=True,
1802
+ )
1803
+ refresh_button.click(
1804
+ fn=lambda: change_choices()[1],
1805
+ inputs=[],
1806
+ outputs=file_index4,
1807
+ )
1808
+ # file_big_npy2 = gr.Textbox(
1809
+ # label=i18n("特征文件路径"),
1810
+ # value="E:\\codes\\py39\\vits_vc_gpu_train\\logs\\mi-test-1key\\total_fea.npy",
1811
+ # interactive=True,
1812
+ # )
1813
+ index_rate2 = gr.Slider(
1814
+ minimum=0,
1815
+ maximum=1,
1816
+ label=i18n("检索特征占比"),
1817
+ value=1,
1818
+ interactive=True,
1819
+ )
1820
+ with gr.Column():
1821
+ resample_sr1 = gr.Slider(
1822
+ minimum=0,
1823
+ maximum=48000,
1824
+ label=i18n("后处理重采样至最终采样率,0为不进行重采样"),
1825
+ value=0,
1826
+ step=1,
1827
+ interactive=True,
1828
+ )
1829
+ rms_mix_rate1 = gr.Slider(
1830
+ minimum=0,
1831
+ maximum=1,
1832
+ label=i18n("输入源音量包络替换输出音量包络融合比例,越靠近1越使用输出包络"),
1833
+ value=1,
1834
+ interactive=True,
1835
+ )
1836
+ protect1 = gr.Slider(
1837
+ minimum=0,
1838
+ maximum=0.5,
1839
+ label=i18n(
1840
+ "保护清辅音和呼吸声,防止电音撕裂等artifact,拉满0.5不开启,调低加大保护力度但可能降低索引效果"
1841
+ ),
1842
+ value=0.33,
1843
+ step=0.01,
1844
+ interactive=True,
1845
+ )
1846
+ with gr.Column():
1847
+ dir_input = gr.Textbox(
1848
+ label=i18n("输入待处理音频文件夹路径(去文件管理器地址栏拷就行了)"),
1849
+ value="E:\\codes\\py39\\test-20230416b\\todo-songs",
1850
+ )
1851
+ inputs = gr.File(
1852
+ file_count="multiple", label=i18n("也可批量输入音频文件, 二选一, 优先读文件夹")
1853
+ )
1854
+ with gr.Row():
1855
+ format1 = gr.Radio(
1856
+ label=i18n("导出文件格式"),
1857
+ choices=["wav", "flac", "mp3", "m4a"],
1858
+ value="flac",
1859
+ interactive=True,
1860
+ )
1861
+ but1 = gr.Button(i18n("转换"), variant="primary")
1862
+ vc_output3 = gr.Textbox(label=i18n("输出信息"))
1863
+ but1.click(
1864
+ vc_multi,
1865
+ [
1866
+ spk_item,
1867
+ dir_input,
1868
+ opt_input,
1869
+ inputs,
1870
+ vc_transform1,
1871
+ f0method1,
1872
+ file_index3,
1873
+ file_index4,
1874
+ # file_big_npy2,
1875
+ index_rate2,
1876
+ filter_radius1,
1877
+ resample_sr1,
1878
+ rms_mix_rate1,
1879
+ protect1,
1880
+ format1,
1881
+ crepe_hop_length,
1882
+ ],
1883
+ [vc_output3],
1884
+ )
1885
+ sid0.change(
1886
+ fn=get_vc,
1887
+ inputs=[sid0, protect0, protect1],
1888
+ outputs=[spk_item, protect0, protect1],
1889
+ )
1890
+ with gr.TabItem(i18n("伴奏人声分离&去混响&去回声")):
1891
+ with gr.Group():
1892
+ gr.Markdown(
1893
+ value=i18n(
1894
+ "人声伴奏分离批量处理, 使用UVR5模型。 <br>"
1895
+ "合格的文件夹路径格式举例: E:\\codes\\py39\\vits_vc_gpu\\白鹭霜华测试样例(去文件管理器地址栏拷就行了)。 <br>"
1896
+ "模型分为三类: <br>"
1897
+ "1、保留人声:不带和声的音频选这个,对主人声保留比HP5更好。内置HP2和HP3两个模型,HP3可能轻微漏伴奏但对主人声保留比HP2稍微好一丁点; <br>"
1898
+ "2、仅保留主人声:带和声的音频选这个,对主人声可能有削弱。内置HP5一个模型; <br> "
1899
+ "3、去混响、去延迟模型(by FoxJoy):<br>"
1900
+ "  (1)MDX-Net(onnx_dereverb):对于双通道混响是最好的选择,不能去除单通道混响;<br>"
1901
+ "&emsp;(234)DeEcho:去除延迟效果。Aggressive比Normal去除得更彻底,DeReverb额外去除混响,可去除单声道混响,但是对高频重的板式混响去不干净。<br>"
1902
+ "去混响/去延迟,附:<br>"
1903
+ "1、DeEcho-DeReverb模型的耗时是另外2个DeEcho模型的接近2倍;<br>"
1904
+ "2、MDX-Net-Dereverb模型挺慢的;<br>"
1905
+ "3、个人推荐的最干净的配置是先MDX-Net再DeEcho-Aggressive。"
1906
+ )
1907
+ )
1908
+ with gr.Row():
1909
+ with gr.Column():
1910
+ dir_wav_input = gr.Textbox(
1911
+ label=i18n("输入待处理音频文件夹路径"),
1912
+ value="E:\\codes\\py39\\test-20230416b\\todo-songs\\todo-songs",
1913
+ )
1914
+ wav_inputs = gr.File(
1915
+ file_count="multiple", label=i18n("也可批量输入音频文件, 二选一, 优先读文件夹")
1916
+ )
1917
+ with gr.Column():
1918
+ model_choose = gr.Dropdown(label=i18n("模型"), choices=uvr5_names)
1919
+ agg = gr.Slider(
+     minimum=0,
+     maximum=20,
+     step=1,
+     label=i18n("人声提取激进程度"),
+     value=10,
+     interactive=True,
+     visible=False,  # not exposed for adjustment yet
+ )
1928
+ opt_vocal_root = gr.Textbox(
1929
+ label=i18n("指定输出主人声文件夹"), value="opt"
1930
+ )
1931
+ opt_ins_root = gr.Textbox(
1932
+ label=i18n("指定输出非主人声文件夹"), value="opt"
1933
+ )
1934
+ format0 = gr.Radio(
1935
+ label=i18n("导出文件格式"),
1936
+ choices=["wav", "flac", "mp3", "m4a"],
1937
+ value="flac",
1938
+ interactive=True,
1939
+ )
1940
+ but2 = gr.Button(i18n("转换"), variant="primary")
1941
+ vc_output4 = gr.Textbox(label=i18n("输出信息"))
1942
+ but2.click(
1943
+ uvr,
1944
+ [
1945
+ model_choose,
1946
+ dir_wav_input,
1947
+ opt_vocal_root,
1948
+ wav_inputs,
1949
+ opt_ins_root,
1950
+ agg,
1951
+ format0,
1952
+ ],
1953
+ [vc_output4],
1954
+ )
1955
+ with gr.TabItem(i18n("训练")):
1956
+ gr.Markdown(
1957
+ value=i18n(
1958
+ "step1: 填写实验配置. 实验数据放在logs下, 每个实验一个文件夹, 需手工输入实验名路径, 内含实验配置, 日志, 训练得到的模型文件. "
1959
+ )
1960
+ )
1961
+ with gr.Row():
1962
+ exp_dir1 = gr.Textbox(label=i18n("输入实验名"), value="mi-test")
1963
+ sr2 = gr.Radio(
1964
+ label=i18n("目标采样率"),
1965
+ choices=["40k", "48k"],
1966
+ value="40k",
1967
+ interactive=True,
1968
+ )
1969
+ if_f0_3 = gr.Radio(
1970
+ label=i18n("模型是否带音高指导(唱歌一定要, 语音可以不要)"),
1971
+ choices=[True, False],
1972
+ value=True,
1973
+ interactive=True,
1974
+ )
1975
+ version19 = gr.Radio(
1976
+ label=i18n("版本"),
1977
+ choices=["v1", "v2"],
1978
+ value="v1",
1979
+ interactive=True,
1980
+ visible=True,
1981
+ )
1982
+ np7 = gr.Slider(
1983
+ minimum=0,
1984
+ maximum=config.n_cpu,
1985
+ step=1,
1986
+ label=i18n("提取音高和处理数据使用的CPU进程数"),
1987
+ value=int(np.ceil(config.n_cpu / 1.5)),
1988
+ interactive=True,
1989
+ )
1990
+ with gr.Group():  # Data processing; single speaker for now, up to 4 speakers planned
1991
+ gr.Markdown(
1992
+ value=i18n(
1993
+ "step2a: 自动遍历训练文件夹下所有可解码成音频的文件并进行切片归一化, 在实验目录下生成2个wav文件夹; 暂时只支持单人训练. "
1994
+ )
1995
+ )
1996
+ with gr.Row():
1997
+ trainset_dir4 = gr.Textbox(
1998
+ label=i18n("输入训练文件夹路径"), value="E:\\语音音频+标注\\米津玄师\\src"
1999
+ )
2000
+ spk_id5 = gr.Slider(
2001
+ minimum=0,
2002
+ maximum=4,
2003
+ step=1,
2004
+ label=i18n("请指定说话人id"),
2005
+ value=0,
2006
+ interactive=True,
2007
+ )
2008
+ but1 = gr.Button(i18n("处理数据"), variant="primary")
2009
+ info1 = gr.Textbox(label=i18n("输出信息"), value="")
2010
+ but1.click(
2011
+ preprocess_dataset, [trainset_dir4, exp_dir1, sr2, np7], [info1]
2012
+ )
2013
+ with gr.Group():
2014
+ gr.Markdown(value=i18n("step2b: 使用CPU提取音高(如果模型带音高), 使用GPU提取特征(选择卡号)"))
2015
+ with gr.Row():
2016
+ with gr.Column():
2017
+ gpus6 = gr.Textbox(
2018
+ label=i18n("以-分隔输入使用的卡号, 例如 0-1-2 使用卡0和卡1和卡2"),
2019
+ value=gpus,
2020
+ interactive=True,
2021
+ )
2022
+ gpu_info9 = gr.Textbox(label=i18n("显卡信息"), value=gpu_info)
2023
+ with gr.Column():
2024
+ f0method8 = gr.Radio(
2025
+ label=i18n(
2026
+ "选择音高提取算法:输入歌声可用pm提速,高质量语音但CPU差可用dio提速,harvest质量更好但慢"
2027
+ ),
2028
+ choices=["pm", "harvest", "dio", "crepe", "mangio-crepe"], # Fork feature: Crepe on f0 extraction for training.
2029
+ value="harvest",
2030
+ interactive=True,
2031
+ )
2032
+ extraction_crepe_hop_length = gr.Slider(
2033
+ minimum=1,
2034
+ maximum=512,
2035
+ step=1,
2036
+ label=i18n("crepe_hop_length"),
2037
+ value=64,
2038
+ interactive=True
2039
+ )
2040
+ but2 = gr.Button(i18n("特征提取"), variant="primary")
2041
+ info2 = gr.Textbox(label=i18n("输出信息"), value="", max_lines=8)
2042
+ but2.click(
2043
+ extract_f0_feature,
2044
+ [gpus6, np7, f0method8, if_f0_3, exp_dir1, version19, extraction_crepe_hop_length],
2045
+ [info2],
2046
+ )
2047
+ with gr.Group():
2048
+ gr.Markdown(value=i18n("step3: 填写训练设置, 开始训练模型和索引"))
2049
+ with gr.Row():
2050
+ save_epoch10 = gr.Slider(
2051
+ minimum=0,
2052
+ maximum=50,
2053
+ step=1,
2054
+ label=i18n("保存频率save_every_epoch"),
2055
+ value=5,
2056
+ interactive=True,
2057
+ )
2058
+ total_epoch11 = gr.Slider(
2059
+ minimum=0,
2060
+ maximum=10000,
2061
+ step=1,
2062
+ label=i18n("总训练轮数total_epoch"),
2063
+ value=20,
2064
+ interactive=True,
2065
+ )
2066
+ batch_size12 = gr.Slider(
2067
+ minimum=1,
2068
+ maximum=40,
2069
+ step=1,
2070
+ label=i18n("每张显卡的batch_size"),
2071
+ value=default_batch_size,
2072
+ interactive=True,
2073
+ )
2074
+ if_save_latest13 = gr.Radio(
2075
+ label=i18n("是否仅保存最新的ckpt文件以节省硬盘空间"),
2076
+ choices=[i18n("是"), i18n("否")],
2077
+ value=i18n("否"),
2078
+ interactive=True,
2079
+ )
2080
+ if_cache_gpu17 = gr.Radio(
2081
+ label=i18n(
2082
+ "是否缓存所有训练集至显存. 10min以下小数据可缓存以加速训练, 大数据缓存会炸显存也加不了多少速"
2083
+ ),
2084
+ choices=[i18n("是"), i18n("否")],
2085
+ value=i18n("否"),
2086
+ interactive=True,
2087
+ )
2088
+ if_save_every_weights18 = gr.Radio(
2089
+ label=i18n("是否在每次保存时间点将最终小模型保存至weights文件夹"),
2090
+ choices=[i18n("是"), i18n("否")],
2091
+ value=i18n("否"),
2092
+ interactive=True,
2093
+ )
2094
+ with gr.Row():
2095
+ pretrained_G14 = gr.Textbox(
2096
+ label=i18n("加载预训练底模G路径"),
2097
+ value="pretrained/f0G40k.pth",
2098
+ interactive=True,
2099
+ )
2100
+ pretrained_D15 = gr.Textbox(
2101
+ label=i18n("加载预训练底模D路径"),
2102
+ value="pretrained/f0D40k.pth",
2103
+ interactive=True,
2104
+ )
2105
+ sr2.change(
2106
+ change_sr2,
2107
+ [sr2, if_f0_3, version19],
2108
+ [pretrained_G14, pretrained_D15],
2109
+ )
2110
+ version19.change(
2111
+ change_version19,
2112
+ [sr2, if_f0_3, version19],
2113
+ [pretrained_G14, pretrained_D15, sr2],
2114
+ )
2115
+ if_f0_3.change(
2116
+ change_f0,
2117
+ [if_f0_3, sr2, version19],
2118
+ [f0method8, pretrained_G14, pretrained_D15],
2119
+ )
2120
+ gpus16 = gr.Textbox(
2121
+ label=i18n("以-分隔输入使用的卡号, 例如 0-1-2 使用卡0和卡1和卡2"),
2122
+ value=gpus,
2123
+ interactive=True,
2124
+ )
2125
+ but3 = gr.Button(i18n("训练模型"), variant="primary")
2126
+ but4 = gr.Button(i18n("训练特征索引"), variant="primary")
2127
+ but5 = gr.Button(i18n("一键训练"), variant="primary")
2128
+ info3 = gr.Textbox(label=i18n("输出信息"), value="", max_lines=10)
2129
+ but3.click(
2130
+ click_train,
2131
+ [
2132
+ exp_dir1,
2133
+ sr2,
2134
+ if_f0_3,
2135
+ spk_id5,
2136
+ save_epoch10,
2137
+ total_epoch11,
2138
+ batch_size12,
2139
+ if_save_latest13,
2140
+ pretrained_G14,
2141
+ pretrained_D15,
2142
+ gpus16,
2143
+ if_cache_gpu17,
2144
+ if_save_every_weights18,
2145
+ version19,
2146
+ ],
2147
+ info3,
2148
+ )
2149
+ but4.click(train_index, [exp_dir1, version19], info3)
2150
+ but5.click(
2151
+ train1key,
2152
+ [
2153
+ exp_dir1,
2154
+ sr2,
2155
+ if_f0_3,
2156
+ trainset_dir4,
2157
+ spk_id5,
2158
+ np7,
2159
+ f0method8,
2160
+ save_epoch10,
2161
+ total_epoch11,
2162
+ batch_size12,
2163
+ if_save_latest13,
2164
+ pretrained_G14,
2165
+ pretrained_D15,
2166
+ gpus16,
2167
+ if_cache_gpu17,
2168
+ if_save_every_weights18,
2169
+ version19,
2170
+ extraction_crepe_hop_length
2171
+ ],
2172
+ info3,
2173
+ )
2174
+
2175
+ with gr.TabItem(i18n("ckpt处理")):
2176
+ with gr.Group():
2177
+ gr.Markdown(value=i18n("模型融合, 可用于测试音色融合"))
2178
+ with gr.Row():
2179
+ ckpt_a = gr.Textbox(label=i18n("A模型路径"), value="", interactive=True)
2180
+ ckpt_b = gr.Textbox(label=i18n("B模型路径"), value="", interactive=True)
2181
+ alpha_a = gr.Slider(
2182
+ minimum=0,
2183
+ maximum=1,
2184
+ label=i18n("A模型权重"),
2185
+ value=0.5,
2186
+ interactive=True,
2187
+ )
2188
+ with gr.Row():
2189
+ sr_ = gr.Radio(
2190
+ label=i18n("目标采样率"),
2191
+ choices=["40k", "48k"],
2192
+ value="40k",
2193
+ interactive=True,
2194
+ )
2195
+ if_f0_ = gr.Radio(
2196
+ label=i18n("模型是否带音高指导"),
2197
+ choices=[i18n("是"), i18n("否")],
2198
+ value=i18n("是"),
2199
+ interactive=True,
2200
+ )
2201
+ info__ = gr.Textbox(
2202
+ label=i18n("要置入的模型信息"), value="", max_lines=8, interactive=True
2203
+ )
2204
+ name_to_save0 = gr.Textbox(
2205
+ label=i18n("保存的模型名不带后缀"),
2206
+ value="",
2207
+ max_lines=1,
2208
+ interactive=True,
2209
+ )
2210
+ version_2 = gr.Radio(
2211
+ label=i18n("模型版本型号"),
2212
+ choices=["v1", "v2"],
2213
+ value="v1",
2214
+ interactive=True,
2215
+ )
2216
+ with gr.Row():
2217
+ but6 = gr.Button(i18n("融合"), variant="primary")
2218
+ info4 = gr.Textbox(label=i18n("输出信息"), value="", max_lines=8)
2219
+ but6.click(
2220
+ merge,
2221
+ [
2222
+ ckpt_a,
2223
+ ckpt_b,
2224
+ alpha_a,
2225
+ sr_,
2226
+ if_f0_,
2227
+ info__,
2228
+ name_to_save0,
2229
+ version_2,
2230
+ ],
2231
+ info4,
2232
+ )  # merge receives the eight inputs listed above, in order
2233
+ with gr.Group():
2234
+ gr.Markdown(value=i18n("修改模型信息(仅支持weights文件夹下提取的小模型文件)"))
2235
+ with gr.Row():
2236
+ ckpt_path0 = gr.Textbox(
2237
+ label=i18n("模型路径"), value="", interactive=True
2238
+ )
2239
+ info_ = gr.Textbox(
2240
+ label=i18n("要改的模型信息"), value="", max_lines=8, interactive=True
2241
+ )
2242
+ name_to_save1 = gr.Textbox(
2243
+ label=i18n("保存的文件名, 默认空为和源文件同名"),
2244
+ value="",
2245
+ max_lines=8,
2246
+ interactive=True,
2247
+ )
2248
+ with gr.Row():
2249
+ but7 = gr.Button(i18n("修改"), variant="primary")
2250
+ info5 = gr.Textbox(label=i18n("输出信息"), value="", max_lines=8)
2251
+ but7.click(change_info, [ckpt_path0, info_, name_to_save1], info5)
2252
+ with gr.Group():
2253
+ gr.Markdown(value=i18n("查看模型信息(仅支持weights文件夹下提取的小模型文件)"))
2254
+ with gr.Row():
2255
+ ckpt_path1 = gr.Textbox(
2256
+ label=i18n("模型路径"), value="", interactive=True
2257
+ )
2258
+ but8 = gr.Button(i18n("查看"), variant="primary")
2259
+ info6 = gr.Textbox(label=i18n("输出信息"), value="", max_lines=8)
2260
+ but8.click(show_info, [ckpt_path1], info6)
2261
+ with gr.Group():
2262
+ gr.Markdown(
2263
+ value=i18n(
2264
+ "模型提取(输入logs文件夹下大文件模型路径),适用于训一半不想训了模型没有自动提取保存小文件模型,或者想测试中间模型的情况"
2265
+ )
2266
+ )
2267
+ with gr.Row():
2268
+ ckpt_path2 = gr.Textbox(
2269
+ label=i18n("模型路径"),
2270
+ value="E:\\codes\\py39\\logs\\mi-test_f0_48k\\G_23333.pth",
2271
+ interactive=True,
2272
+ )
2273
+ save_name = gr.Textbox(
2274
+ label=i18n("保存名"), value="", interactive=True
2275
+ )
2276
+ sr__ = gr.Radio(
2277
+ label=i18n("目标采样率"),
2278
+ choices=["32k", "40k", "48k"],
2279
+ value="40k",
2280
+ interactive=True,
2281
+ )
2282
+ if_f0__ = gr.Radio(
2283
+ label=i18n("模型是否带音高指导,1是0否"),
2284
+ choices=["1", "0"],
2285
+ value="1",
2286
+ interactive=True,
2287
+ )
2288
+ version_1 = gr.Radio(
2289
+ label=i18n("模型版本型号"),
2290
+ choices=["v1", "v2"],
2291
+ value="v2",
2292
+ interactive=True,
2293
+ )
2294
+ info___ = gr.Textbox(
2295
+ label=i18n("要置入的模型信息"), value="", max_lines=8, interactive=True
2296
+ )
2297
+ but9 = gr.Button(i18n("提取"), variant="primary")
2298
+ info7 = gr.Textbox(label=i18n("输出信息"), value="", max_lines=8)
2299
+ ckpt_path2.change(
2300
+ change_info_, [ckpt_path2], [sr__, if_f0__, version_1]
2301
+ )
2302
+ but9.click(
2303
+ extract_small_model,
2304
+ [ckpt_path2, save_name, sr__, if_f0__, info___, version_1],
2305
+ info7,
2306
+ )
2307
+
2308
+ with gr.TabItem(i18n("Onnx导出")):
+     with gr.Row():
+         ckpt_dir = gr.Textbox(label=i18n("RVC模型路径"), value="", interactive=True)
+     with gr.Row():
+         onnx_dir = gr.Textbox(
+             label=i18n("Onnx输出路径"), value="", interactive=True
+         )
+     with gr.Row():
+         infoOnnx = gr.Label(label="info")
+     with gr.Row():
+         butOnnx = gr.Button(i18n("导出Onnx模型"), variant="primary")
+     butOnnx.click(export_onnx, [ckpt_dir, onnx_dir], infoOnnx)
2320
+
2321
+ tab_faq = i18n("常见问题解答")
2322
+ with gr.TabItem(tab_faq):
+     try:
+         # The untranslated key means the UI language is Chinese; otherwise show the English FAQ.
+         if tab_faq == "常见问题解答":
+             with open("docs/faq.md", "r", encoding="utf8") as f:
+                 info = f.read()
+         else:
+             with open("docs/faq_en.md", "r", encoding="utf8") as f:
+                 info = f.read()
+         gr.Markdown(value=info)
+     except Exception:
+         # Show the traceback in the tab instead of failing to build the UI.
+         gr.Markdown(traceback.format_exc())
2333
+
2334
+
2335
+ #region Mangio Preset Handler Region
2336
+ def save_preset(
+     preset_name,
+     sid0,
+     vc_transform,
+     input_audio,
+     f0method,
+     crepe_hop_length,
+     filter_radius,
+     file_index1,
+     file_index2,
+     index_rate,
+     resample_sr,
+     rms_mix_rate,
+     protect,
+     f0_file,
+ ):
+     # Load the existing presets, append the new one, then write the file back.
+     with open('../inference-presets.json', 'r', encoding='utf-8') as file:
+         data = json.load(file)
+     preset_json = {
+         'name': preset_name,
+         'model': sid0,
+         'transpose': vc_transform,
+         'audio_file': input_audio,
+         'f0_method': f0method,
+         'crepe_hop_length': crepe_hop_length,
+         'median_filtering': filter_radius,
+         'feature_path': file_index1,
+         'auto_feature_path': file_index2,
+         'search_feature_ratio': index_rate,
+         'resample': resample_sr,
+         'volume_envelope': rms_mix_rate,
+         'protect_voiceless': protect,
+         'f0_file_path': f0_file,
+     }
+     data['presets'].append(preset_json)
+     # The with-block flushes and closes the file, so no explicit flush is needed.
+     with open('../inference-presets.json', 'w', encoding='utf-8') as file:
+         json.dump(data, file)
+     print("Saved Preset %s into inference-presets.json!" % preset_name)
2376
+
2377
+
2378
+ def on_preset_changed(preset_name):
+     print("Changed Preset to %s!" % preset_name)
+     with open('../inference-presets.json', 'r', encoding='utf-8') as file:
+         data = json.load(file)
+
+     print("Searching for " + preset_name)
+     returning_preset = None
+     # No early exit: if several presets share a name, the last one wins.
+     for preset in data['presets']:
+         if preset['name'] == preset_name:
+             print("Found a preset")
+             returning_preset = preset
+     # Return all new input values. The fields are still commented out,
+     # so this currently returns an empty tuple.
+     return (
+         # returning_preset['model'],
+         # returning_preset['transpose'],
+         # returning_preset['audio_file'],
+         # returning_preset['f0_method'],
+         # returning_preset['crepe_hop_length'],
+         # returning_preset['median_filtering'],
+         # returning_preset['feature_path'],
+         # returning_preset['auto_feature_path'],
+         # returning_preset['search_feature_ratio'],
+         # returning_preset['resample'],
+         # returning_preset['volume_envelope'],
+         # returning_preset['protect_voiceless'],
+         # returning_preset['f0_file_path']
+     )
2406
+
2407
+ # Preset State Changes
2408
+
2409
+ # This click calls save_preset that saves the preset into inference-presets.json with the preset name
2410
+ # mangio_preset_save_btn.click(
2411
+ # fn=save_preset,
2412
+ # inputs=[
2413
+ # mangio_preset_name_save,
2414
+ # sid0,
2415
+ # vc_transform0,
2416
+ # input_audio0,
2417
+ # f0method0,
2418
+ # crepe_hop_length,
2419
+ # filter_radius0,
2420
+ # file_index1,
2421
+ # file_index2,
2422
+ # index_rate1,
2423
+ # resample_sr0,
2424
+ # rms_mix_rate0,
2425
+ # protect0,
2426
+ # f0_file
2427
+ # ],
2428
+ # outputs=[]
2429
+ # )
2430
+
2431
+ # mangio_preset.change(
2432
+ # on_preset_changed,
2433
+ # inputs=[
2434
+ # # Pass inputs here
2435
+ # mangio_preset
2436
+ # ],
2437
+ # outputs=[
2438
+ # # Pass Outputs here. These refer to the gradio elements that we want to directly change
2439
+ # # sid0,
2440
+ # # vc_transform0,
2441
+ # # input_audio0,
2442
+ # # f0method0,
2443
+ # # crepe_hop_length,
2444
+ # # filter_radius0,
2445
+ # # file_index1,
2446
+ # # file_index2,
2447
+ # # index_rate1,
2448
+ # # resample_sr0,
2449
+ # # rms_mix_rate0,
2450
+ # # protect0,
2451
+ # # f0_file
2452
+ # ]
2453
+ # )
2454
+ #endregion
2455
+
2456
+ # with gr.TabItem(i18n("招募音高曲线前端编辑器")):
2457
+ # gr.Markdown(value=i18n("加开发群联系我xxxxx"))
2458
+ # with gr.TabItem(i18n("点击查看交流、问题反馈群号")):
2459
+ # gr.Markdown(value=i18n("xxxxx"))
2460
+
2461
+ if config.iscolab or config.paperspace:  # Share gradio link for colab and paperspace (FORK FEATURE)
+     app.queue(concurrency_count=511, max_size=1022).launch(share=True)
+ else:
+     app.queue(concurrency_count=511, max_size=1022).launch(
+         server_name="0.0.0.0",
+         inbrowser=not config.noautoopen,
+         server_port=config.listen_port,
+         quiet=True,
+     )
2470
+
2471
+ #endregion
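
Note on the Mangio preset handler region: `save_preset` and `on_preset_changed` both assume that `../inference-presets.json` already exists and contains a top-level `presets` list. The sketch below shows one way the same append-and-look-up flow could tolerate a missing file. The helper names `append_preset` and `find_preset`, the temporary path, and the sample field values are hypothetical and are not part of this commit.

```python
import json
import os
import tempfile


def append_preset(path, preset):
    """Append one preset dict to the JSON file at `path`, creating it if needed."""
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as fh:
            data = json.load(fh)
    else:
        # Assumption: a missing file is treated as an empty preset list.
        data = {"presets": []}
    data.setdefault("presets", []).append(preset)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(data, fh)


def find_preset(path, name):
    """Return the most recently saved preset called `name`, or None."""
    with open(path, "r", encoding="utf-8") as fh:
        data = json.load(fh)
    match = None
    for preset in data.get("presets", []):
        if preset.get("name") == name:
            match = preset  # keep scanning so the last match wins
    return match


def demo():
    """Save two presets with the same name and look them up again."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "inference-presets.json")
        append_preset(path, {"name": "demo", "transpose": 0, "f0_method": "harvest"})
        append_preset(path, {"name": "demo", "transpose": 12, "f0_method": "pm"})
        return find_preset(path, "demo"), find_preset(path, "missing")


if __name__ == "__main__":
    print(demo())
```

Returning `None` for an unknown name lets the caller decide whether to leave the current UI values untouched, which may be preferable to raising inside a Gradio callback.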