PhoenixStormJr committed on
Commit 0a99765 · verified · 1 Parent(s): 34f40f7

Update Changelog_CN.md

Files changed (1): Changelog_CN.md +67 -67
Changelog_CN.md CHANGED
@@ -1,80 +1,80 @@
- ### 20230618更新
- - v2增加32k和48k两个新预训练模型
- - 修复非f0模型推理报错
- - 对于超过一小时的训练集的索引建立环节,自动kmeans缩小特征处理以加速索引训练、加入和查询
- - 附送一个人声转吉他玩具仓库
- - 数据处理剔除异常值切片
- - onnx导出选项卡

- 失败的实验:
- - ~~特征检索增加时序维度:寄,没啥效果~~
- - ~~特征检索增加PCAR降维可选项:寄,数据大用kmeans缩小数据量,数据小降维操作耗时比省下的匹配耗时还多~~
- - ~~支持onnx推理(附带仅推理的小压缩包):寄,生成nsf还是需要pytorch~~
- - ~~训练时在音高、gender、eq、噪声等方面对输入进行随机增强:寄,没啥效果~~

- todolist:
- - 接入小型声码器调研
- - 训练集音高识别支持crepe
- - crepe的精度支持和RVC-config同步
- - 对接F0编辑器


- ### 20230528更新
- - 增加v2 jupyter notebook、韩文changelog,增加一些环境依赖
- - 增加呼吸、清辅音、齿音保护模式
- - 支持crepe-full推理
- - UVR5人声伴奏分离加上3个去延迟模型和MDX-Net去混响模型,增加HP3人声提取模型
- - 索引名称增加版本和实验名称
- - 人声伴奏分离、推理批量导出增加音频导出格式选项
- - 废弃32k模型的训练

- ### 20230513更新
- - 清除一键包内部老版本runtime内残留的infer_pack和uvr5_pack
- - 修复训练集预处理伪多进程的bug
- - 增加harvest识别音高可选通过中值滤波削弱哑音现象,可调整中值滤波半径
- - 导出音频增加后处理重采样
- - 训练n_cpu进程数从"仅调整f0提取"改为"调整数据预处理和f0提取"
- - 自动检测logs文件夹下的index路径,提供下拉列表功能
- - tab页增加"常见问题解答"(也可参考github-rvc-wiki)
- - 相同路径的输入音频推理增加了音高缓存(用途:使用harvest音高提取,整个pipeline会经历漫长且重复的音高提取过程,如果不使用缓存,实验不同音色、索引、音高中值滤波半径参数的用户在第一次测试后的等待结果会非常痛苦)

- ### 20230514更新
- - 音量包络对齐输入混合(可以缓解“输入静音输出小幅度噪声”的问题。如果输入音频背景底噪大则不建议开启,默认不开启(值为1可视为不开启))
- - 支持按照指定频率保存提取的小模型(假如你想尝试不同epoch下的推理效果,但是不想保存所有大checkpoint并且每次都要ckpt手工处理提取小模型,这项功能会非常实用)
- - 通过设置环境变量解决服务端开了系统全局代理导致浏览器连接错误的问题
- - 支持v2预训练模型(目前只公开了40k版本进行测试,另外2个采样率还没有训练完全)
- - 推理前限制超过1的过大音量
- - 微调数据预处理参数


- ### 20230409更新
- - 修正训练参数,提升显卡平均利用率,A100最高从25%提升至90%左右,V100:50%->90%左右,2060S:60%->85%左右,P40:25%->95%左右,训练速度显著提升
- - 修正参数:总batch_size改为每张卡的batch_size
- - 修正total_epoch:最大限制100解锁至1000;默认10提升至默认20
- - 修复ckpt提取识别是否带音高错误导致推理异常的问题
- - 修复分布式训练每个rank都保存一次ckpt的问题
- - 特征提取进行nan特征过滤
- - 修复静音输入输出随机辅音or噪声的问题(老版模型需要重做训练集重训)

- ### 20230416更新
- - 新增本地实时变声迷你GUI,双击go-realtime-gui.bat启动
- - 训练推理均对<50Hz的频段进行滤波过滤
- - 训练推理音高提取pyworld最低音高从默认80下降至50,50-80hz间的男声低音不会哑
- - WebUI支持根据系统区域变更语言(现支持en_US、ja_JP、zh_CN、zh_HK、zh_SG、zh_TW,不支持的默认en_US)
- - 修正部分显卡识别(例如V100-16G识别失败,P4识别失败)

- ### 20230428更新
- - 升级faiss索引设置,速度更快,质量更高
- - 取消total_npy依赖,后续分享模型不再需要填写total_npy
- - 解锁16系限制。4G显存GPU给到4G的推理设置。
- - 修复部分音频格式下UVR5人声伴奏分离的bug
- - 实时变声迷你gui增加对非40k与不带音高模型的支持

- ### 后续计划:
- 功能:
- - 支持多人训练选项卡(至多4人)

- 底模:
- - 收集呼吸wav加入训练集修正呼吸变声电音的问题
- - 我们正在训练增加了歌声训练集的底模,未来会公开

+ ### 20230618 Update
+ - v2 adds two new pretrained models: 32k and 48k
+ - Fixed inference errors with non-f0 models
+ - For training sets longer than one hour, the index-building step automatically shrinks the features with kmeans to speed up index training, adding, and querying
+ - Bundled a voice-to-guitar toy repository
+ - Data preprocessing now discards outlier slices
+ - Added an onnx export tab

+ Failed experiments:
+ - ~~Adding a temporal dimension to feature retrieval: failed, no noticeable effect~~
+ - ~~Adding an optional PCAR dimensionality reduction to feature retrieval: failed; with large datasets kmeans already shrinks the data volume, and with small datasets the reduction takes longer than the matching time it saves~~
+ - ~~Supporting onnx inference (with an inference-only mini package): failed; generating nsf still requires pytorch~~
+ - ~~Randomly augmenting the training input in pitch, gender, eq, noise, etc.: failed, no noticeable effect~~

+ Todo list:
+ - Investigate integrating a small vocoder
+ - Support crepe for training-set pitch recognition
+ - Keep crepe's precision in sync with RVC-config
+ - Integrate with the F0 editor


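The kmeans shrinking idea above can be sketched in pure NumPy (illustrative only; the function name is hypothetical, and the real pipeline builds a faiss index from the shrunk features):

```python
import numpy as np

def shrink_features(feats: np.ndarray, k: int, iters: int = 5, seed: int = 0) -> np.ndarray:
    """Reduce N feature vectors to k centroids with a small k-means loop,
    so index training, adding, and querying touch far fewer vectors."""
    rng = np.random.default_rng(seed)
    centroids = feats[rng.choice(len(feats), size=k, replace=False)]
    for _ in range(iters):
        # Squared L2 distances via ||x||^2 + ||c||^2 - 2 x.c (avoids a huge broadcast).
        d = ((feats ** 2).sum(1)[:, None]
             + (centroids ** 2).sum(1)[None, :]
             - 2.0 * feats @ centroids.T)
        labels = d.argmin(1)
        for j in range(k):
            members = feats[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(0)
    return centroids

# 2,000 fake 64-dim features shrunk to 50 centroids before building an index.
feats = np.random.default_rng(1).standard_normal((2_000, 64)).astype(np.float32)
small = shrink_features(feats, k=50)
print(small.shape)  # (50, 64)
```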
+ ### 20230528 Update
+ - Added a v2 jupyter notebook, a Korean changelog, and some environment dependencies
+ - Added protection modes for breathing, voiceless consonants, and sibilants
+ - Support crepe-full inference
+ - UVR5 vocal/accompaniment separation gains 3 de-delay models and an MDX-Net dereverberation model, plus a new HP3 vocal extraction model
+ - Index names now include the version and experiment name
+ - Added audio export format options to vocal/accompaniment separation and batch inference export
+ - Deprecated training of the 32k model

+ ### 20230513 Update
+ - Removed the infer_pack and uvr5_pack left over from the old runtime inside the one-click package
+ - Fixed the pseudo-multiprocessing bug in training-set preprocessing
+ - harvest pitch recognition can now optionally suppress mute artifacts via median filtering, with an adjustable median filter radius
+ - Audio export now supports post-processing resampling
+ - The training n_cpu process count now controls both data preprocessing and f0 extraction, instead of f0 extraction only
+ - Automatically detect index paths under the logs folder and offer them in a drop-down list
+ - Added an "FAQ" tab (see also github-rvc-wiki)
+ - Added a pitch cache for repeated inference on the same input audio path (purpose: with harvest pitch extraction, the whole pipeline goes through a long, repeated pitch-extraction step; without the cache, users experimenting with different timbres, indices, and median-filter radii face a painful wait after the first test)

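The median-filter option can be illustrated on a toy pitch contour (pure NumPy sketch with a hypothetical helper name; the real code operates on harvest's f0 output):

```python
import numpy as np

def median_filter_f0(f0: np.ndarray, radius: int = 3) -> np.ndarray:
    """Median-filter a pitch contour: short spurious drops to 0 (the 'mute'
    artifacts) are replaced by the local median. radius = half window size."""
    out = f0.copy()
    n = len(f0)
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out[i] = np.median(f0[lo:hi])
    return out

# A steady 220 Hz contour with two one-frame dropouts to 0.
f0 = np.full(20, 220.0)
f0[[5, 13]] = 0.0
smoothed = median_filter_f0(f0, radius=3)
print(smoothed[5], smoothed[13])  # 220.0 220.0
```

A larger radius smooths over longer dropouts but also blunts genuine fast pitch movement, which is why the radius is exposed as a user-adjustable parameter.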
+ ### 20230514 Update
+ - Mix the input's volume envelope into the output (alleviates the "silent input, low-level noisy output" problem. Not recommended when the input has loud background noise; off by default, and a value of 1 is treated as off)
+ - Support saving extracted small models at a chosen save frequency (useful if you want to compare inference quality across epochs without keeping every large checkpoint and manually extracting a small model via ckpt processing each time)
+ - Set environment variables to fix browser connection errors caused by a system-wide proxy on the server
+ - Support v2 pretrained models (currently only the 40k version is released for testing; the other two sampling rates are not fully trained yet)
+ - Clamp excessive volume above 1 before inference
+ - Fine-tuned data preprocessing parameters


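The envelope mixing might look roughly like this sketch (hypothetical helper; the actual implementation interpolates the envelope per sample, but the rate semantics match the note above, where 1 means off):

```python
import numpy as np

def mix_rms_envelope(source: np.ndarray, target: np.ndarray,
                     rate: float, hop: int = 512) -> np.ndarray:
    """Scale `target` so its frame-wise RMS envelope moves toward `source`'s.
    rate=1 leaves `target` unchanged (the 'off' default); rate=0 fully
    imposes the input envelope, so silent input stays silent."""
    n = min(len(source), len(target))
    out = target[:n].copy()
    for start in range(0, n, hop):
        seg = slice(start, min(start + hop, n))
        rms_in = np.sqrt(np.mean(source[seg] ** 2) + 1e-8)
        rms_out = np.sqrt(np.mean(target[seg] ** 2) + 1e-8)
        # Blend the envelopes geometrically, applied as a per-frame gain.
        out[seg] *= (rms_in / rms_out) ** (1.0 - rate)
    return out

rng = np.random.default_rng(0)
src = np.zeros(4096)                          # silent input
tgt = rng.standard_normal(4096) * 0.1         # noisy conversion output
quiet = mix_rms_envelope(src, tgt, rate=0.0)  # imposes the silent envelope
same = mix_rms_envelope(src, tgt, rate=1.0)   # rate=1: output untouched
```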
+ ### 20230409 Update
+ - Corrected training parameters to raise average GPU utilization: A100 from 25% to about 90%, V100 from 50% to about 90%, 2060S from 60% to about 85%, P40 from 25% to about 95%; training speed improves significantly
+ - Changed the batch_size parameter from total batch size to per-GPU batch size
+ - Raised total_epoch: maximum limit unlocked from 100 to 1000; default raised from 10 to 20
+ - Fixed ckpt extraction misdetecting whether a model uses pitch, which caused abnormal inference
+ - Fixed distributed training saving a ckpt once per rank
+ - Feature extraction now filters out nan features
+ - Fixed silent input producing random consonants or noise (old models need their training sets reprocessed and retrained)

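The nan filtering amounts to dropping any feature frame containing a NaN before it reaches training; a minimal sketch (hypothetical helper name):

```python
import numpy as np

def drop_nan_frames(feats: np.ndarray) -> np.ndarray:
    """Remove feature frames (rows) containing any NaN before training."""
    keep = ~np.isnan(feats).any(axis=1)
    return feats[keep]

feats = np.array([[0.1, 0.2], [np.nan, 0.3], [0.4, 0.5]])
clean = drop_nan_frames(feats)
print(clean.shape)  # (2, 2)
```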
+ ### 20230416 Update
+ - Added a local real-time voice-changing mini GUI; double-click go-realtime-gui.bat to start
+ - Both training and inference now filter out frequencies below 50Hz
+ - Lowered pyworld's minimum pitch for training and inference from the default 80 to 50, so male bass between 50-80hz no longer goes mute
+ - WebUI picks its language from the system locale (currently supports en_US, ja_JP, zh_CN, zh_HK, zh_SG, zh_TW; unsupported locales fall back to en_US)
+ - Fixed recognition of some GPUs (e.g. V100-16G and P4 previously failed to be recognized)

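The sub-50Hz filtering can be illustrated with a brick-wall FFT filter (a crude stand-in, assuming a hypothetical helper; a real implementation would use a proper high-pass filter design rather than zeroing bins):

```python
import numpy as np

def highpass_50hz(x: np.ndarray, sr: int) -> np.ndarray:
    """Zero all spectral bins below 50 Hz - a crude stand-in for the
    high-pass filtering applied before training and inference."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    spec[freqs < 50.0] = 0.0
    return np.fft.irfft(spec, n=len(x))

sr = 16_000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 30 * t) + np.sin(2 * np.pi * 220 * t)  # 30 Hz rumble + 220 Hz tone
y = highpass_50hz(x, sr)  # the 30 Hz component is removed, 220 Hz survives
```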
+ ### 20230428 Update
+ - Upgraded faiss index settings for faster speed and higher quality
+ - Removed the total_npy dependency; total_npy no longer needs to be shared along with the model
+ - Unlocked the restrictions on 16-series GPUs, giving 4GB-VRAM GPUs 4GB inference settings
+ - Fixed a UVR5 vocal/accompaniment separation bug with certain audio formats
+ - The real-time voice-changing mini gui now supports non-40k and pitchless models

+ ### Follow-up plan:
+ Features:
+ - Support a multi-speaker training tab (up to 4 speakers)

+ Base model:
+ - Collect breathing wavs into the training set to fix the issue of breaths being converted into electronic noise
+ - We are training a base model with an added singing training set; it will be released in the future