PhoenixStormJr committed · Commit aa82845 · verified · 1 Parent(s): 5e9e41d

Upload Changelog_EN.md with huggingface_hub

  1. Changelog_EN.md +83 -0
Changelog_EN.md ADDED
@@ -0,0 +1,83 @@
+ ### 2023-06-18
+ - New pretrained v2 models: 32k and 48k
+ - Fix inference errors with non-f0 models
+ - For training sets longer than 1 hour, automatically run minibatch k-means to reduce the feature shape, so that index training, adding, and searching are much faster
+ - Provide a toy vocal2guitar Hugging Face Space
+ - Automatically delete abnormally short audio slices from the training set
+ - Add an ONNX export tab
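
The minibatch k-means step mentioned above can be illustrated with a small, dependency-free sketch. This is not the project's code: the function names, `batch_size`, `n_iters`, and the list-of-lists feature layout are assumptions chosen for illustration. The idea is that a large set of feature vectors is replaced by `k` centroids, so the retrieval index has far fewer rows to train on, add, and search.

```python
import random


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centroids[j])),
    )


def minibatch_kmeans(points, k, batch_size=32, n_iters=50, seed=0):
    """Summarize `points` (equal-length lists of floats) with k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    counts = [0] * k  # samples absorbed by each centroid so far
    for _ in range(n_iters):
        batch = [rng.choice(points) for _ in range(batch_size)]
        # Assign the whole batch first, using the centroids as they are now.
        assignments = [nearest(p, centroids) for p in batch]
        for p, j in zip(batch, assignments):
            counts[j] += 1
            lr = 1.0 / counts[j]  # per-centroid step size shrinks over time
            centroids[j] = [(1.0 - lr) * c + lr * x for c, x in zip(centroids[j], p)]
    return centroids
```

In practice a library implementation (for example scikit-learn's `MiniBatchKMeans`) would normally be used instead of hand-written loops; the sketch only shows why the index ends up with `k` rows rather than one row per frame.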
+
+ Failed experiments:
+ - ~~Feature retrieval: add temporal feature retrieval: not effective~~
+ - ~~Feature retrieval: add PCAR dimensionality reduction: searching is even slower~~
+ - ~~Random data augmentation during training: not effective~~
+
+ todolist:
+ - Vocos-RVC (tiny vocoder)
+ - Crepe support for training
+ - Half precision crepe inference
+ - F0 editor support
+
+ ### 2023-05-28
+ - Add v2 Jupyter notebook and Korean changelog; fix some environment requirements
+ - Add a protection mode for voiceless consonants and breath sounds
+ - Support crepe-full pitch detection
+ - UVR5 vocal separation: support dereverb and de-echo models
+ - Add the experiment name and version to the index file name
+ - Let users manually select the export format of output audio for batch voice conversion and UVR5 vocal separation
+ - Training v1 32k models is no longer supported
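
One way to picture the voiceless consonant and breath protection mode is as a per-frame blend between the original features and the retrieved ones. The sketch below is an assumption about how such a mode could work, not the project's implementation: `protect_unvoiced`, the `protect` weight, and the convention that `f0 == 0` marks an unvoiced frame are all hypothetical.

```python
def protect_unvoiced(original, retrieved, f0, protect):
    """Blend feature rows frame by frame.

    Voiced frames (f0 > 0) take the retrieved features unchanged.
    Unvoiced frames keep a (1 - protect) share of the original features,
    so a smaller `protect` value means stronger protection.
    """
    blended = []
    for orig_row, ret_row, pitch in zip(original, retrieved, f0):
        weight = 1.0 if pitch > 0 else protect
        blended.append(
            [weight * r + (1.0 - weight) * o for o, r in zip(orig_row, ret_row)]
        )
    return blended
```

Under these assumptions, consonants and breaths (which carry no pitch) stay closer to the source speaker's articulation, while voiced frames still receive the full timbre change.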
+
+ ### 2023-05-13
+ - Remove redundant code from the old runtime in the one-click package: infer_pack and uvr5_pack
+ - Fix pseudo-multiprocessing bug in training-set preprocessing
+ - Add median filtering radius adjustment for the harvest pitch recognition algorithm
+ - Support post-processing resampling for exported audio
+ - The multiprocessing "n_cpu" setting for training now applies to "data preprocessing and f0 extraction" instead of only "f0 extraction"
+ - Automatically detect index paths under the logs folder and provide them in a drop-down list
+ - Add "Frequently Asked Questions and Answers" to the tab page (you can also refer to the GitHub RVC wiki)
+ - During inference, the harvest pitch is cached when the same input audio path is used. Purpose: harvest pitch extraction makes the entire pipeline go through a long, repetitive pitch extraction step; without caching, users who experiment with different timbre, index, and pitch median filtering radius settings face a very long wait on every inference after the first
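
The caching and median-filtering entries above can be sketched together. The names here (`PitchCache`, `extract_f0`, `median_filter`) are hypothetical, and treating the radius as a half-window in frames is an assumption; the actual pipeline may interpret the setting differently. The point is that the raw pitch curve is stored per input path, so changing only the filter radius does not trigger another extraction.

```python
from statistics import median


class PitchCache:
    """Remember the raw f0 curve per input path so repeated runs skip extraction."""

    def __init__(self, extract_f0):
        self._extract_f0 = extract_f0  # hypothetical callable: path -> list of Hz values
        self._curves = {}

    def get(self, audio_path):
        if audio_path not in self._curves:
            self._curves[audio_path] = self._extract_f0(audio_path)
        return list(self._curves[audio_path])  # copy so callers cannot alter the cache


def median_filter(f0, radius):
    """Replace each frame with the median of the frames within `radius` of it."""
    if radius <= 0:
        return list(f0)
    return [
        median(f0[max(0, i - radius): i + radius + 1])
        for i in range(len(f0))
    ]
```

A cache keyed only by path returns a stale curve if the file at that path is overwritten, which is worth keeping in mind when re-recording the same input.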
+
+ ### 2023-05-14
+ - Use the volume envelope of the input to mix with or replace the volume envelope of the output. This can alleviate the problem of "input muting and output small-amplitude noise". It is not recommended when the input audio has high background noise, and it is off by default (a value of 1 can be considered off)
+ - Support saving extracted small models at a specified frequency (useful if you want to compare performance at different epochs without saving every large checkpoint and manually extracting small models through ckpt-processing each time)
+ - Resolve "connection errors" caused by the server's global proxy by setting environment variables
+ - Support pre-trained v2 models (currently only the 40k version is publicly available for testing; the other two sampling rates have not been fully trained yet)
+ - Limit volume exceeding 1 before inference
+ - Slightly adjust the training-set preprocessing settings
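
The volume-envelope mixing and the pre-inference volume limit can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the function names, the fixed `frame_len`, the per-frame (non-interpolated) gain, and the `ceiling` parameter are all hypothetical. The mixing rule is chosen so that `rate = 1` leaves the output untouched (matching "1 can be considered off") and `rate = 0` fully imposes the input's loudness.

```python
import math


def frame_rms(samples, frame_len):
    """Root-mean-square loudness of each consecutive frame."""
    values = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        values.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return values


def mix_volume_envelope(source, converted, rate, frame_len=4, eps=1e-6):
    """Scale `converted` so its loudness moves toward that of `source`."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    n = min(len(source), len(converted))
    src_rms = frame_rms(source[:n], frame_len)
    out_rms = frame_rms(converted[:n], frame_len)
    mixed = []
    for i in range(n):
        f = i // frame_len
        # gain = src^(1 - rate) * out^(rate - 1); eps avoids dividing by zero.
        gain = src_rms[f] ** (1 - rate) * max(out_rms[f], eps) ** (rate - 1)
        mixed.append(converted[i] * gain)
    return mixed


def limit_peak(samples, ceiling=1.0):
    """Scale the whole signal down if any sample exceeds `ceiling` in magnitude."""
    peak = max((abs(x) for x in samples), default=0.0)
    if peak <= ceiling:
        return list(samples)
    return [x * ceiling / peak for x in samples]
```

With a silent input frame and `rate = 0`, the gain becomes zero, which is how this kind of mixing suppresses small-amplitude noise in the output; the same property is why it can amplify problems when the input itself carries loud background noise.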
+
+
+ #######################
+
+ History changelogs:
+
+ ### 2023-04-09
+ - Fixed training parameters to improve GPU utilization: A100 from 25% to around 90%, V100 from 50% to around 90%, 2060S from 60% to around 85%, P40 from 25% to around 95%; training speed improved significantly
+ - Changed parameter: batch_size is now the per-GPU batch size instead of the total
+ - Changed total_epoch: maximum limit increased from 100 to 1000; default increased from 10 to 20
+ - Fixed ckpt extraction recognizing pitch incorrectly, which caused abnormal inference
+ - Fixed distributed training saving a ckpt for each rank
+ - Applied NaN filtering during feature extraction
+ - Fixed silent input/output producing random consonants or noise (old models need to be retrained with a new dataset)
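
Two of the entries above are easy to make concrete. The sketch below assumes that NaN filtering means skipping any feature matrix that contains a NaN, and that the effective batch size is simply the per-GPU value times the number of GPUs; the helper names and the dictionary layout are hypothetical.

```python
import math


def contains_nan(features):
    """True if any value in a list-of-rows feature matrix is NaN."""
    return any(math.isnan(value) for row in features for value in row)


def keep_clean_features(named_features):
    """Keep only the entries whose feature matrix is free of NaN values."""
    return {
        name: feats
        for name, feats in named_features.items()
        if not contains_nan(feats)
    }


def effective_batch_size(per_gpu_batch_size, n_gpus):
    """Samples processed per optimizer step when batch_size is per GPU."""
    return per_gpu_batch_size * n_gpus
```

For example, a batch_size of 8 on 4 GPUs corresponds to 32 samples per step, so a setting carried over from the older "total" meaning would need to be divided by the GPU count to keep the same effective batch.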
+
+ ### 2023-04-16 Update
+ - Added a local real-time voice changing mini-GUI; start it by double-clicking go-realtime-gui.bat
+ - Applied filtering of frequency bands below 50 Hz during training and inference
+ - Lowered the minimum pitch extracted by pyworld from the default 80 to 50 for training and inference, so that low-pitched male voices between 50 and 80 Hz are no longer muted
+ - WebUI switches language according to the system locale (currently supports en_US, ja_JP, zh_CN, zh_HK, zh_SG, zh_TW; defaults to en_US if the locale is not supported)
+ - Fixed recognition of some GPUs (e.g., V100-16G and P4 recognition failures)
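
The locale fallback described above can be sketched in a few lines. The supported list comes from the changelog entry; the function name and the handling of suffixes such as `.UTF-8` are assumptions made for illustration.

```python
SUPPORTED_LOCALES = ("en_US", "ja_JP", "zh_CN", "zh_HK", "zh_SG", "zh_TW")
DEFAULT_LOCALE = "en_US"


def pick_ui_language(system_locale):
    """Return the UI language for a system locale, falling back to en_US."""
    if not system_locale:
        return DEFAULT_LOCALE
    # Drop an encoding or modifier suffix such as ".UTF-8" or "@euro"
    # (an assumption about the shape of the input string).
    name = system_locale.split(".")[0].split("@")[0]
    return name if name in SUPPORTED_LOCALES else DEFAULT_LOCALE
```

The system locale itself would typically come from the standard library, for instance `locale.getlocale()[0]`, which can return `None`; the sketch treats that case as unsupported.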
+
+ ### 2023-04-28 Update
+ - Upgraded faiss index settings for faster speed and higher quality
+ - Removed the dependency on total_npy; future model sharing will not require total_npy input
+ - Unlocked restrictions for 16-series GPUs, providing 4GB inference settings for GPUs with 4GB VRAM
+ - Fixed a bug in UVR5 vocal accompaniment separation for certain audio formats
+ - Real-time voice changing mini-GUI now supports non-40k and non-lazy pitch models
+
+ ### Future Plans:
+ Features:
+ - Add option: extract small models at each epoch save
+ - Add option: export an additional mp3 to a specified path during inference
+ - Support a multi-person training tab (up to 4 people)
+
+ Base model:
+ - Collect breathing wav files to add to the training dataset to fix distorted breath sounds
+ - We are currently training a base model with an extended singing dataset, which will be released in the future