jhansss committed on
Commit
79ad7df
·
1 Parent(s): 996ea2d

Add README.md

Files changed (1)
  1. README.md +144 -0
README.md ADDED
@@ -0,0 +1,144 @@
+ # SingingSDS: Role-Playing Singing Spoken Dialogue System
+
+ A role-playing singing dialogue system that converts speech input into character-based singing output.
+
+ ## Installation
+
+ ### Requirements
+
+ - Python 3.11+
+ - CUDA (optional, for GPU acceleration)
+
+ ### Install Dependencies
+
+ #### Option 1: Using Conda (Recommended)
+
+ ```bash
+ # Create and activate an isolated environment
+ conda create -n singingsds python=3.11
+
+ conda activate singingsds
+ # Install PyTorch built against CUDA 11.8, then the project dependencies
+ conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
+ pip install -r requirements.txt
+ ```
+
+ #### Option 2: Using pip only
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ #### Option 3: Using pip with virtual environment
+
+ ```bash
+ python -m venv singingsds_env
+
+ # On Windows:
+ singingsds_env\Scripts\activate
+ # On macOS/Linux:
+ source singingsds_env/bin/activate
+
+ pip install -r requirements.txt
+ ```
+
+ ## Usage
+
+ ### Command Line Interface (CLI)
+
+ #### Example Usage
+
+ ```bash
+ python cli.py --query_audio data/query/hello.wav --config_path config/cli/yaoyin_default.yaml --output_audio outputs/yaoyin_hello.wav
+ ```
+
+ #### Parameter Description
+
+ - `--query_audio`: Input audio file path (required)
+ - `--config_path`: Configuration file path (default: `config/cli/yaoyin_default.yaml`)
+ - `--output_audio`: Output audio file path (required)
+
+
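As a rough illustration of the three flags listed above, the following sketch assumes they are parsed with Python's standard `argparse` module. The README does not show how `cli.py` is actually implemented, so the `build_parser` helper and the help strings are hypothetical. Only the flag names, the required/optional status, and the default config path come from the parameter list.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented CLI flags (not the real cli.py)."""
    parser = argparse.ArgumentParser(description="SingingSDS CLI (illustrative sketch)")
    # Required: path to the spoken query to respond to.
    parser.add_argument("--query_audio", required=True, help="Input audio file path")
    # Optional: falls back to the default documented in the README.
    parser.add_argument(
        "--config_path",
        default="config/cli/yaoyin_default.yaml",
        help="Configuration file path",
    )
    # Required: where the sung response should be written.
    parser.add_argument("--output_audio", required=True, help="Output audio file path")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["--query_audio", "data/query/hello.wav", "--output_audio", "outputs/yaoyin_hello.wav"]
)
print(args.query_audio, args.config_path, args.output_audio)
```

Leaving out `--config_path`, as in this call, selects the documented default. Leaving out either required flag makes `argparse` exit with a usage error.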
+ ### Web Interface (Gradio)
+
+ Start the web interface:
+
+ ```bash
+ python app.py
+ ```
+
+ Then visit the displayed address in your browser to use the graphical interface.
+
+ ## Configuration
+
+ ### Character Configuration
+
+ The system supports multiple preset characters:
+
+ - **Yaoyin (遥音)**: Default timbre is `timbre2`
+ - **Limei (丽梅)**: Default timbre is `timbre1`
+
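The character-to-timbre defaults above can be pictured as a simple lookup table. This is only a sketch: the real definitions live in `characters/Yaoyin.py` and `characters/Limei.py`, whose structure is not shown here. The `DEFAULT_TIMBRES` dictionary and the `default_timbre` helper are hypothetical names introduced for illustration.

```python
# Hypothetical lookup table; only the character names and timbre labels
# come from the README, not the data structure itself.
DEFAULT_TIMBRES = {
    "Yaoyin": "timbre2",
    "Limei": "timbre1",
}


def default_timbre(character: str) -> str:
    """Return the documented default timbre for a preset character."""
    try:
        return DEFAULT_TIMBRES[character]
    except KeyError:
        known = ", ".join(sorted(DEFAULT_TIMBRES))
        raise ValueError(f"Unknown character {character!r}; expected one of: {known}") from None


print(default_timbre("Yaoyin"))
```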
+ ### Model Configuration
+
+ #### ASR Models
+ - `openai/whisper-large-v3-turbo`
+ - `openai/whisper-large-v3`
+ - `openai/whisper-medium`
+ - `sanchit-gandhi/whisper-small-dv`
+ - `facebook/wav2vec2-base-960h`
+
+ #### LLM Models
+ - `google/gemma-2-2b`
+ - `MiniMaxAI/MiniMax-M1-80k`
+ - `meta-llama/Llama-3.2-3B-Instruct`
+
+ #### SVS Models
+ - `espnet/mixdata_svs_visinger2_spkemb_lang_pretrained` (Bilingual)
+ - `espnet/aceopencpop_svs_visinger2_40singer_pretrain` (Chinese)
+
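One way to guard against typos when picking a model for each stage is to check the choice against the lists above before loading anything. The sketch below is an assumption about how such a check might look, not the project's own code. `SUPPORTED_MODELS` and `validate_selection` are hypothetical names, and the model IDs are copied from the lists above.

```python
# Model IDs copied from the README lists; the grouping by stage name is an assumption.
SUPPORTED_MODELS = {
    "asr": [
        "openai/whisper-large-v3-turbo",
        "openai/whisper-large-v3",
        "openai/whisper-medium",
        "sanchit-gandhi/whisper-small-dv",
        "facebook/wav2vec2-base-960h",
    ],
    "llm": [
        "google/gemma-2-2b",
        "MiniMaxAI/MiniMax-M1-80k",
        "meta-llama/Llama-3.2-3B-Instruct",
    ],
    "svs": [
        "espnet/mixdata_svs_visinger2_spkemb_lang_pretrained",
        "espnet/aceopencpop_svs_visinger2_40singer_pretrain",
    ],
}


def validate_selection(asr: str, llm: str, svs: str) -> dict[str, str]:
    """Check one model ID per stage against the documented lists."""
    chosen = {"asr": asr, "llm": llm, "svs": svs}
    for stage, model_id in chosen.items():
        if model_id not in SUPPORTED_MODELS[stage]:
            raise ValueError(f"{model_id!r} is not a listed {stage.upper()} model")
    return chosen


selection = validate_selection(
    asr="openai/whisper-medium",
    llm="google/gemma-2-2b",
    svs="espnet/aceopencpop_svs_visinger2_40singer_pretrain",
)
print(selection)
```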
+ ## Project Structure
+
+ ```
+ SingingSDS/
+ ├── cli.py                    # Command line interface
+ ├── interface.py              # Gradio interface
+ ├── pipeline.py               # Core processing pipeline
+ ├── app.py                    # Web application entry
+ ├── requirements.txt          # Python dependencies
+ ├── config/                   # Configuration files
+ │   ├── cli/                  # CLI-specific configuration
+ │   └── interface/            # Interface-specific configuration
+ ├── modules/                  # Core modules
+ │   ├── asr.py                # Speech recognition module
+ │   ├── llm.py                # Large language model module
+ │   ├── melody.py             # Melody control module
+ │   ├── svs/                  # Singing voice synthesis modules
+ │   │   ├── base.py           # Base SVS class
+ │   │   ├── espnet.py         # ESPnet SVS implementation
+ │   │   ├── registry.py       # SVS model registry
+ │   │   └── __init__.py       # SVS module initialization
+ │   └── utils/                # Utility modules
+ │       ├── g2p.py            # Grapheme-to-phoneme conversion
+ │       ├── text_normalize.py # Text normalization
+ │       └── resources/        # Utility resources
+ ├── characters/               # Character definitions
+ │   ├── base.py               # Base character class
+ │   ├── Limei.py              # Limei character definition
+ │   ├── Yaoyin.py             # Yaoyin character definition
+ │   └── __init__.py           # Character module initialization
+ ├── evaluation/               # Evaluation modules
+ │   └── svs_eval.py           # SVS evaluation metrics
+ ├── data/                     # Data directory
+ │   ├── kising/               # Kising dataset
+ │   └── touhou/               # Touhou dataset
+ ├── resources/                # Project resources
+ ├── data_handlers/            # Data handling utilities
+ ├── assets/                   # Static assets
+ └── tests/                    # Test files
+ ```
+
+ ## Contributing
+
+ Issues and Pull Requests are welcome!
+
+ ## License
+
+