Commit e40a223 by ChuxiJ · verified · 1 Parent(s): 1d2a7d9

Update README.md

Files changed (1)
  1. README.md +159 -157
README.md CHANGED
@@ -1,158 +1,160 @@
- ---
- license: apache-2.0
- tags:
- - music
- - text2music
- pipeline_tag: text-to-audio
- language:
- - en
- - zh
- - de
- - fr
- - es
- - it
- - pt
- - pl
- - tr
- - ru
- - cs
- - nl
- - ar
- - ja
- - hu
- - ko
- - hi
- library_name: diffusers
- ---
-
- # 🎤 Chinese Rap LoRA for ACE-Step (Rap Machine)
-
- This is a hybrid rap voice model. We meticulously curated Chinese rap/hip-hop datasets for training, with rigorous data cleaning and recaptioning. The results demonstrate:
-
- - Improved Chinese pronunciation accuracy
- - Enhanced stylistic adherence to hip-hop and electronic genres
- - Greater diversity in hip-hop vocal expressions
-
- ## Usage Guide
-
- 1. Generate higher-quality Chinese songs
- 2. Create superior hip-hop tracks
- 3. Blend with other genres to:
- - Produce music with better vocal quality and detail
- - Add experimental flavors (e.g., underground, street culture)
- 4. Fine-tune using these parameters:
-
- **Vocal Controls**
- **`vocal_timbre`**
- - Examples: Bright, dark, warm, cold, breathy, nasal, gritty, smooth, husky, metallic, whispery, resonant, airy, smoky, sultry, light, clear, high-pitched, raspy, powerful, ethereal, flute-like, hollow, velvety, shrill, hoarse, mellow, thin, thick, reedy, silvery, twangy.
- - Describes inherent vocal qualities.
-
- **`techniques`** (List)
- - Rap styles: `mumble rap`, `chopper rap`, `melodic rap`, `lyrical rap`, `trap flow`, `double-time rap`
- - Vocal FX: `auto-tune`, `reverb`, `delay`, `distortion`
- - Delivery: `whispered`, `shouted`, `spoken word`, `narration`, `singing`
- - Other: `ad-libs`, `call-and-response`, `harmonized`
-
- ## Community Note
-
- While a Chinese rap LoRA might seem niche for non-Chinese communities, we consistently demonstrate through such projects that ACE-step - as a music generation foundation model - holds boundless potential. It doesn't just improve pronunciation in one language, but spawns new styles.
-
- The universal human appreciation of music is a precious asset. Like abstract LEGO blocks, these elements will eventually combine in more organic ways. May our open-source contributions propel the evolution of musical history forward.
-
- ---
-
- # ACE-Step: A Step Towards Music Generation Foundation Model
-
- ![ACE-Step Framework](https://github.com/ACE-Step/ACE-Step/raw/main/assets/ACE-Step_framework.png)
-
- ## Model Description
-
- ACE-Step is a novel open-source foundation model for music generation that overcomes key limitations of existing approaches through a holistic architectural design. It integrates diffusion-based generation with Sana's Deep Compression AutoEncoder (DCAE) and a lightweight linear transformer, achieving state-of-the-art performance in generation speed, musical coherence, and controllability.
-
- **Key Features:**
- - 15× faster than LLM-based baselines (20s for 4-minute music on A100)
- - Superior musical coherence across melody, harmony, and rhythm
- - full-song generation, duration control and accepts natural language descriptions
-
- ## Uses
-
- ### Direct Use
- ACE-Step can be used for:
- - Generating original music from text descriptions
- - Music remixing and style transfer
- - edit song lyrics
-
- ### Downstream Use
- The model serves as a foundation for:
- - Voice cloning applications
- - Specialized music generation (rap, jazz, etc.)
- - Music production tools
- - Creative AI assistants
-
- ### Out-of-Scope Use
- The model should not be used for:
- - Generating copyrighted content without permission
- - Creating harmful or offensive content
- - Misrepresenting AI-generated music as human-created
-
- ## How to Get Started
-
- see: https://github.com/ace-step/ACE-Step
-
- ## Hardware Performance
-
- | Device | 27 Steps | 60 Steps |
- |---------------|----------|----------|
- | NVIDIA A100 | 27.27x | 12.27x |
- | RTX 4090 | 34.48x | 15.63x |
- | RTX 3090 | 12.76x | 6.48x |
- | M2 Max | 2.27x | 1.03x |
-
- *RTF (Real-Time Factor) shown - higher values indicate faster generation*
-
-
- ## Limitations
-
- - Performance varies by language (top 10 languages perform best)
- - Longer generations (>5 minutes) may lose structural coherence
- - Rare instruments may not render perfectly
- - Output Inconsistency: Highly sensitive to random seeds and input duration, leading to varied "gacha-style" results.
- - Style-specific Weaknesses: Underperforms on certain genres (e.g. Chinese rap/zh_rap) Limited style adherence and musicality ceiling
- - Continuity Artifacts: Unnatural transitions in repainting/extend operations
- - Vocal Quality: Coarse vocal synthesis lacking nuance
- - Control Granularity: Needs finer-grained musical parameter control
-
- ## Ethical Considerations
-
- Users should:
- - Verify originality of generated works
- - Disclose AI involvement
- - Respect cultural elements and copyrights
- - Avoid harmful content generation
-
-
- ## Model Details
-
- **Developed by:** ACE Studio and StepFun
- **Model type:** Diffusion-based music generation with transformer conditioning
- **License:** Apache 2.0
- **Resources:**
- - [Project Page](https://ace-step.github.io/)
- - [Demo Space](https://huggingface.co/spaces/ACE-Step/ACE-Step)
- - [GitHub Repository](https://github.com/ACE-Step/ACE-Step)
-
-
- ## Citation
-
- ```bibtex
- @misc{gong2025acestep,
- title={ACE-Step: A Step Towards Music Generation Foundation Model},
- author={Junmin Gong, Wenxiao Zhao, Sen Wang, Shengyuan Xu, Jing Guo},
- howpublished={\url{https://github.com/ace-step/ACE-Step}},
- year={2025},
- note={GitHub repository}
- }
- ```
-
- ## Acknowledgements
 
 
 
+ ---
+ license: apache-2.0
+ tags:
+ - music
+ - text2music
+ pipeline_tag: text-to-audio
+ language:
+ - en
+ - zh
+ - de
+ - fr
+ - es
+ - it
+ - pt
+ - pl
+ - tr
+ - ru
+ - cs
+ - nl
+ - ar
+ - ja
+ - hu
+ - ko
+ - hi
+ library_name: diffusers
+ ---
+
+ # 🎤 Chinese Rap LoRA for ACE-Step (Rap Machine)
+
+ This is a hybrid rap voice model. We meticulously curated Chinese rap/hip-hop datasets for training, with rigorous data cleaning and recaptioning. The results demonstrate:
+
+ - Improved Chinese pronunciation accuracy
+ - Enhanced stylistic adherence to hip-hop and electronic genres
+ - Greater diversity in hip-hop vocal expressions
+
+ Audio Examples see: https://ace-step.github.io/#RapMachine
+
+ ## Usage Guide
+
+ 1. Generate higher-quality Chinese songs
+ 2. Create superior hip-hop tracks
+ 3. Blend with other genres to:
+ - Produce music with better vocal quality and detail
+ - Add experimental flavors (e.g., underground, street culture)
+ 4. Fine-tune using these parameters:
+
+ **Vocal Controls**
+ **`vocal_timbre`**
+ - Examples: Bright, dark, warm, cold, breathy, nasal, gritty, smooth, husky, metallic, whispery, resonant, airy, smoky, sultry, light, clear, high-pitched, raspy, powerful, ethereal, flute-like, hollow, velvety, shrill, hoarse, mellow, thin, thick, reedy, silvery, twangy.
+ - Describes inherent vocal qualities.
+
+ **`techniques`** (List)
+ - Rap styles: `mumble rap`, `chopper rap`, `melodic rap`, `lyrical rap`, `trap flow`, `double-time rap`
+ - Vocal FX: `auto-tune`, `reverb`, `delay`, `distortion`
+ - Delivery: `whispered`, `shouted`, `spoken word`, `narration`, `singing`
+ - Other: `ad-libs`, `call-and-response`, `harmonized`
+
+ ## Community Note
+
+ While a Chinese rap LoRA might seem niche for non-Chinese communities, we consistently demonstrate through such projects that ACE-step - as a music generation foundation model - holds boundless potential. It doesn't just improve pronunciation in one language, but spawns new styles.
+
+ The universal human appreciation of music is a precious asset. Like abstract LEGO blocks, these elements will eventually combine in more organic ways. May our open-source contributions propel the evolution of musical history forward.
+
+ ---
+
+ # ACE-Step: A Step Towards Music Generation Foundation Model
+
+ ![ACE-Step Framework](https://github.com/ACE-Step/ACE-Step/raw/main/assets/ACE-Step_framework.png)
+
+ ## Model Description
+
+ ACE-Step is a novel open-source foundation model for music generation that overcomes key limitations of existing approaches through a holistic architectural design. It integrates diffusion-based generation with Sana's Deep Compression AutoEncoder (DCAE) and a lightweight linear transformer, achieving state-of-the-art performance in generation speed, musical coherence, and controllability.
+
+ **Key Features:**
+ - 15× faster than LLM-based baselines (20s for 4-minute music on A100)
+ - Superior musical coherence across melody, harmony, and rhythm
+ - full-song generation, duration control and accepts natural language descriptions
+
+ ## Uses
+
+ ### Direct Use
+ ACE-Step can be used for:
+ - Generating original music from text descriptions
+ - Music remixing and style transfer
+ - edit song lyrics
+
+ ### Downstream Use
+ The model serves as a foundation for:
+ - Voice cloning applications
+ - Specialized music generation (rap, jazz, etc.)
+ - Music production tools
+ - Creative AI assistants
+
+ ### Out-of-Scope Use
+ The model should not be used for:
+ - Generating copyrighted content without permission
+ - Creating harmful or offensive content
+ - Misrepresenting AI-generated music as human-created
+
+ ## How to Get Started
+
+ see: https://github.com/ace-step/ACE-Step
+
+ ## Hardware Performance
+
+ | Device | 27 Steps | 60 Steps |
+ |---------------|----------|----------|
+ | NVIDIA A100 | 27.27x | 12.27x |
+ | RTX 4090 | 34.48x | 15.63x |
+ | RTX 3090 | 12.76x | 6.48x |
+ | M2 Max | 2.27x | 1.03x |
+
+ *RTF (Real-Time Factor) shown - higher values indicate faster generation*
+
+
+ ## Limitations
+
+ - Performance varies by language (top 10 languages perform best)
+ - Longer generations (>5 minutes) may lose structural coherence
+ - Rare instruments may not render perfectly
+ - Output Inconsistency: Highly sensitive to random seeds and input duration, leading to varied "gacha-style" results.
+ - Style-specific Weaknesses: Underperforms on certain genres (e.g. Chinese rap/zh_rap) Limited style adherence and musicality ceiling
+ - Continuity Artifacts: Unnatural transitions in repainting/extend operations
+ - Vocal Quality: Coarse vocal synthesis lacking nuance
+ - Control Granularity: Needs finer-grained musical parameter control
+
+ ## Ethical Considerations
+
+ Users should:
+ - Verify originality of generated works
+ - Disclose AI involvement
+ - Respect cultural elements and copyrights
+ - Avoid harmful content generation
+
+
+ ## Model Details
+
+ **Developed by:** ACE Studio and StepFun
+ **Model type:** Diffusion-based music generation with transformer conditioning
+ **License:** Apache 2.0
+ **Resources:**
+ - [Project Page](https://ace-step.github.io/)
+ - [Demo Space](https://huggingface.co/spaces/ACE-Step/ACE-Step)
+ - [GitHub Repository](https://github.com/ACE-Step/ACE-Step)
+
+
+ ## Citation
+
+ ```bibtex
+ @misc{gong2025acestep,
+ title={ACE-Step: A Step Towards Music Generation Foundation Model},
+ author={Junmin Gong, Wenxiao Zhao, Sen Wang, Shengyuan Xu, Jing Guo},
+ howpublished={\url{https://github.com/ace-step/ACE-Step}},
+ year={2025},
+ note={GitHub repository}
+ }
+ ```
+
+ ## Acknowledgements
  This project is co-led by ACE Studio and StepFun.
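
Reviewer note on the Hardware Performance table carried in this diff: the README gives RTF values and says higher is faster, but it does not spell out the formula. A minimal sketch follows, assuming RTF means audio duration divided by generation time. That reading is inferred from the table footnote, not confirmed by the source. The helper names `generation_seconds` and `estimate_table` are hypothetical and are not part of ACE-Step.

```python
# RTF values copied from the README's Hardware Performance table.
# Outer key: device name; inner key: number of diffusion steps.
RTF = {
    "NVIDIA A100": {27: 27.27, 60: 12.27},
    "RTX 4090": {27: 34.48, 60: 15.63},
    "RTX 3090": {27: 12.76, 60: 6.48},
    "M2 Max": {27: 2.27, 60: 1.03},
}


def generation_seconds(audio_seconds: float, rtf: float) -> float:
    """Estimate wall-clock generation time for a clip.

    Assumption (not stated in the README): RTF = audio duration / generation time.
    """
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return audio_seconds / rtf


def estimate_table(audio_seconds: float) -> dict:
    """Return estimated generation time in seconds, rounded to 0.1 s,
    for every device and step count in the RTF table."""
    return {
        device: {
            steps: round(generation_seconds(audio_seconds, rtf), 1)
            for steps, rtf in by_steps.items()
        }
        for device, by_steps in RTF.items()
    }


if __name__ == "__main__":
    # Estimate generation time for a 4-minute (240 s) track.
    for device, by_steps in estimate_table(240.0).items():
        for steps, seconds in by_steps.items():
            print(f"{device:12s} {steps:2d} steps: ~{seconds} s")
```

Under this assumption, a 4-minute track at 60 steps on an A100 comes out at roughly 19.6 s, which is in line with the README's "20s for 4-minute music on A100" claim.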