---
license: apache-2.0
tags:
- music
- text2music
pipeline_tag: text-to-audio
language:
- en
- zh
- de
- fr
- es
- it
- pt
- pl
- tr
- ru
- cs
- nl
- ar
- ja
- hu
- ko
- hi
library_name: diffusers
---

# 🎤 Chinese Rap LoRA for ACE-Step (Rap Machine)

This is a hybrid rap voice model. We meticulously curated Chinese rap/hip-hop datasets for training, with rigorous data cleaning and recaptioning. The results demonstrate:

- Improved Chinese pronunciation accuracy
- Enhanced stylistic adherence to hip-hop and electronic genres
- Greater diversity in hip-hop vocal expressions

For audio examples, see https://ace-step.github.io/#RapMachine

## Usage Guide

1. Generate higher-quality Chinese songs
2. Create superior hip-hop tracks
3. Blend with other genres to:
   - Produce music with better vocal quality and detail
   - Add experimental flavors (e.g., underground, street culture)
4. Fine-tune results using these parameters:

**Vocal Controls**  
**`vocal_timbre`**  
- Examples: Bright, dark, warm, cold, breathy, nasal, gritty, smooth, husky, metallic, whispery, resonant, airy, smoky, sultry, light, clear, high-pitched, raspy, powerful, ethereal, flute-like, hollow, velvety, shrill, hoarse, mellow, thin, thick, reedy, silvery, twangy.  
- Describes inherent vocal qualities.

**`techniques`** (List)  
- Rap styles: `mumble rap`, `chopper rap`, `melodic rap`, `lyrical rap`, `trap flow`, `double-time rap`  
- Vocal FX: `auto-tune`, `reverb`, `delay`, `distortion`  
- Delivery: `whispered`, `shouted`, `spoken word`, `narration`, `singing`  
- Other: `ad-libs`, `call-and-response`, `harmonized`
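
The descriptors above can be combined into a single text prompt. The following is a minimal sketch, assuming they are passed to the model as comma-separated tags alongside genre tags; this card does not spell out the exact input format. `build_prompt` is a hypothetical helper written for illustration and is not part of ACE-Step, and appending the word "vocal" to the timbre is likewise an assumption.

```python
def build_prompt(genre_tags, vocal_timbre=None, techniques=()):
    """Join genre tags, an optional timbre, and technique tags into one string.

    Hypothetical helper: the comma-separated layout is an assumption,
    not a documented ACE-Step interface.
    """
    parts = list(genre_tags)
    if vocal_timbre:
        # Assumption: phrase the timbre as "<descriptor> vocal".
        parts.append(f"{vocal_timbre} vocal")
    parts.extend(techniques)

    # Drop empty entries and case-insensitive duplicates, keeping first-seen order.
    seen = set()
    unique = []
    for part in parts:
        cleaned = part.strip()
        key = cleaned.lower()
        if key and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    return ", ".join(unique)


prompt = build_prompt(
    ["hip-hop", "electronic"],
    vocal_timbre="gritty",
    techniques=["chopper rap", "ad-libs", "auto-tune"],
)
print(prompt)
```

Keeping the descriptors as separate arguments makes it easy to vary one dimension at a time, for example swapping `chopper rap` for `melodic rap` while holding the timbre fixed.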

## Community Note

While a Chinese rap LoRA might seem niche for non-Chinese communities, projects like this one consistently demonstrate that ACE-Step, as a music generation foundation model, holds boundless potential. It does not just improve pronunciation in one language; it also spawns new styles.

The universal human appreciation of music is a precious asset. Like abstract LEGO blocks, these elements will eventually combine in more organic ways. May our open-source contributions propel the evolution of musical history forward.

---

# ACE-Step: A Step Towards Music Generation Foundation Model

![ACE-Step Framework](https://github.com/ACE-Step/ACE-Step/raw/main/assets/ACE-Step_framework.png)

## Model Description

ACE-Step is a novel open-source foundation model for music generation that overcomes key limitations of existing approaches through a holistic architectural design. It integrates diffusion-based generation with Sana's Deep Compression AutoEncoder (DCAE) and a lightweight linear transformer, achieving state-of-the-art performance in generation speed, musical coherence, and controllability.

**Key Features:**
- 15× faster than LLM-based baselines (20s for 4-minute music on A100)
- Superior musical coherence across melody, harmony, and rhythm
- Full-song generation with duration control and natural-language descriptions as input

## Uses

### Direct Use
ACE-Step can be used for:
- Generating original music from text descriptions
- Music remixing and style transfer
- Editing song lyrics

### Downstream Use
The model serves as a foundation for:
- Voice cloning applications
- Specialized music generation (rap, jazz, etc.)
- Music production tools
- Creative AI assistants

### Out-of-Scope Use
The model should not be used for:
- Generating copyrighted content without permission
- Creating harmful or offensive content
- Misrepresenting AI-generated music as human-created

## How to Get Started

See the GitHub repository: https://github.com/ace-step/ACE-Step

## Hardware Performance

| Device        | 27 Steps | 60 Steps |
|---------------|----------|----------|
| NVIDIA A100   | 27.27x   | 12.27x   |
| RTX 4090      | 34.48x   | 15.63x   |
| RTX 3090      | 12.76x   | 6.48x    |
| M2 Max        | 2.27x    | 1.03x    |

*Values are RTF (Real-Time Factor): seconds of audio generated per second of compute time. Higher values indicate faster generation.*
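
To turn the table into wall-clock estimates, the sketch below assumes RTF is the ratio of audio duration to generation time, so generation time is the audio duration divided by the RTF. The `RTF` dictionary copies the table values; `generation_seconds` is a hypothetical helper for illustration. Under that assumption, the A100 figure at 60 steps works out to roughly 20 seconds for a 4-minute track, which matches the figure quoted under Key Features.

```python
# RTF values copied from the table above, keyed by device and step count.
RTF = {
    "NVIDIA A100": {27: 27.27, 60: 12.27},
    "RTX 4090": {27: 34.48, 60: 15.63},
    "RTX 3090": {27: 12.76, 60: 6.48},
    "M2 Max": {27: 2.27, 60: 1.03},
}


def generation_seconds(audio_seconds, rtf):
    """Estimate generation time, assuming RTF = audio duration / generation time."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return audio_seconds / rtf


# Estimated time to generate a 4-minute (240 s) track on each device.
for device, by_steps in RTF.items():
    for steps, rtf in by_steps.items():
        estimate = generation_seconds(240, rtf)
        print(f"{device}, {steps} steps: {estimate:.1f} s")
```

These are estimates derived from the reported averages; actual times depend on the setup.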


## Limitations

- Performance varies by language (top 10 languages perform best)
- Longer generations (>5 minutes) may lose structural coherence
- Rare instruments may not render perfectly
- Output Inconsistency: Highly sensitive to random seeds and input duration, leading to varied "gacha-style" results.
- Style-specific Weaknesses: Underperforms on certain genres (e.g., Chinese rap/zh_rap), with limited style adherence and a limited musicality ceiling
- Continuity Artifacts: Unnatural transitions in repainting/extend operations
- Vocal Quality: Coarse vocal synthesis lacking nuance
- Control Granularity: Needs finer-grained musical parameter control

## Ethical Considerations

Users should:
- Verify originality of generated works
- Disclose AI involvement
- Respect cultural elements and copyrights
- Avoid harmful content generation


## Model Details

**Developed by:** ACE Studio and StepFun  
**Model type:** Diffusion-based music generation with transformer conditioning  
**License:** Apache 2.0  
**Resources:**  
- [Project Page](https://ace-step.github.io/)
- [Demo Space](https://huggingface.co/spaces/ACE-Step/ACE-Step)
- [GitHub Repository](https://github.com/ACE-Step/ACE-Step)


## Citation

```bibtex
@misc{gong2025acestep,
  title={ACE-Step: A Step Towards Music Generation Foundation Model},
  author={Junmin Gong and Wenxiao Zhao and Sen Wang and Shengyuan Xu and Jing Guo},
  howpublished={\url{https://github.com/ace-step/ACE-Step}},
  year={2025},
  note={GitHub repository}
}
```

## Acknowledgements
This project is co-led by ACE Studio and StepFun.