Safetensors · qwen2

Commit a9eaab3 (verified) by Chong Zhang · Parent: de6791d

Upload README.md

Files changed (1): README.md (+303 −1)
README.md CHANGED

pipeline_tag: text-to-audio
tags:
- music_generation
---
[//]: # (# InspireMusic)
<p align="center">
 <a href="https://github.com/FunAudioLLM/InspireMusic" target="_blank">
 <img alt="logo" src="./asset/logo.png" width="100%"></a>
</p>

[//]: # (<p align="center">)

[//]: # ( <a href="https://github.com/FunAudioLLM/InspireMusic" target="_blank">)

[//]: # ( <img alt="InspireMusic" src="https://svg-banners.vercel.app/api?type=origin&text1=Inspire%20Music🎶&text2=🤗%20A%20Fundamental%20Music%20Song%20Audio%20Generation%20Toolkit&width=800&height=210"></a>)

[//]: # (</p>)

<p align="center">
 <a href="https://iris2c.github.io/InspireMusic" target="_blank">
 <img alt="Demo" src="https://img.shields.io/badge/Demo%20👈🏻-InspireMusic?labelColor=%20%23FDB062&label=InspireMusic&color=%20%23f79009"></a>
 <a href="https://github.com/FunAudioLLM/InspireMusic" target="_blank">
 <img alt="Code" src="https://img.shields.io/badge/Code%20⭐-InspireMusic?labelColor=%20%237372EB&label=InspireMusic&color=%20%235462eb"></a>

 <a href="https://modelscope.cn/models/iic/InspireMusic-1.5B-Long" target="_blank">
 <img alt="Model" src="https://img.shields.io/badge/InspireMusic-Model-green"></a>

 <a href="https://arxiv.org/abs/" target="_blank">
 <img alt="Paper" src="https://img.shields.io/badge/arXiv-Paper-lightgrey"></a>

[//]: # (<a href="https://huggingface.co/FunAudioLLM/InspireMusic-Base" target="_blank">)

[//]: # ( <img alt="Model" src="https://img.shields.io/badge/Model-InspireMusic?labelColor=%20%23FDA199&label=InspireMusic&color=orange"></a>)

[//]: # (<a href="https://arxiv.org/abs/" target="_blank">)

[//]: # ( <img alt="Paper" src="https://img.shields.io/badge/Paper-arXiv?labelColor=%20%23528bff&label=arXiv&color=%20%23155EEF"></a>)

[//]: # (<a href="https://github.com/FunAudioLLM/InspireMusic" target="_blank">)

[//]: # ( <img alt="GitHub Stars" src="https://img.shields.io/github/stars/FunAudioLLM/InspireMusic"></a>)

[//]: # (<a href="https://github.com/FunAudioLLM/InspireMusic/blob/main/asset/QR.jpg" target="_blank">)

[//]: # ( <img src="https://img.shields.io/badge/group%20chat-group?&labelColor=%20%235462eb&color=%20%235462eb" alt="chat on WeChat"></a>)
[//]: # (<a href="https://discord.gg/nSPpRU7fRr" target="_blank">)

[//]: # ( <img src="https://img.shields.io/badge/discord-chat?&labelColor=%20%235462eb&color=%20%235462eb" alt="chat on Discord"></a>)

[//]: # ( <a href="https://github.com/FunAudioLLM/InspireMusic" target="_blank">)

[//]: # ( <img alt="Static Badge" src="https://img.shields.io/badge/v0.1-version?logo=free&color=%20%23155EEF&label=version&labelColor=%20%23528bff"></a>)
[//]: # (<a href="https://github.com/FunAudioLLM/InspireMusic/graphs/commit-activity" target="_blank">)

[//]: # (<img alt="Commits last month" src="https://img.shields.io/github/commit-activity/m/FunAudioLLM/InspireMusic?labelColor=%20%2332b583&color=%20%2312b76a"></a>)

[//]: # ( <a href="https://github.com/FunAudioLLM/InspireMusic" target="_blank">)

[//]: # ( <img alt="Issues closed" src="https://img.shields.io/github/issues-search?query=repo%3AFunAudioLLM%2FInspireMusic%20is%3Aclosed&label=issues%20closed&labelColor=%20%237d89b0&color=%20%235d6b98"></a>)

[//]: # ( <a href="https://github.com/FunAudioLLM/InspireMusic/discussions/" target="_blank">)

[//]: # ( <img alt="Discussion posts" src="https://img.shields.io/github/discussions/FunAudioLLM/InspireMusic?labelColor=%20%239b8afb&color=%20%237a5af8"></a>)
</p>

InspireMusic is a fundamental AIGC toolkit designed for music, song, and audio generation, built on PyTorch.

![GitHub Repo stars](https://img.shields.io/github/stars/FunAudioLLM/InspireMusic) Please support our community project 💖 by starring it on GitHub ⭐ 🙏

---
<a name="Highlights"></a>
## Highlights
**InspireMusic** focuses on music generation, song generation, and audio generation.
- A unified framework for music, song, and audio generation, controllable with text prompts, music genres, music structures, and more.
- Supports text-to-music, music continuation, audio super-resolution, and audio reconstruction tasks with high audio quality, at sampling rates of 24kHz and 48kHz.
- Supports long-form audio generation.
- Convenient fine-tuning and inference: supports mixed-precision training (FP16, FP32) and provides fine-tuning and inference scripts and strategies, allowing users to easily fine-tune their music generation models.

<a name="What's New"></a>
## What's New 🔥

- 2025/01: Open-sourced the [InspireMusic-Base](https://modelscope.cn/models/iic/InspireMusic/summary), [InspireMusic-Base-24kHz](https://modelscope.cn/models/iic/InspireMusic-Base-24kHz/summary), [InspireMusic-1.5B](https://modelscope.cn/models/iic/InspireMusic-1.5B/summary), [InspireMusic-1.5B-24kHz](https://modelscope.cn/models/iic/InspireMusic-1.5B-24kHz/summary), and [InspireMusic-1.5B-Long](https://modelscope.cn/models/iic/InspireMusic-1.5B-Long/summary) models for music generation.
- 2024/12: Added support for generating 48kHz audio with super-resolution flow matching.
- 2024/11: Welcome to preview 👉🏻 [**InspireMusic Demos**](https://iris2c.github.io/InspireMusic) 👈🏻. We're excited to share this with you and are working hard to bring even more features and models soon. Your support and feedback mean a lot to us!
- 2024/11: We are thrilled to announce the open-sourcing of the **InspireMusic** [code repository](https://github.com/FunAudioLLM/InspireMusic) and [demos](https://iris2c.github.io/InspireMusic). **InspireMusic** is a unified framework for music, song, and audio generation, featuring capabilities such as text-to-music conversion, music structure and genre control, and timestamp management. InspireMusic stands out for its exceptional music generation and instruction-following abilities.

## Introduction
> [!Note]
> This repo contains the algorithm infrastructure and some simple examples.

> [!Tip]
> To explore the performance, please refer to the [InspireMusic Demo Page](https://iris2c.github.io/InspireMusic). We will open-source the InspireMusic models and a HuggingFace Space soon.

InspireMusic is a unified music, song, and audio generation framework that couples audio tokenization and detokenization with a large autoregressive transformer. The original motivation for this toolkit is to empower everyday users to innovate soundscapes and enhance euphony in research through music, song, and audio crafting. The toolkit provides both inference and training code for AI generative models that create high-quality music. Within its unified framework, InspireMusic combines an autoregressive transformer, conditional flow-matching modeling (CFM), and neural audio tokenizers, allowing controllable generation of music, songs, and audio conditioned on both text and musical structure. Currently, the toolkit supports text-to-music generation, with plans to expand to text-to-song and text-to-audio generation in the future.

## Installation

### Clone

- Clone the repo
``` sh
git clone --recursive https://github.com/FunAudioLLM/InspireMusic.git
# If cloning the submodules fails due to network issues, run the following command until it succeeds
cd InspireMusic
git submodule update --init --recursive
```

### Install
InspireMusic requires Python 3.8 and PyTorch 2.1.0. To install InspireMusic, run one of the following:

- Install Conda: see https://docs.conda.io/en/latest/miniconda.html
- Create a Conda env:
``` sh
conda create -n inspiremusic python=3.8
conda activate inspiremusic
cd InspireMusic
# pynini is required by WeTextProcessing; use conda to install it, as it runs on all platforms.
conda install -y -c conda-forge pynini==2.1.5
pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com
# install flash attention to speed up training
pip install flash-attn --no-build-isolation
```

- Install within the package:
```sh
cd InspireMusic
# Install the package
python setup.py install
pip install flash-attn --no-build-isolation
```
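
If you plan to modify the source, pip's editable mode is a common alternative to `setup.py install` (a sketch, assuming the project's packaging supports it; changes to the checkout then take effect without reinstalling):

```shell
cd InspireMusic
# Editable (development) install: the package points at this checkout
pip install -e .
pip install flash-attn --no-build-isolation
```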

We also recommend having `sox` or `ffmpeg` installed, either through your system package manager or Anaconda:
```sh
# Install sox
# ubuntu
sudo apt-get install sox libsox-dev
# centos
sudo yum install sox sox-devel

# Install ffmpeg
# ubuntu
sudo apt-get install ffmpeg
# centos
sudo yum install ffmpeg
```

## Models
### Download Model

We strongly recommend that you download our pretrained `InspireMusic` model.

If you are an expert in this field and are only interested in training your own InspireMusic model from scratch, you can skip this step.

``` sh
# Download the model via git; make sure git lfs is installed first
mkdir -p pretrained_models
git clone https://www.modelscope.cn/iic/InspireMusic-1.5B-Long.git pretrained_models/InspireMusic
```
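
If you prefer the Hugging Face hub, the same checkpoint should be available from the mirror listed in the model table below (a sketch; also assumes `git lfs` is installed):

```shell
# Alternative: clone the same model from the Hugging Face hub
git lfs install
mkdir -p pretrained_models
git clone https://huggingface.co/FunAudioLLM/InspireMusic-1.5B-Long pretrained_models/InspireMusic
```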

### Available Models
Currently, the open-sourced music generation models support 24kHz mono and 48kHz stereo audio.
The table below lists the links on the ModelScope and Huggingface model hubs. More models will be available soon.

| Model name | Model Links | Remarks |
|------------|-------------|---------|
| InspireMusic-Base-24kHz | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic-Base-24kHz/summary) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-Base-24kHz) | Pre-trained Music Generation Model, 24kHz mono |
| InspireMusic-Base | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic-Base/summary) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-Base) | Pre-trained Music Generation Model, 48kHz |
| InspireMusic-1.5B-24kHz | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic-1.5B-24kHz/summary) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-1.5B-24kHz) | Pre-trained Music Generation 1.5B Model, 24kHz mono |
| InspireMusic-1.5B | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic-1.5B/summary) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-1.5B) | Pre-trained Music Generation 1.5B Model, 48kHz |
| InspireMusic-1.5B-Long | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic-1.5B-Long/summary) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-1.5B-Long) | Pre-trained Music Generation 1.5B Model, 48kHz, supports long audio |
| InspireSong-1.5B | [![model](https://img.shields.io/badge/ModelScope-Model-lightgrey.svg)]() [![model](https://img.shields.io/badge/HuggingFace-Model-lightgrey.svg)]() | Pre-trained Song Generation 1.5B Model, 48kHz stereo |
| InspireAudio-1.5B | [![model](https://img.shields.io/badge/ModelScope-Model-lightgrey.svg)]() [![model](https://img.shields.io/badge/HuggingFace-Model-lightgrey.svg)]() | Pre-trained Audio Generation 1.5B Model, 48kHz stereo |

## Basic Usage

At the moment, InspireMusic contains the training and inference code for [music generation](https://github.com/FunAudioLLM/InspireMusic/tree/main/examples/music_generation). More tasks, such as song generation and audio generation, will be supported in the future.

### Quick Start

Here is a quick-start script that runs the music generation task end to end, including the data preparation pipeline, model training, and inference.
``` sh
cd InspireMusic/examples/music_generation/
bash run.sh
```

### Training

Here is an example of training the LLM model; FP16 training is supported.
```sh
torchrun --nnodes=1 --nproc_per_node=8 \
    --rdzv_id=1024 --rdzv_backend="c10d" --rdzv_endpoint="localhost:0" \
    inspiremusic/bin/train.py \
    --train_engine "torch_ddp" \
    --config conf/inspiremusic.yaml \
    --train_data data/train.data.list \
    --cv_data data/dev.data.list \
    --model llm \
    --model_dir `pwd`/exp/music_generation/llm/ \
    --tensorboard_dir `pwd`/tensorboard/music_generation/llm/ \
    --ddp.dist_backend "nccl" \
    --num_workers 8 \
    --prefetch 100 \
    --pin_memory \
    --deepspeed_config ./conf/ds_stage2.json \
    --deepspeed.save_states model+optimizer \
    --fp16
```
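
For a quick smoke test before committing eight GPUs, the same entry point can presumably be launched with a single process and lighter data-loading settings (a sketch; the reduced `--num_workers`/`--prefetch` values are illustrative, not recommended settings):

```shell
# Single-GPU debug run: same flags as above, one torchrun process
torchrun --nnodes=1 --nproc_per_node=1 \
    inspiremusic/bin/train.py \
    --train_engine "torch_ddp" \
    --config conf/inspiremusic.yaml \
    --train_data data/train.data.list \
    --cv_data data/dev.data.list \
    --model llm \
    --model_dir `pwd`/exp/music_generation/llm/ \
    --num_workers 2 \
    --prefetch 10 \
    --pin_memory
```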

Here is an example of training the flow-matching model; FP16 training is not supported.
```sh
torchrun --nnodes=1 --nproc_per_node=8 \
    --rdzv_id=1024 --rdzv_backend="c10d" --rdzv_endpoint="localhost:0" \
    inspiremusic/bin/train.py \
    --train_engine "torch_ddp" \
    --config conf/inspiremusic.yaml \
    --train_data data/train.data.list \
    --cv_data data/dev.data.list \
    --model flow \
    --model_dir `pwd`/exp/music_generation/flow/ \
    --tensorboard_dir `pwd`/tensorboard/music_generation/flow/ \
    --ddp.dist_backend "nccl" \
    --num_workers 8 \
    --prefetch 100 \
    --pin_memory \
    --deepspeed_config ./conf/ds_stage2.json \
    --deepspeed.save_states model+optimizer
```

### Inference

Here is an example script to quickly run model inference.
``` sh
cd InspireMusic/examples/music_generation/
bash infer.sh
```

Here is example code to run inference in normal mode, i.e., with the flow-matching model, for the text-to-music and music continuation tasks.
```sh
pretrained_model_dir="./pretrained_models/InspireMusic/"
for task in 'text-to-music' 'continuation'; do
  python inspiremusic/bin/inference.py --task $task \
      --gpu 0 \
      --config conf/inspiremusic.yaml \
      --prompt_data data/test/parquet/data.list \
      --flow_model $pretrained_model_dir/flow.pt \
      --llm_model $pretrained_model_dir/llm.pt \
      --music_tokenizer $pretrained_model_dir/music_tokenizer \
      --wavtokenizer $pretrained_model_dir/wavtokenizer \
      --result_dir `pwd`/exp/inspiremusic/${task}_test \
      --chorus verse \
      --min_generate_audio_seconds 8 \
      --max_generate_audio_seconds 30
done
```

Here is example code to run inference in fast mode, i.e., without the flow-matching model, for the text-to-music and music continuation tasks.
```sh
pretrained_model_dir="./pretrained_models/InspireMusic/"
for task in 'text-to-music' 'continuation'; do
  python inspiremusic/bin/inference.py --task $task \
      --gpu 0 \
      --config conf/inspiremusic.yaml \
      --prompt_data data/test/parquet/data.list \
      --flow_model $pretrained_model_dir/flow.pt \
      --llm_model $pretrained_model_dir/llm.pt \
      --music_tokenizer $pretrained_model_dir/music_tokenizer \
      --wavtokenizer $pretrained_model_dir/wavtokenizer \
      --result_dir `pwd`/exp/inspiremusic/${task}_test \
      --chorus verse \
      --fast \
      --min_generate_audio_seconds 8 \
      --max_generate_audio_seconds 30
done
```

## Roadmap

- [x] 2024/12
    - [x] 75Hz InspireMusic-Base model for music generation

- [x] 2025/01
    - [x] Support 48kHz audio generation
    - [x] 75Hz InspireMusic-1.5B model for music generation
    - [x] 75Hz InspireMusic-1.5B-Long model for long-form music generation

- [ ] 2025/02
    - [ ] Support song generation task
    - [ ] 75Hz InspireSong model for song generation

- [ ] 2025/03
    - [ ] Support audio generation task
    - [ ] 75Hz InspireAudio model for music and audio generation

- [ ] TBD
    - [ ] 25Hz InspireMusic model
    - [ ] Support 48kHz stereo audio
    - [ ] Streaming inference mode support
    - [ ] Support more instruction modes and multilingual instructions
    - [ ] InspireSong trained with more multilingual data
    - [ ] More...

## Disclaimer
The content provided above is for academic purposes only and is intended to demonstrate technical capabilities. Some examples are sourced from the internet. If any content infringes on your rights, please contact us to request its removal.