athi180202 committed on
Commit
18499a2
·
verified ·
1 Parent(s): e6d6201

Update README.md

Files changed (1)
  1. README.md +70 -1
README.md CHANGED
@@ -8,4 +8,73 @@ tags:
8
  - audio
9
  - music-generation
10
  - peft
11
- ---
11
+ ---
12
+
13
+ ### Exploring Adapter Design Tradeoffs for Low Resource Music Generation
14
+ [Code](https://github.com/atharva20038/ACMMM_Adapters/edit/main) | [Models](https://huggingface.co/collections/athi180202/peft-adaptations-of-music-generation-models-684ba077a2a44999bb6cb175) | [Paper](https://arxiv.org/abs/2506.21298)
15
+
16
+ This repository contains our code for the paper: "Exploring Adapter Design Tradeoffs for Low Resource Music Generation"
17
+
18
+ Fine-tuning large-scale music generation models, such as MusicGen and Mustango, is a computationally expensive process, often requiring updates to billions of parameters and, therefore, significant hardware resources.
19
+ Parameter-Efficient Fine-Tuning (PEFT) techniques, particularly adapter-based methods, have emerged as a promising alternative, enabling adaptation with minimal trainable parameters while preserving model performance.
20
+ However, the design choices for adapters, including their architecture, placement, and size, are numerous, and it is unclear which combinations would produce optimal adapters, and why, for a given low-resource music genre.
21
+ In this paper, we attempt to answer this question by studying various adapter configurations for two AI music models, MusicGen and Mustango, on two genres: Hindustani Classical and Turkish Makam music.
22
+
23
+ ## Datasets
24
+
25
+ The [Compmusic - Turkish Makam](https://compmusic.upf.edu/datasets) dataset contains 405 hours of Turkish Makam data.
26
+
27
+ The [Compmusic - Hindustani Classical](https://compmusic.upf.edu/datasets) dataset contains 305 hours of Hindustani Classical annotated data.
28
+
29
+ The Hindustani Classical dataset includes 21 different instrument types, such as the Pakhavaj, Zither, Sarangi, Ghatam, Harmonium,
30
+ and Santoor, along with vocals.
31
+
32
+ The Turkish Makam dataset features 42 makam-specific instruments, such as Oud, Tanbur, Ney, Davul, Clarinet, Kös, Kudüm,
33
+ Yaylı Tanbur, Tef, Kanun, Zurna, Bendir, Darbuka, Classical Kemençe, Rebab, Çevgen, and vocals. It encompasses 100 different
34
+ makams and 62 distinct usuls.
35
+
36
+ ## Adapter Positioning
37
+
38
+ <div align="center">
39
+ <img src="img/Architecture-1.png" width="900"/>
40
+ </div>
41
+
42
+ ### Mustango
43
+ In Mustango, a Bottleneck Residual Adapter with convolution layers is integrated into the down-sampling, middle, and up-sampling blocks of the UNet, positioned just after the cross-attention block. This design facilitates cultural adaptation while preserving computational efficiency. The adapters reduce the channel dimension by a factor of 8, using a kernel size of 1 and a GELU activation after the down-projection layer to introduce non-linearity.
44
+
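The adapter described above can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the repository's actual implementation: the class name `ConvBottleneckAdapter`, the use of `nn.Conv2d` (assuming 2D UNet feature maps), the zero-initialised up-projection, and the example tensor shape are all hypothetical. Only the reduction factor of 8, the kernel size of 1, the GELU after the down-projection, and the residual connection come from the text.

```python
import torch
import torch.nn as nn


class ConvBottleneckAdapter(nn.Module):
    """Residual bottleneck adapter built from 1x1 convolutions (illustrative)."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        hidden = max(channels // reduction, 1)
        # Down-projection: shrink the channel dimension by `reduction`.
        self.down = nn.Conv2d(channels, hidden, kernel_size=1)
        # Non-linearity applied after the down-projection.
        self.act = nn.GELU()
        # Up-projection: restore the original channel dimension.
        self.up = nn.Conv2d(hidden, channels, kernel_size=1)
        # Assumption: zero-init the up-projection so the adapter starts as identity.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection around the bottleneck.
        return x + self.up(self.act(self.down(x)))


# Hypothetical feature map: (batch, channels, height, width).
features = torch.randn(2, 320, 16, 16)
adapter = ConvBottleneckAdapter(channels=320)
out = adapter(features)
```

With the zero-initialised up-projection, the adapter initially passes its input through unchanged, so inserting it after a cross-attention block would not perturb the pretrained model before training begins.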
45
+ ### MusicGen
46
+ In MusicGen, we add roughly 2 million parameters by integrating a Linear Bottleneck Residual Adapter after the transformer decoder. We chose this placement after thorough experimentation with other placements.
47
+
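To make the parameter budget concrete, the sketch below counts the parameters of a linear bottleneck adapter (a down-projection and an up-projection, each with a bias). The hidden size and bottleneck width are hypothetical values chosen only because they land near the roughly 2 million parameters mentioned above; they are not taken from the paper or the code. The ~2 billion total is the figure quoted in this README.

```python
def linear_bottleneck_adapter_params(d_model: int, bottleneck: int) -> int:
    """Parameter count of a down-projection plus up-projection, with biases."""
    down = d_model * bottleneck + bottleneck  # weights + bias
    up = bottleneck * d_model + d_model       # weights + bias
    return down + up


D_MODEL = 2048     # hypothetical hidden size, not from the source
BOTTLENECK = 512   # hypothetical bottleneck width, not from the source
TOTAL_PARAMS = 2_000_000_000  # "~2 billion" as quoted in this README

adapter_params = linear_bottleneck_adapter_params(D_MODEL, BOTTLENECK)
fraction = adapter_params / TOTAL_PARAMS
print(f"{adapter_params:,} adapter parameters = {fraction:.3%} of the model")
```

Under these assumed sizes the adapter comes to about 2.1 million parameters, which is on the order of 0.1% of a 2-billion-parameter model.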
48
+ The total parameter count of both models is ~2 billion, making the adapter (2M parameters) only about 0.1% of the total size.
49
+ For both models, we trained on two RTX A6000 GPUs for around 10 hours. The adapter block was fine-tuned using the AdamW optimizer with an MSE (reconstruction) loss.
50
+
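The training setup above can be illustrated with a toy PyTorch loop. This is a sketch under assumptions rather than the actual training script: the `base` layer is a tiny stand-in for the pretrained model, the base weights are assumed to be frozen, and the learning rate, tensor sizes, and random data are hypothetical. Only the AdamW optimizer, the MSE loss, and the fact that the adapter is what gets fine-tuned come from the text.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for the pretrained model (hypothetical; the real models are far larger).
base = nn.Linear(64, 64)
# Small bottleneck adapter applied residually to the base output.
adapter = nn.Sequential(nn.Linear(64, 8), nn.GELU(), nn.Linear(8, 64))

# Assumption: the base model is frozen and only the adapter receives gradients.
for p in base.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.AdamW(adapter.parameters(), lr=1e-4)  # lr is hypothetical
loss_fn = nn.MSELoss()

# Random placeholder data standing in for model inputs and reconstruction targets.
inputs = torch.randn(16, 64)
targets = torch.randn(16, 64)

base_weight_before = base.weight.clone()
adapter_weight_before = adapter[0].weight.clone()

for step in range(5):
    hidden = base(inputs)
    prediction = hidden + adapter(hidden)  # residual adapter
    loss = loss_fn(prediction, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

trainable = sum(p.numel() for p in adapter.parameters())
total = trainable + sum(p.numel() for p in base.parameters())
print(f"trainable fraction: {trainable / total:.1%}")
```

Passing only `adapter.parameters()` to the optimizer keeps the optimizer state proportional to the adapter size, which is where most of the memory saving of adapter-based fine-tuning comes from.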
51
+ ## Evaluations
52
+ ### **Objective Evaluation Metrics for Music Models**
53
+ <div align="center">
54
+ <img src="img/fad_fd_image-1.png" width="900"/>
55
+ </div>
56
+
57
+ For Mustango, the objective evaluation results are also available in the following Google Sheet: [Spreadsheet](https://docs.google.com/spreadsheets/d/11aHVjt8zeHyMqmIBIdV5b4pvlu8gc83510HD0nwBrjo/edit?gid=0#gid=0).
58
+
59
+ ### **Human Evaluation**
60
+ Hindustani Classical - Subjective Evaluation Results
61
+ <div align="center">
62
+ <img src="img/hindustani_quality (1).png" width="900"/>
63
+ </div>
64
+
65
+ Turkish Makam - Subjective Evaluation Results
66
+ <div align="center">
67
+ <img src="img/makam (1).png" width="900"/>
68
+ </div>
69
+
70
+
71
+ ## Citation
72
+ Please consider citing the following article if you found our work useful:
73
+ ```
74
+
75
+
76
+ @misc
77
+ {
78
+
79
+ }
80
+ ```