Update README.md
- **KL Divergence**: Including the KL-divergence term during GRPO training yielded slight improvements.
- **Domain Ratio vs. Data Volume**: Domain diversity outweighs data volume. We used only 10k samples: 5k randomly selected from AVQA and 5k from MusicBench.
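The balanced 5k + 5k mix described above can be sketched as follows. This is a minimal illustration, not the project's actual data pipeline; the JSONL file layout and function name are assumptions.

```python
import json
import random

def build_training_mix(avqa_path, musicbench_path, per_domain=5_000, seed=0):
    """Draw an equal number of samples from each domain and shuffle.

    Mirrors the README's finding that a small, domain-balanced mix
    (5k AVQA + 5k MusicBench = 10k total) was sufficient. The JSONL
    format and paths here are illustrative assumptions.
    """
    rng = random.Random(seed)
    mix = []
    for path in (avqa_path, musicbench_path):
        with open(path, encoding="utf-8") as f:
            samples = [json.loads(line) for line in f]
        mix.extend(rng.sample(samples, per_domain))
    rng.shuffle(mix)  # interleave the two domains for training
    return mix
```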

## Performance: Accuracies (%)↑ on MMAU Test-mini and Test benchmarks

| Model | Method | Sound (Test-mini) | Sound (Test) | Music (Test-mini) | Music (Test) | Speech (Test-mini) | Speech (Test) | Average (Test-mini) | Average (Test) |
|---|---|---|---|---|---|---|---|---|---|
| - | Human\* | 86.31 | - | 78.22 | - | 82.17 | - | 82.23 | - |
| … | … | … | … | … | … | … | … | … | … |
| Qwen2.5-Omni-7B | \[4\] | 67.87 | - | 69.16 | - | 59.76 | - | 65.60 | - |
| Ke-Omni-R (Qwen2.5-Omni-7B) | GRPO (ours) | 69.37 | **71.90** | 69.46 | 67.13 | **67.87** | 67.10 | **68.90** | **68.71** |

## Performance: CER/WER (%)↓ on ASR benchmarks

| Model | Method | WenetSpeech test-net | WenetSpeech test-meeting | LibriSpeech test-clean | LibriSpeech test-other |
|---|---|---|---|---|---|
| Qwen2.5-Omni-3B | \[4\] | 6.3 | 8.1 | 2.2 | 4.5 |
| Qwen2.5-Omni-7B | \[4\] | 5.9 | 7.7 | 1.8 | 3.4 |
| Ke-Omni-3B | ours | 11.7 | 16.1 | 1.8 | 3.8 |
| Ke-Omni-7B | ours | 7.5 | 9.8 | **1.6** | **3.1** |

Note:

- \* The data are sourced from the [MMAU leaderboard](https://sakshi113.github.io/mmau_homepage/#leaderboard).
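For reference, the CER/WER numbers in the ASR table are edit-distance metrics: errors (substitutions + deletions + insertions) divided by the number of reference tokens. A minimal sketch of WER over word tokens (CER is the same computation over characters) — this is the standard definition, not this repo's scoring script:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via Levenshtein edit distance over word tokens.

    WER = (substitutions + deletions + insertions) / #reference words.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution or match
            )
    return dp[len(ref)][len(hyp)] / len(ref)
```

For example, `wer("hello world", "hello there world")` is 0.5: one insertion against a two-word reference.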