shuaijiang committed on
Commit a8c6518 · verified · 1 Parent(s): 637ce4f

Update README.md

  1. README.md +9 -1
README.md CHANGED
@@ -17,7 +17,7 @@ Ke-Omni-R is an advanced audio reasoning model built upon [Qwen2.5-Omni-7B](http
  - **KL Divergence**: Slight improvements were observed during GRPO training by leveraging KL divergence.
  - **Domain Ratio vs. Data Volume**: Domain diversity outweighs data volume. We utilized only 10k samples, with 5k randomly selected from AVQA and another 5k from MusicBench.

- ## Performance: Accuracies (%) on MMAU Test-mini and Test benchmark
+ ## Performance: Accuracies (%) on MMAU Test-mini and Test benchmark
  | Model | Method | Sound (Test-mini) | Sound (Test) | Music (Test-mini) | Music (Test) | Speech (Test-mini) | Speech (Test) | Average (Test-mini) | Average (Test) |
  |---------------------------------------|-----------------------|-----------|-------|-----------|-------|-----------|------|------------|-------|
  | - | Human\* | 86.31 | - | 78.22 | - | 82.17 | - | 82.23 | - |
@@ -34,6 +34,14 @@ Ke-Omni-R is an advanced audio reasoning model built upon [Qwen2.5-Omni-7B](http
  | Qwen2.5-Omni-7B | \[4\] | 67.87 | - | 69.16 | - | 59.76 | - | 65.60 | - |
  | Ke-Omni-R(Qwen2.5-Omni-7B) | GRPO(ours) | 69.37 | **71.90** | 69.46 | 67.13 |**67.87** | 67.10 | **68.90** |**68.71** |

+ ## Performance: CER/WER (%)↓ on ASR benchmark
+ | Model | Method | WenetSpeech test-net | WenetSpeech test-meeting | LibriSpeech test-clean | LibriSpeech test-other|
+ | ---|----| ----| ----| ---- | ----|
+ | Qwen2.5-Omni-3B | \[4\] | 6.3 | 8.1 | 2.2 | 4.5 |
+ | Qwen2.5-Omni-7B | \[4\] | 5.9 | 7.7 | 1.8 | 3.4 |
+ | Ke-Omni-3B | ours | 11.7 | 16.1 | 1.8 | 3.8 |
+ | Ke-Omni-7B | ours | 7.5 | 9.8 | **1.6** | **3.1** |
+
  Note:

  - \* The data are sourced from the [MMAU leaderboard](https://sakshi113.github.io/mmau_homepage/#leaderboard).
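
A quick sanity check on the MMAU table in this diff: the Average columns appear to equal the unweighted mean of the Sound, Music, and Speech accuracies, rounded to two decimals. The sketch below assumes that reading (the README does not state how the average is computed); the `rows` dictionary and the `macro_average` helper are illustrative names, and the numbers are copied from the rows visible in the diff.

```python
from statistics import mean

# (Sound, Music, Speech) accuracies in % and the reported Average,
# copied from the MMAU rows shown in the diff.
rows = {
    "Human (Test-mini)": ((86.31, 78.22, 82.17), 82.23),
    "Qwen2.5-Omni-7B (Test-mini)": ((67.87, 69.16, 59.76), 65.60),
    "Ke-Omni-R (Test-mini)": ((69.37, 69.46, 67.87), 68.90),
    "Ke-Omni-R (Test)": ((71.90, 67.13, 67.10), 68.71),
}


def macro_average(scores):
    """Unweighted mean of per-domain accuracies, rounded to 2 decimals.

    Assumption: the table's Average is a plain mean over the three domains.
    """
    return round(mean(scores), 2)


for name, (scores, reported) in rows.items():
    computed = macro_average(scores)
    # Allow for rounding in the reported figures.
    status = "ok" if abs(computed - reported) < 0.006 else "mismatch"
    print(f"{name}: computed={computed:.2f} reported={reported:.2f} {status}")
```

The same check can be rerun when new rows are added to the table, which helps catch transcription slips in the per-domain or Average columns.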