kcz358 commited on
Commit
6c39586
·
verified ·
1 Parent(s): 36a6aad

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +0 -2
README.md CHANGED
@@ -145,5 +145,3 @@ We present the contributions of our data mixture here. Our SFT data mixture incl
145
  *The hours of some training datasets are estimated and may not be fully accurate
146
  <br>
147
  One of the key strengths of our training recipe lies in the quality and quantity of our data. Our training dataset consists of approximately 5 billion tokens, corresponding to around 50,000 hours of audio. Compared to models such as Qwen-Omni and Phi-4, our dataset is over 100 times smaller, yet our model achieves competitive performance. All data is sourced from publicly available open-source datasets, highlighting the sample efficiency of our training approach. A detailed breakdown of our data distribution is provided below, along with comparisons to other models.
148
-
149
- [More Information Needed]
 
145
  *The hours of some training datasets are estimated and may not be fully accurate
146
  <br>
147
  One of the key strengths of our training recipe lies in the quality and quantity of our data. Our training dataset consists of approximately 5 billion tokens, corresponding to around 50,000 hours of audio. Compared to models such as Qwen-Omni and Phi-4, our dataset is over 100 times smaller, yet our model achieves competitive performance. All data is sourced from publicly available open-source datasets, highlighting the sample efficiency of our training approach. A detailed breakdown of our data distribution is provided below, along with comparisons to other models.