nielsr (HF Staff) committed
Commit 71be1a4 · verified · 1 Parent(s): 0a9329f

Add pipeline tag, link to paper


This PR ensures the model can be found at https://huggingface.co/models?pipeline_tag=any-to-any&sort=trending and adds a link to the corresponding paper.
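For context: the Hub's task filter is driven by the `pipeline_tag` field in the README's YAML front matter, which is exactly what this commit adds. A minimal sketch of how the effect can be checked, or the same kind of change made programmatically, assuming a recent `huggingface_hub` client (the repo id is taken from the model card below):

```python
from huggingface_hub import HfApi, metadata_update

api = HfApi()

# After this commit, the card metadata should carry the new tag.
info = api.model_info("baichuan-inc/Baichuan-Omni-1d5")
print(info.pipeline_tag)  # expected: "any-to-any"

# The tag is what filtered listings such as
# https://huggingface.co/models?pipeline_tag=any-to-any query against.
for model in api.list_models(pipeline_tag="any-to-any", limit=5):
    print(model.id)

# The equivalent metadata edit, shown for illustration only (requires write access):
# metadata_update("baichuan-inc/Baichuan-Omni-1d5", {"pipeline_tag": "any-to-any"})
```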

Files changed (1): README.md (+31 −205)
README.md CHANGED
@@ -1,6 +1,8 @@
  ---
  license: apache-2.0
+ pipeline_tag: any-to-any
  ---
+
  <div align="center">

  <img src="https://github.com/baichuan-inc/Baichuan-Omni-1.5/raw/main/assets/logo.png" width="300em" ></img>
@@ -13,7 +15,7 @@ license: apache-2.0


  <p align="center">
- Baichuan-Omni-1.5 <a href="https://huggingface.co/baichuan-inc/Baichuan-Omni-1d5">🤗</a> | Baichuan-Omni-1.5-Base <a href="https://huggingface.co/baichuan-inc/Baichuan-Omni-1d5-Base">🤗</a> |Github <a href="https://github.com/baichuan-inc/Baichuan-Omni-1.5/">📖 </a> | Report <a href="https://github.com/baichuan-inc/Baichuan-Omni-1.5/raw/main/baichuan_omni_1_5.pdf">📖</a>
+ Baichuan-Omni-1.5 <a href="https://huggingface.co/baichuan-inc/Baichuan-Omni-1d5">🤗</a> | Baichuan-Omni-1.5-Base <a href="https://huggingface.co/baichuan-inc/Baichuan-Omni-1d5-Base">🤗</a> |Github <a href="https://github.com/baichuan-inc/Baichuan-Omni-1.5/">📖 </a> | Report <a href="https://huggingface.co/papers/2501.15368">📖</a>
  </p>
  </p>
  <p align="center">
@@ -232,9 +234,9 @@ We sugguest readers to refer to our [**Github**](https://github.com/baichuan-inc

  <details>

- <summary>click to view</summary>
+ <summary>Click here to view detailed evaluation results of image understanding ability.</summary>

- #### Image Understanding
+ #### Image understanding ability

  <div align="center">
  <table style="margin: 0 auto; text-align: center;">
@@ -247,11 +249,11 @@ We sugguest readers to refer to our [**Github**](https://github.com/baichuan-inc
  <tr>
  <td>Model</td>
  <td>Size</td>
- <td>MMBench-EN (Acc.)</td>
- <td>MMbench-CN (Acc.)</td>
- <td>SEED-IMG (Acc.)</td>
- <td>MMMU-val (Acc.)</td>
- <td>HallusionBench (Acc.)</td>
+ <td>MMBench-EN <br>(Acc.)</td>
+ <td>MMbench-CN <br>(Acc.)</td>
+ <td>SEED-IMG <br>(Acc.)</td>
+ <td>MMMU-val <br>(Acc.)</td>
+ <td>HallusionBench <br>(Acc.)</td>
  </tr>
  <tr>
  <td colspan="9">Proprietary Models</td>
@@ -361,11 +363,11 @@ We sugguest readers to refer to our [**Github**](https://github.com/baichuan-inc
  <tr>
  <td>Model</td>
  <td>Size</td>
- <td>RealWorldQA (Acc.)</td>
- <td>MathVista-mini (Acc.)</td>
- <td>TextVQA-val (Acc.)</td>
- <td>ChartQA (Acc.)</td>
- <td>OCRBench (Acc.)</td>
+ <td>RealWorldQA <br>(Acc.)</td>
+ <td>MathVista-mini <br>(Acc.)</td>
+ <td>TextVQA-val <br>(Acc.)</td>
+ <td>ChartQA <br>(Acc.)</td>
+ <td>OCRBench <br>(Acc.)</td>
  </tr>
  <tr>
  <td colspan="8">Proprietary Models</td>
@@ -466,9 +468,9 @@ We sugguest readers to refer to our [**Github**](https://github.com/baichuan-inc

  <details>

- <summary>click to view</summary>
+ <summary>Click here to view detailed evaluation results of video understanding ability.</summary>

- #### Video Understanding
+ #### Video understanding ability
  <div align="center">
  <table style="margin: 0 auto; text-align: center;">
  <thead>
@@ -481,10 +483,10 @@ We sugguest readers to refer to our [**Github**](https://github.com/baichuan-inc
  <td>Model</td>
  <td>Size</td>
  <td># Frames</td>
- <td>MVBench (Acc.)</td>
- <td>Egoschema (Acc.)</td>
- <td>VideoMME (Acc.)</td>
- <td>Perception-Test (Acc.)</td>
+ <td>MVBench <br>(Acc.)</td>
+ <td>Egoschema <br>(Acc.)</td>
+ <td>VideoMME <br>(Acc.)</td>
+ <td>Perception-Test <br>(Acc.)</td>
  </tr>
  <tr>
  <td colspan="7">Proprietary Models</td>
@@ -550,7 +552,7 @@ We sugguest readers to refer to our [**Github**](https://github.com/baichuan-inc
  <td>VideoLLaMA 2</td>
  <td>7B</td>
  <td>16</td>
- <td>54.6*</td>
+ <td>50.2*</td>
  <td>51.7*</td>
  <td>46.6*</td>
  <td>51.4*</td>
@@ -606,7 +608,7 @@ We sugguest readers to refer to our [**Github**](https://github.com/baichuan-inc
  <tr>
  <td>Baichuan-Omni</td>
  <td>7B</td>
- <td>1 fps (max 32)</td>
+ <td>1 fps (max 48)</td>
  <td>60.9</td>
  <td>58.8</td>
  <td>58.2</td>
@@ -634,6 +636,7 @@ We sugguest readers to refer to our [**Github**](https://github.com/baichuan-inc
  </table>
  </div>

+
  <br>

  <div align="center">
@@ -798,12 +801,11 @@ We sugguest readers to refer to our [**Github**](https://github.com/baichuan-inc

  </details>

-
  <details>

- <summary>click to view</summary>
+ <summary>Click here to view detailed evaluation results of audio understanding and generation ability.</summary>

- #### Audio Comprehensive and Speech Generation
+ #### Audio understanding and generation ability
  <div align="center">
  <table style="margin: 0 auto; text-align: center;">
  <thead>
@@ -914,17 +916,13 @@ We sugguest readers to refer to our [**Github**](https://github.com/baichuan-inc
  </tbody>
  </table>
  </div>
-
-
  </details>

-
-
  <details>

- <summary>click to view</summary>
+ <summary>Click here to view the detailed evaluation results of omni-modal understanding ability.</summary>

- #### Omni-modal Understanding
+ #### Omni-modal understanding ability

  <div align="center">
  <table style="margin: 0 auto; text-align: center;">
@@ -937,178 +935,6 @@ We sugguest readers to refer to our [**Github**](https://github.com/baichuan-inc
  <tr>
  <td>Model</td>
  <td>Size</td>
- <td>Image & Audio</td>
- <td>Image Caption & Audio</td>
- <td>Image & Audio Transcript</td>
- <td>Image Caption & Audio Transcript</td>
- </tr>
- </thead>
- <tr>
- <td colspan="6">Proprietary Models</td>
- </tr>
- <tr>
- <td>GPT4o-mini</td>
- <td>-</td>
- <td>-</td>
- <td>-</td>
- <td>37.0</td>
- <td>37.7</td>
- </tr>
- <tr>
- <td colspan="6">Open-source Models (Omni-modal)</td>
- </tr>
- <tr>
- <td>VITA</td>
- <td>8x7B</td>
- <td>33.1</td>
- <td>31.8</td>
- <td>42.0</td>
- <td>44.2</td>
- </tr>
- <tr>
- <td>VITA-1.5</td>
- <td>7B</td>
- <td>33.4</td>
- <td>29.6</td>
- <td>48.5</td>
- <td><b>47.2<br></td>
- </tr>
- <tr>
- <td>Baichuan-Omni</td>
- <td>7B</td>
- <td>32.2</td>
- <td>26.5</td>
- <td>42.6</td>
- <td>44.2</td>
- </tr>
- <tr>
- <td>MiniCPM-o 2.6</td>
- <td>7B</td>
- <td>40.5</td>
- <td>30.8</td>
- <td><b>53.2<br></td>
- <td>46.3</td>
- </tr>
- <tr>
- <td><b>Baichuan-Omni-1.5<br></td>
- <td>7B</td>
- <td><b>42.9<br></td>
- <td><b>37.7<br></td>
- <td>47.9</td>
- <td>46.9</td>
- </tr>
- </tbody>
- </table>
- </div>
-
- </details>
-
- <details>
-
- <summary>click to view</summary>
-
- #### Medical Image Understanding Capabilities
-
- <div align="center">
- <table style="margin: 0 auto; text-align: center;">
- <thead>
- <tr>
- <th colspan="7">Medical Understanding&nbsp;&nbsp;&nbsp;</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td>Model</td>
- <td>Size</td>
- <td>GMAI-MMB-VAL (Acc.)</td>
- <td>OpenMM-Medical (Acc.)</td>
- </tr>
- </thead>
- <tr>
- <td colspan="4">Proprietary Models</td>
- </tr>
- <tr>
- <td>GPT4o-mini</td>
- <td>-</td>
- <td>46.4</td>
- <td>74.3</td>
- </tr>
- <tr>
- <td colspan="4">Open-source Models (Vision-Language)</td>
- </tr>
- <tr>
- <td>Qwen2 VL</td>
- <td>7B</td>
- <td>46.3</td>
- <td>76.9</td>
- </tr>
- <tr>
- <td>Qwen2 VL</td>
- <td>72B</td>
- <td><b>50.7<br></td>
- <td>80.7</td>
- </tr>
- <tr>
- <td colspan="4">Open-source Models (Omni-modal)</td>
- </tr>
- <tr>
- <td>VITA-1.5</td>
- <td>7B</td>
- <td>36.7</td>
- <td>67.1</td>
- </tr>
- <tr>
- <td>MiniCPM-o 2.6</td>
- <td>7B</td>
- <td>41.5</td>
- <td>73.6</td>
- </tr>
- <tr>
- <td><b>Baichuan-Omni-1.5<br></td>
- <td>7B</td>
- <td>49.9</td>
- <td><b>83.8<br></td>
- </tr>
- </tbody>
- </table>
- </div>
-
- </details>
-
- ## Examples
- <br>
-
- <div style="display: flex; flex-direction: column; align-items: center;">
- <img src="https://github.com/baichuan-inc/Baichuan-Omni-1.5/raw/main/assets/pipeline.png" alt="pipeline" style="margin-bottom: 5px;">
- <img src="https://github.com/baichuan-inc/Baichuan-Omni-1.5/raw/main/assets/math.png" alt="math" style="margin-bottom: 5px;">
- <img src="https://github.com/baichuan-inc/Baichuan-Omni-1.5/raw/main/assets/fly_bill.png" alt="fly_bill" style="margin-bottom: 5px;">
- </div>
-
-
- ## 🚀 Quick Start
- We recommend interested scholars to visit our github repo for more details. [**Github**](https://github.com/baichuan-inc/Baichuan-Omni-1.5/)
-
-
- ### Statement
- - We hereby declare that our team has not developed any applications based on Baichuan-Omni-1.5/Baichuan-Omni-1.5-base models, not on iOS, Android, the web, or any other platform. We strongly call on all users not to use Baichuan-Omni-1.5/Baichuan-Omni-1.5-base models for any activities that harm national / social security or violate the law. Also, we ask users not to use Baichuan-Omni-1.5/Baichuan-Omni-1.5-base models for Internet services that have not undergone appropriate security reviews and filings. We hope that all users can abide by this principle and ensure that the development of technology proceeds in a regulated and legal environment.
-
- - We have done our best to ensure the compliance of the data used in the model training process. However, despite our considerable efforts, there may still be some unforeseeable issues due to the complexity of the model and data. Therefore, if any problems arise due to the use of Baichuan-Omni-1.5/Baichuan-Omni-1.5-base open-source models, including but not limited to data security issues, public opinion risks, or any risks and problems brought about by the model being misled, abused, spread or improperly exploited, we will not assume any responsibility.
-
-
-
- ### License
- The community usage of Baichuan-Omni-1.5/Baichuan-Omni-1.5-base requires adherence to [Apache 2.0](https://github.com/baichuan-inc/Baichuan-Omni-1.5/blob/main/LICENSE) and [Community License for Baichuan-Omni-1.5 Models](https://github.com/baichuan-inc/Baichuan-Omni-1.5/blob/main/LICENSE). The Baichuan-Omni-1.5/Baichuan-Omni-1.5-base models supports commercial use. If you plan to use the Baichuan-Omni-1.5/Baichuan-Omni-1.5-base models or its derivatives for commercial purposes, please ensure that your entity meets the following conditions:
-
- 1. The Daily Active Users (DAU) of your or your affiliate's service or product is less than 1 million.
- 2. Neither you nor your affiliates are software service providers or cloud service providers.
- 3. There is no possibility for you or your affiliates to grant the commercial license given to you, to reauthorize it to other third parties without Baichuan's permission.
-
- Upon meeting the above conditions, you need to submit the application materials required by the Baichuan-Omni-1.5 Model Community License Agreement via the following contact email: [email protected]. Once approved, Baichuan will hereby grant you a non-exclusive, global, non-transferable, non-sublicensable, revocable commercial copyright license.
-
- <!-- ### Citation
-
- If you find our work helpful, please consider citing our papers 📝 and liking this project ❤️!
- ```bib
- @article{
- } -->
- ```
+ <td>Image & <br> Audio (Acc.)</td>
+ <td>Image Caption & <br> Audio (Acc.)</td>
+ <td>Image & Audio