update readme

README.md

Note: Due to rounding errors caused by hardware and framework, differences in reproduced results are possible.
#### C-Eval

在[C-Eval](https://arxiv.org/abs/2305.08322)验证集上,我们评价了Qwen-1.8B-Chat模型的准确率。

We demonstrate the accuracy of Qwen-1.8B-Chat on the C-Eval validation set below:

| Model                       | Acc. |
|:---------------------------:|:----:|
| RedPajama-INCITE-Chat-3B    | 18.3 |
| OpenBuddy-3B                | 23.5 |
| Firefly-Bloom-1B4           | 23.6 |
| OpenLLaMA-Chinese-3B        | 24.4 |
| LLaMA2-7B-Chat              | 31.9 |
| ChatGLM2-6B-Chat            | 52.6 |
| InternLM-7B-Chat            | 53.6 |
| **Qwen-1.8B-Chat (0-shot)** | 55.6 |
| **Qwen-7B-Chat (0-shot)**   | 59.7 |
| **Qwen-7B-Chat (5-shot)**   | 59.3 |
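
The 0-shot rows above come from posing each multiple-choice question directly, with no in-context examples. A minimal sketch of such an evaluation loop, assuming the standard `transformers` loading path and the `model.chat` interface that Qwen chat models expose via `trust_remote_code` (the prompt wording and answer parsing are illustrative assumptions, not the exact harness behind these numbers):

```python
import re

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-1_8B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-1_8B-Chat", device_map="auto", trust_remote_code=True
).eval()

def zero_shot_choice(question: str, choices: dict) -> str:
    """Ask one multiple-choice question with no examples and parse out the letter."""
    options = "\n".join(f"{key}. {text}" for key, text in choices.items())
    # Hypothetical prompt template; real harnesses differ in wording.
    prompt = f"{question}\n{options}\n请直接给出正确选项的字母。"
    response, _ = model.chat(tokenizer, prompt, history=None)
    match = re.search(r"[ABCD]", response)
    return match.group(0) if match else ""

# Accuracy is then the fraction of questions whose parsed letter matches the key.
```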

C-Eval测试集上,Qwen-1.8B-Chat模型的zero-shot准确率结果如下:

The zero-shot accuracy of Qwen-1.8B-Chat on the C-Eval test set is provided below:

| Model                   | Avg. | STEM | Social Sciences | Humanities | Others |
| :---------------------: | :--: | :--: | :-------------: | :--------: | :----: |
| Chinese-Alpaca-Plus-13B | 41.5 | 36.6 | 49.7            | 43.1       | 41.2   |
| Chinese-Alpaca-2-7B     | 40.3 | -    | -               | -          | -      |
| ChatGLM2-6B-Chat        | 50.1 | 46.4 | 60.4            | 50.6       | 46.9   |
| Baichuan-13B-Chat       | 51.5 | 43.7 | 64.6            | 56.2       | 49.2   |
| **Qwen-1.8B-Chat**      | 53.8 | 48.4 | 68.0            | 56.5       | 48.3   |
| **Qwen-7B-Chat**        | 58.6 | 53.3 | 72.1            | 62.8       | 52.0   |

### 英文评测(English Evaluation)

#### MMLU

[MMLU](https://arxiv.org/abs/2009.03300)评测集上,Qwen-1.8B-Chat模型的准确率如下,效果在同类对齐模型中同样表现较优。

The accuracy of Qwen-1.8B-Chat on MMLU is provided below. Qwen-1.8B-Chat still performs near the top among human-aligned models of comparable size.

| Model                       | Acc. |
|:---------------------------:|:----:|
| Firefly-Bloom-1B4           | 23.8 |
| OpenBuddy-3B                | 25.5 |
| RedPajama-INCITE-Chat-3B    | 25.5 |
| OpenLLaMA-Chinese-3B        | 25.7 |
| ChatGLM2-6B-Chat            | 46.0 |
| LLaMA2-7B-Chat              | 46.2 |
| InternLM-7B-Chat            | 51.1 |
| Baichuan2-7B-Chat           | 52.9 |
| **Qwen-1.8B-Chat (0-shot)** | 43.3 |
| **Qwen-7B-Chat (0-shot)**   | 55.8 |
| **Qwen-7B-Chat (5-shot)**   | 57.0 |
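
The only difference between the 0-shot and 5-shot rows is whether five solved exemplars from the dev split are prepended to each question. A sketch of assembling such a prompt; the template and the `dev_examples` schema are assumptions for illustration:

```python
from typing import Optional

def format_example(question: str, choices: list, answer: Optional[str] = None) -> str:
    """Render one multiple-choice item; include the answer for exemplars only."""
    letters = "ABCD"
    lines = [question] + [f"{letters[i]}. {c}" for i, c in enumerate(choices)]
    lines.append(f"Answer: {answer}" if answer is not None else "Answer:")
    return "\n".join(lines)

def five_shot_prompt(dev_examples: list, question: str, choices: list) -> str:
    """Prepend five solved exemplars (dicts with question/choices/answer keys)."""
    shots = "\n\n".join(
        format_example(ex["question"], ex["choices"], ex["answer"])
        for ex in dev_examples[:5]
    )
    return shots + "\n\n" + format_example(question, choices)
```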

### 代码评测(Coding Evaluation)

The zero-shot Pass@1 of Qwen-1.8B-Chat on [HumanEval](https://github.com/openai/human-eval) is shown below:

| Model                    | Pass@1 |
|:------------------------:|:------:|
| Firefly-Bloom-1B4        | 0.6    |
| OpenLLaMA-Chinese-3B     | 4.9    |
| RedPajama-INCITE-Chat-3B | 6.1    |
| OpenBuddy-3B             | 10.4   |
| ChatGLM2-6B-Chat         | 11.0   |
| LLaMA2-7B-Chat           | 12.2   |
| Baichuan2-7B-Chat        | 13.4   |
| InternLM-7B-Chat         | 14.6   |
| **Qwen-1.8B-Chat**       | 26.2   |
| **Qwen-7B-Chat**         | 37.2   |
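
Pass@1 here means a single completion is sampled per problem and counted as solved only if it passes all of that problem's unit tests, so the reported score is the fraction of solved problems. When several samples per problem are drawn, the unbiased estimator from the HumanEval paper is the usual way to compute pass@k:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k from the HumanEval paper: n samples drawn, c of them correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))
```

With n = 1, as for the Pass@1 numbers above, this reduces to the plain pass rate.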

### 数学评测(Mathematics Evaluation)

The accuracy of Qwen-1.8B-Chat on GSM8K is shown below:

| Model                       | Acc. |
|:---------------------------:|:----:|
| Firefly-Bloom-1B4           | 2.4  |
| RedPajama-INCITE-Chat-3B    | 2.5  |
| OpenLLaMA-Chinese-3B        | 3.0  |
| OpenBuddy-3B                | 12.6 |
| LLaMA2-7B-Chat              | 26.3 |
| ChatGLM2-6B-Chat            | 28.8 |
| Baichuan2-7B-Chat           | 32.8 |
| InternLM-7B-Chat            | 33.0 |
| **Qwen-1.8B-Chat (0-shot)** | 33.7 |
| **Qwen-7B-Chat (0-shot)**   | 50.3 |
| **Qwen-7B-Chat (8-shot)**   | 54.1 |
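
GSM8K grading reduces to comparing the final number in the generation against the reference answer; the 8-shot row prepends eight worked solutions before the question. A sketch of the answer-extraction step (the regex and reference format are assumptions, not the exact grader used here):

```python
import re
from typing import Optional

def extract_final_number(text: str) -> Optional[str]:
    """Take the last integer or decimal in the output as the model's final answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return numbers[-1] if numbers else None

def is_correct(generation: str, reference: str) -> bool:
    pred = extract_final_number(generation)
    return pred is not None and float(pred) == float(reference)
```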

## 评测复现(Reproduction)