Update README.md
README.md
CHANGED
@@ -70,14 +70,6 @@ Thanks to [FlagScale](https://github.com/FlagOpen/FlagScale), we could concatena
 
 We evaluate Infinity-Instruct-3M-0625-Mistral-7B on the two most popular instruction-following benchmarks. MT-Bench is a set of challenging multi-turn questions covering code, math, and routine dialogue. AlpacaEval2.0 is based on the AlpacaFarm evaluation set. Both benchmarks use GPT-4 to judge model answers, and AlpacaEval2.0 shows a high agreement rate with the human-annotated benchmark Chatbot Arena. InfInstruct-3M-0625-Mistral-7B achieves 31.42 on AlpacaEval2.0, higher than the 22.5 of GPT-3.5 Turbo even though it does not yet use RLHF. It also achieves 8.1 on MT-Bench, comparable to state-of-the-art billion-parameter LLMs such as Llama-3-8B-Instruct and Mistral-7B-Instruct-v0.2.
 
-## Performance on **Downstream tasks**
-
-We also evaluate Infinity-Instruct-3M-0625-Mistral-7B on diverse objective downstream tasks with [Opencompass](https://opencompass.org.cn):
-
-<p align="center">
-<img src="fig/result.png">
-</p>
-
 ## **How to use**
 
 Infinity-Instruct-3M-0625-Mistral-7B adopts the same chat template as [OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B):
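As a minimal sketch of that template in practice, the conversation can be rendered with `transformers`' `apply_chat_template`, which emits the ChatML-style `<|im_start|>`/`<|im_end|>` format used by OpenHermes-2.5-Mistral-7B. The repo id, system prompt, and generation settings below are illustrative assumptions, not taken from the README itself:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BAAI/Infinity-Instruct-3M-0625-Mistral-7B"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},  # illustrative system prompt
    {"role": "user", "content": "Give me a short introduction to large language models."},
]

# Render the conversation into the ChatML-style prompt
# (<|im_start|>role\n...<|im_end|>) and tokenize it in one step.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```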