BAAI
/

Infinity-Instruct-3M-0625-Mistral-7B

Text Generation

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

hyxmmm commited on Jul 9

Commit

f923f14

•

1 Parent(s): 48b2226

Update README.md

Files changed (1) hide show

README.md +0 -8

README.md CHANGED Viewed

@@ -70,14 +70,6 @@ Thanks to [FlagScale](https://github.com/FlagOpen/FlagScale), we could concatena
 We evaluate Infinity-Instruct-3M-0625-Mistral-7B on the two most popular instructions following benchmarks. Mt-Bench is a set of challenging multi-turn questions including code, math and routine dialogue. AlpacaEval2.0 is based on AlpacaFarm evaluation set. Both of these two benchmarks use GPT-4 to judge the model answer. AlpacaEval2.0 displays a high agreement rate with human-annotated benchmark, Chatbot Arena. The result shows that InfInstruct-3M-0625-Mistral-7B achieved 31.42 in AlpacaEval2.0, which is higher than the 22.5 of GPT3.5 Turbo although it does not yet use RLHF. InfInstruct-3M-0625-Mistral-7B also achieves 8.1 in MT-Bench, which is comparable to the state-of-the-art billion-parameter LLM such as Llama-3-8B-Instruct and Mistral-7B-Instruct-v0.2.
-## Performance on **Downstream tasks**
-We also evaluate Infinity-Instruct-3M-0625-Mistral-7B on diverse objective downstream tasks with [Opencompass](https://opencompass.org.cn):
-<p align="center">
-<img src="fig/result.png">
-</p>
 ## **How to use**
 Infinity-Instruct-3M-0625-Mistral-7B adopt the same chat template of [OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B):

 We evaluate Infinity-Instruct-3M-0625-Mistral-7B on the two most popular instructions following benchmarks. Mt-Bench is a set of challenging multi-turn questions including code, math and routine dialogue. AlpacaEval2.0 is based on AlpacaFarm evaluation set. Both of these two benchmarks use GPT-4 to judge the model answer. AlpacaEval2.0 displays a high agreement rate with human-annotated benchmark, Chatbot Arena. The result shows that InfInstruct-3M-0625-Mistral-7B achieved 31.42 in AlpacaEval2.0, which is higher than the 22.5 of GPT3.5 Turbo although it does not yet use RLHF. InfInstruct-3M-0625-Mistral-7B also achieves 8.1 in MT-Bench, which is comparable to the state-of-the-art billion-parameter LLM such as Llama-3-8B-Instruct and Mistral-7B-Instruct-v0.2.
 ## **How to use**
 Infinity-Instruct-3M-0625-Mistral-7B adopt the same chat template of [OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B):