Update README.md
README.md CHANGED
@@ -24,19 +24,6 @@ base_model: meta-llama/Llama-2-70b-hf
 This instruction model was built via parameter-efficient QLoRA finetuning of [llama-2-70b](https://huggingface.co/meta-llama/Llama-2-70b-hf) on the first 25k rows of [ehartford/dolphin](https://huggingface.co/datasets/ehartford/dolphin) (an open-source implementation of [Microsoft's Orca](https://www.microsoft.com/en-us/research/publication/orca-progressive-learning-from-complex-explanation-traces-of-gpt-4/)). Finetuning was executed on a single H100 (80 GB PCIe) for roughly 17 hours on the [Lambda Labs](https://cloud.lambdalabs.com/instances) platform.
 
 
-
-## Benchmark metrics
-
-| Metric              | Value |
-|---------------------|-------|
-| MMLU (5-shot)       | 69.18 |
-| ARC (25-shot)       | 69.62 |
-| HellaSwag (10-shot) | 86.82 |
-| TruthfulQA (0-shot) | 57.43 |
-| Avg.                | 70.76 |
-
-We use state-of-the-art [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) to run the benchmark tests above, using the same version as Hugging Face's [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
-
 ## Helpful links
 
 * Model license: Llama 2 Community License Agreement
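The benchmark table removed in this hunk was produced with EleutherAI's [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness). A minimal sketch of scoring one of the listed tasks with that harness follows; the `hf-causal` backend name and the `simple_evaluate` entry point match harness releases from this period, but both are assumptions to verify against the version you have installed.

```python
# Hedged sketch: reproducing one benchmark (ARC at 25-shot, as in the table)
# with EleutherAI's lm-evaluation-harness. Backend and task names vary across
# harness versions; check them against your installed release.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal",  # Hugging Face causal-LM backend in older releases
    model_args="pretrained=meta-llama/Llama-2-70b-hf",  # base model from this card
    tasks=["arc_challenge"],
    num_fewshot=25,  # ARC is reported at 25 shots
)
print(results["results"]["arc_challenge"])  # per-task accuracy metrics
```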
@@ -45,6 +32,21 @@ We use state-of-the-art [Language Model Evaluation Harness](https://github.com/E
 * Loss curves: [plot](https://huggingface.co/dfurman/Llama-2-70B-Instruct-v0.1-peft#finetuning-description)
 * Runtime stats: [table](https://huggingface.co/dfurman/Llama-2-70B-Instruct-v0.1-peft#runtime-tests)
 
+
+## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_dfurman__llama-2-70b-dolphin-peft)
+
+| Metric              | Value |
+|---------------------|-------|
+| Avg.                | 57.34 |
+| ARC (25-shot)       | 69.62 |
+| HellaSwag (10-shot) | 86.82 |
+| MMLU (5-shot)       | 69.18 |
+| TruthfulQA (0-shot) | 57.43 |
+| Winogrande (5-shot) | 83.9  |
+| GSM8K (5-shot)      | 27.37 |
+| DROP (3-shot)       | 7.03  |
+
 ## Example prompts and responses
 
 Example 1:
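One way to read the added table: the Avg. row is the unweighted mean of the seven task scores. A quick check, using only values quoted above:

```python
# Verify the "Avg." row of the leaderboard table: it is the unweighted mean
# of the seven task scores listed above (values copied verbatim).
scores = [69.62, 86.82, 69.18, 57.43, 83.9, 27.37, 7.03]
print(round(sum(scores) / len(scores), 2))  # -> 57.34, matching the table
```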
@@ -260,16 +262,4 @@ The license on this model does not constitute legal advice. We are not responsib
 ## Framework versions
 
 - PEFT 0.5.0.dev0
-
-# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
-Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_dfurman__llama-2-70b-dolphin-peft)
-
-| Metric              | Value |
-|---------------------|-------|
-| Avg.                | 57.34 |
-| ARC (25-shot)       | 69.62 |
-| HellaSwag (10-shot) | 86.82 |
-| MMLU (5-shot)       | 69.18 |
-| TruthfulQA (0-shot) | 57.43 |
-| Winogrande (5-shot) | 83.9  |
-| GSM8K (5-shot)      | 27.37 |
-| DROP (3-shot)       | 7.03  |
+
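The pinned `PEFT 0.5.0.dev0` reflects that this card ships a QLoRA adapter rather than merged weights. A minimal loading sketch follows; the adapter repo id `dfurman/llama-2-70b-dolphin-peft` is inferred from the leaderboard details link above, and the NF4/bfloat16 quantization settings are typical QLoRA defaults rather than values confirmed by the card.

```python
# Hedged sketch: attaching the QLoRA adapter to the 4-bit base model with PEFT.
# The adapter repo id below is an assumption inferred from this card's
# leaderboard link; substitute the correct id if it differs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "meta-llama/Llama-2-70b-hf"
adapter_id = "dfurman/llama-2-70b-dolphin-peft"  # assumed adapter repo id

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # QLoRA-style 4-bit loading
    bnb_4bit_quant_type="nf4",              # typical QLoRA default (assumed)
    bnb_4bit_compute_dtype=torch.bfloat16,  # typical QLoRA default (assumed)
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the LoRA weights

prompt = "Write a short note on the Orca dataset.\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

`device_map="auto"` lets accelerate place or shard the 70B base across available devices; in 4-bit it fits on a single 80 GB card, matching the hardware described at the top of the card.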