Update README.md
README.md CHANGED
@@ -24,19 +24,6 @@ base_model: meta-llama/Llama-2-70b-hf
 This instruction model was built via parameter-efficient QLoRA finetuning of [llama-2-70b](https://huggingface.co/meta-llama/Llama-2-70b-hf) on the first 25k rows of [ehartford/dolphin](https://huggingface.co/datasets/ehartford/dolphin) (an open-source implementation of [Microsoft's Orca](https://www.microsoft.com/en-us/research/publication/orca-progressive-learning-from-complex-explanation-traces-of-gpt-4/)). Finetuning was executed on a single H100 (80 GB PCIe) for roughly 17 hours on the [Lambda Labs](https://cloud.lambdalabs.com/instances) platform.
 
 
-
-## Benchmark metrics
-
-| Metric              | Value |
-|---------------------|-------|
-| MMLU (5-shot)       | 69.18 |
-| ARC (25-shot)       | 69.62 |
-| HellaSwag (10-shot) | 86.82 |
-| TruthfulQA (0-shot) | 57.43 |
-| Avg.                | 70.76 |
-
-We use state-of-the-art [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) to run the benchmark tests above, using the same version as Hugging Face's [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
-
 ## Helpful links
 
 * Model license: Llama 2 Community License Agreement
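The benchmark table removed in this hunk was produced with EleutherAI's [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness). A minimal sketch of scoring one of the listed tasks with that harness follows; the `hf-causal` backend name and the `simple_evaluate` entry point match harness releases from this period, but both are assumptions to verify against the version you have installed.

```python
# Hedged sketch: reproducing one benchmark (ARC at 25-shot, as in the table)
# with EleutherAI's lm-evaluation-harness. Backend and task names vary across
# harness versions; check them against your installed release.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal",  # Hugging Face causal-LM backend in older releases
    model_args="pretrained=meta-llama/Llama-2-70b-hf",  # base model from this card
    tasks=["arc_challenge"],
    num_fewshot=25,  # ARC is reported at 25 shots
)
print(results["results"]["arc_challenge"])  # per-task accuracy metrics
```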
@@ -45,6 +32,21 @@ We use state-of-the-art [Language Model Evaluation Harness](https://github.com/E
 * Loss curves: [plot](https://huggingface.co/dfurman/Llama-2-70B-Instruct-v0.1-peft#finetuning-description)
 * Runtime stats: [table](https://huggingface.co/dfurman/Llama-2-70B-Instruct-v0.1-peft#runtime-tests)
 
+
+## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_dfurman__llama-2-70b-dolphin-peft)
+
+| Metric              | Value |
+|---------------------|-------|
+| Avg.                | 57.34 |
+| ARC (25-shot)       | 69.62 |
+| HellaSwag (10-shot) | 86.82 |
+| MMLU (5-shot)       | 69.18 |
+| TruthfulQA (0-shot) | 57.43 |
+| Winogrande (5-shot) | 83.9  |
+| GSM8K (5-shot)      | 27.37 |
+| DROP (3-shot)       | 7.03  |
+
 ## Example prompts and responses
 
 Example 1:
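One way to read the added table: the Avg. row is the unweighted mean of the seven task scores. A quick check, using only values quoted above:

```python
# Verify the "Avg." row of the leaderboard table: it is the unweighted mean
# of the seven task scores listed above (values copied verbatim).
scores = [69.62, 86.82, 69.18, 57.43, 83.9, 27.37, 7.03]
print(round(sum(scores) / len(scores), 2))  # -> 57.34, matching the table
```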
@@ -260,16 +262,4 @@ The license on this model does not constitute legal advice. We are not responsib
 ## Framework versions
 
 - PEFT 0.5.0.dev0
-
-# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
-Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_dfurman__llama-2-70b-dolphin-peft)
-
-| Metric              | Value |
-|---------------------|-------|
-| Avg.                | 57.34 |
-| ARC (25-shot)       | 69.62 |
-| HellaSwag (10-shot) | 86.82 |
-| MMLU (5-shot)       | 69.18 |
-| TruthfulQA (0-shot) | 57.43 |
-| Winogrande (5-shot) | 83.9  |
-| GSM8K (5-shot)      | 27.37 |
-| DROP (3-shot)       | 7.03  |
+
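The pinned `PEFT 0.5.0.dev0` reflects that this card ships a QLoRA adapter rather than merged weights. A minimal loading sketch follows; the adapter repo id `dfurman/llama-2-70b-dolphin-peft` is inferred from the leaderboard details link above, and the NF4/bfloat16 quantization settings are typical QLoRA defaults rather than values confirmed by the card.

```python
# Hedged sketch: attaching the QLoRA adapter to the 4-bit base model with PEFT.
# The adapter repo id below is an assumption inferred from this card's
# leaderboard link; substitute the correct id if it differs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "meta-llama/Llama-2-70b-hf"
adapter_id = "dfurman/llama-2-70b-dolphin-peft"  # assumed adapter repo id

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # QLoRA-style 4-bit loading
    bnb_4bit_quant_type="nf4",              # typical QLoRA default (assumed)
    bnb_4bit_compute_dtype=torch.bfloat16,  # typical QLoRA default (assumed)
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the LoRA weights

prompt = "Write a short note on the Orca dataset.\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

`device_map="auto"` lets accelerate place or shard the 70B base across available devices; in 4-bit it fits on a single 80 GB card, matching the hardware described at the top of the card.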