Text Generation · PEFT · Safetensors · llama-2 · Eval Results
dfurman committed · Commit a443115 · 1 Parent(s): e8b3803

Update README.md

Files changed (1): README.md +15 -25
README.md CHANGED
@@ -24,19 +24,6 @@ base_model: meta-llama/Llama-2-70b-hf
 This instruction model was built via parameter-efficient QLoRA finetuning of [llama-2-70b](https://huggingface.co/meta-llama/Llama-2-70b-hf) on the first 25k rows of [ehartford/dolphin](https://huggingface.co/datasets/ehartford/dolphin) (an open-source implementation of [Microsoft's Orca](https://www.microsoft.com/en-us/research/publication/orca-progressive-learning-from-complex-explanation-traces-of-gpt-4/)). Finetuning was executed on a single H100 (80 GB PCIe) for roughly 17 hours on the [Lambda Labs](https://cloud.lambdalabs.com/instances) platform.
 
 
-
-## Benchmark metrics
-
-| Metric | Value |
-|-----------------------|-------|
-| MMLU (5-shot) | 69.18 |
-| ARC (25-shot) | 69.62 |
-| HellaSwag (10-shot) | 86.82 |
-| TruthfulQA (0-shot) | 57.43 |
-| Avg. | 70.76 |
-
-We use state-of-the-art [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) to run the benchmark tests above, using the same version as Hugging Face's [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
-
 ## Helpful links
 
 * Model license: Llama 2 Community License Agreement
@@ -45,6 +32,21 @@ We use state-of-the-art [Language Model Evaluation Harness](https://github.com/E
 * Loss curves: [plot](https://huggingface.co/dfurman/Llama-2-70B-Instruct-v0.1-peft#finetuning-description)
 * Runtime stats: [table](https://huggingface.co/dfurman/Llama-2-70B-Instruct-v0.1-peft#runtime-tests)
 
+
+## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_dfurman__llama-2-70b-dolphin-peft)
+
+| Metric | Value |
+|-----------------------|---------------------------|
+| Avg. | 57.34 |
+| ARC (25-shot) | 69.62 |
+| HellaSwag (10-shot) | 86.82 |
+| MMLU (5-shot) | 69.18 |
+| TruthfulQA (0-shot) | 57.43 |
+| Winogrande (5-shot) | 83.9 |
+| GSM8K (5-shot) | 27.37 |
+| DROP (3-shot) | 7.03 |
+
 ## Example prompts and responses
 
 Example 1:
@@ -260,16 +262,4 @@ The license on this model does not constitute legal advice. We are not responsib
 ## Framework versions
 
 - PEFT 0.5.0.dev0
-# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
-Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_dfurman__llama-2-70b-dolphin-peft)
 
-| Metric | Value |
-|-----------------------|---------------------------|
-| Avg. | 57.34 |
-| ARC (25-shot) | 69.62 |
-| HellaSwag (10-shot) | 86.82 |
-| MMLU (5-shot) | 69.18 |
-| TruthfulQA (0-shot) | 57.43 |
-| Winogrande (5-shot) | 83.9 |
-| GSM8K (5-shot) | 27.37 |
-| DROP (3-shot) | 7.03 |
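As background on the parameter-efficient QLoRA finetuning the README describes, the core LoRA idea is to freeze the base weight W and learn a low-rank update scaled by alpha/r, so only a small fraction of parameters is trainable. A minimal NumPy sketch follows; the dimensions and hyperparameters here are illustrative assumptions, not the settings used for this 70B model:

```python
import numpy as np

# LoRA replaces a full weight update dW (d_out x d_in) with a low-rank
# factorization B @ A, where B is (d_out x r) and A is (r x d_in).
# Hypothetical dimensions; Llama-2-70B projection layers are far larger.
d_out, d_in, r, alpha = 1024, 1024, 8, 16

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))     # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))                   # trainable up-projection (zero init,
                                           # so the model starts unchanged)

# Effective weight during finetuning: base plus scaled low-rank update.
W_eff = W + (alpha / r) * (B @ A)

full_params = d_out * d_in
lora_params = d_out * r + r * d_in
print(lora_params / full_params)  # → 0.015625 (about 1.6% of the layer's params)
```

With B initialized to zero, W_eff equals W before any training step, which is why LoRA finetuning starts exactly from the base model's behavior.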
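As a quick sanity check on the leaderboard table added in this commit, the reported Avg. of 57.34 is the arithmetic mean of the seven listed benchmark scores:

```python
# Scores copied from the table added to README.md in this commit.
scores = {
    "ARC (25-shot)": 69.62,
    "HellaSwag (10-shot)": 86.82,
    "MMLU (5-shot)": 69.18,
    "TruthfulQA (0-shot)": 57.43,
    "Winogrande (5-shot)": 83.9,
    "GSM8K (5-shot)": 27.37,
    "DROP (3-shot)": 7.03,
}

avg = sum(scores.values()) / len(scores)
print(round(avg, 2))  # → 57.34, matching the table's Avg. row
```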