---
base_model: meta-llama/Llama-3.1-8B-Instruct
language:
- multilingual
datasets:
- cognitivecomputations/dolphin-r1
- openai/gsm8k
library_name: transformers
license: llama3.1
license_link: https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE
pipeline_tag: text-generation
tags:
- nlp
- code
quantized_by: ymcki
widget:
- messages:
  - role: user
    content: Can you provide ways to eat combinations of bananas and dragonfruits?
---

Original model: https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct

## Prompt format

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
Cutting Knowledge Date: December 2023
Today Date: 26 July 2024
{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>
{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

```

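The placeholders above can be filled by plain string formatting; a minimal sketch (the `render_prompt` helper is hypothetical, and the dates are the fixed strings from the template above):

```python
# Hypothetical helper: render the Llama 3.1 prompt format shown above
# with plain string formatting (no tokenizer required).
def render_prompt(system_prompt: str, prompt: str) -> str:
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n"
        "Cutting Knowledge Date: December 2023\n"
        "Today Date: 26 July 2024\n"
        f"{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n"
        f"{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n"
    )

text = render_prompt("You are a helpful assistant.", "Hello!")
```

In practice `tokenizer.apply_chat_template` produces this format automatically, as in the run example below.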
Following the same procedure as DeepSeek R1, [SFT](https://techcommunity.microsoft.com/blog/machinelearningblog/distillation-of-phi-4-on-deepseek-r1-sft-and-grpo/4381697) was performed first with Cognitive Computations' dolphin-r1 dataset, followed by Group Relative Policy Optimization (GRPO) with OpenAI's gsm8k dataset. The two resulting adapters were applied to Llama-3.1-8B-Instruct to see if reasoning and math can be further improved.

The GRPO run lasted one epoch. The highest average reward over the last 53 steps was recorded at epoch 0.96, so the adapter from that checkpoint was applied to Llama-3.1-8B-Instruct.

| Epoch | reward/format | reward/correct | reward/total |
| ----- | ------------- | -------------- | ------------ |
| 0.52 | 0.469783 | 1.27358 | 1.74337 |
| 0.96 | 0.750012 | 1.10613 | 1.85614 |
| 1.00 | 0.747508 | 1.05425 | 1.80175 |

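The `reward/format` and `reward/correct` columns track the two usual GRPO reward components for gsm8k-style training: one for emitting the expected output structure, one for the final answer. A minimal sketch of such rewards (these exact functions and tag names are illustrative assumptions, not the rewards used for this run):

```python
import re

# Illustrative GRPO reward components for gsm8k-style training.
# Assumed completion format: reasoning in <think>...</think>,
# final answer in <answer>...</answer>.

def format_reward(completion: str) -> float:
    """1.0 if the completion follows the expected tag format, else 0.0."""
    pattern = r"^<think>.*?</think>\s*<answer>.*?</answer>\s*$"
    return 1.0 if re.match(pattern, completion, re.DOTALL) else 0.0

def correctness_reward(completion: str, gold_answer: str) -> float:
    """2.0 if the extracted final answer matches the gold answer, else 0.0."""
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return 2.0 if m and m.group(1).strip() == gold_answer.strip() else 0.0

completion = "<think>2 + 2 = 4</think><answer>4</answer>"
total = format_reward(completion) + correctness_reward(completion, "4")
```

The total reward per completion is the sum of the components, which is what the `reward/total` column reports.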
This model is uploaded here to be evaluated by the Open LLM Leaderboard. Further GRPO fine-tuning is currently underway to see if further improvement is possible.

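For reference, GRPO replaces PPO's value model with a group-relative baseline: each sampled completion's advantage is its reward normalized against the rewards of the other completions in its group. A minimal sketch of that computation (illustrative, not the training code used for this model):

```python
from statistics import mean, pstdev

def group_advantages(rewards: list[float]) -> list[float]:
    """Normalize each reward against its group's mean and std (GRPO baseline)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0.0:
        # All completions scored the same: no learning signal for this group.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Rewards for a group of sampled completions of a single prompt
advantages = group_advantages([1.0, 2.0, 3.0])
```

Completions scoring above the group mean get a positive advantage and are reinforced; those below the mean are penalized.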
## Benchmark (100.0*raw scores only)

Click on a model name to go to the raw score JSON generated by the Open LLM Leaderboard.

| Model | Average | IFEval | BBH | Math Lv5 | GPQA | MUSR | MMLU-PRO |
| ----- | ------- | ------ | --- | -------- | ---- | ---- | -------- |
| [Llama-3.1-8B-Instruct](https://huggingface.co/datasets/open-llm-leaderboard/results/raw/main/meta-llama/Meta-Llama-3.1-8B-Instruct/results_2024-10-24T00-00-00.000000.json) | 42.24 | 80.48 | 50.62 | 19.34 | 26.76 | 38.62 | 37.62 |
| [Llama-3.1-8B-GRPO-Instruct](https://huggingface.co/datasets/open-llm-leaderboard/results/raw/main/ymcki/Llama-3.1-8B-GRPO-Instruct/results_2025-02-24T17-37-02.760485.json) | 42.00 | 75.61 | 51.21 | 20.24 | 29.45 | 38.10 | 37.38 |
| Llama-3.1-8B-SFT-GRPO-Instruct | | | | | | | |

The gain in reasoning and math is offset by a loss in instruction following.

## How to run this model

```py
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "ymcki/Llama-3.1-8B-SFT-GRPO-Instruct"
dtype = torch.bfloat16

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    torch_dtype=dtype,
)

chat = [
    {"role": "user", "content": "Write a hello world program"},
]
# Render the chat template, generate, and decode only the new tokens
inputs = tokenizer.apply_chat_template(
    chat, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

## Downloading using huggingface-cli

First, make sure you have huggingface-cli installed:

```
pip install -U "huggingface_hub[cli]"
```

Then, you can download the whole repository to a local directory:

```
huggingface-cli download ymcki/Llama-3.1-8B-SFT-GRPO-Instruct --include "*" --local-dir ./
```

## Credits

Thanks to DeepSeek for developing the original GRPO method.