---
base_model:
- Qwen/Qwen2.5-14B-Instruct
- Qwen/Qwen2.5-14B
library_name: transformers
tags:
- mergekit
- merge
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
---
# qwenselfbaseinstruct

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

Re-injected the base model into the instruct model in the intermediate layers while keeping the input and output layers the same (sophosympatheia gradient).

While this merge degraded the model's overall EQ-Bench score relative to instruct (76.9195 down to 73.8068),
it fixed the instruct model's habit of misspelling some of the emotion responses, and it still scores notably higher than the base model
(60.1027, also without any syntax errors).
It did throw one correctly spelled "didn't match reference" syntax error; I presume it replaced the emotion entirely or used a similar, grammatically correct one.

Looking at this as research evidence, it seems the instruct model picked up something specifically in the intermediate layers that occasionally hurts spelling.

I don't know whether this merge offers any other gain over using one or both components; it was done out of curiosity.
It might still be useful as more compact merge material if you wanted both base and instruct anyway.

## Merge Details
### Merge Method

This model was merged using the SLERP merge method.

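
SLERP interpolates each pair of corresponding weight tensors along the great-circle arc between them, rather than averaging them linearly. A minimal NumPy sketch of the idea (not mergekit's actual implementation):

```python
import numpy as np

def slerp(t: float, v0: np.ndarray, v1: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two flattened weight tensors."""
    # Angle between the two vectors, computed on normalized copies.
    v0n = v0 / (np.linalg.norm(v0) + eps)
    v1n = v1 / (np.linalg.norm(v1) + eps)
    dot = np.clip(np.dot(v0n, v1n), -1.0, 1.0)
    theta = np.arccos(dot)
    if theta < eps:
        # Nearly parallel vectors: fall back to plain linear interpolation.
        return (1 - t) * v0 + t * v1
    # Interpolate along the arc; t=0 returns v0, t=1 returns v1.
    s0 = np.sin((1 - t) * theta) / np.sin(theta)
    s1 = np.sin(t * theta) / np.sin(theta)
    return s0 * v0 + s1 * v1
```

In this merge, t = 0 keeps the instruct weights (the `base_model` of the config) untouched, while larger t blends in more of the base model.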
### Models Merged

The following models were included in the merge:
* [Qwen/Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct)
* [Qwen/Qwen2.5-14B](https://huggingface.co/Qwen/Qwen2.5-14B)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
models:
  - model: Qwen/Qwen2.5-14B
merge_method: slerp
base_model: Qwen/Qwen2.5-14B-Instruct
parameters:
  t:
    - value: [0, 0, 0.3, 0.4, 0.5, 0.6, 0.5, 0.4, 0.3, 0, 0]
dtype: bfloat16
```
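
The eleven-entry `t` list is a gradient: the anchor values are stretched across the model's layers, so the endpoints stay at t = 0 (pure instruct weights) while the middle layers blend in up to 0.6 of the base model. A rough sketch of that mapping, assuming linear interpolation of the anchors and 48 transformer layers for Qwen2.5-14B:

```python
import numpy as np

# Gradient anchors from the mergekit config above.
anchors = [0, 0, 0.3, 0.4, 0.5, 0.6, 0.5, 0.4, 0.3, 0, 0]
num_layers = 48  # assumed layer count for Qwen2.5-14B

# Place the anchors evenly over [0, 1] and linearly interpolate a t value
# for each layer's relative depth.
anchor_pos = np.linspace(0.0, 1.0, len(anchors))
layer_pos = np.linspace(0.0, 1.0, num_layers)
t_per_layer = np.interp(layer_pos, anchor_pos, anchors)

# First and last layers get t = 0 (kept identical to instruct); the
# deepest blend of base-model weights occurs near the middle layers.
```

This is why the card describes the merge as re-injecting the base model only into the intermediate layers.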