noirchan committed
Commit 2f756d8 · verified · 1 Parent(s): 9977fe4

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +27 -32
README.md CHANGED
@@ -1,44 +1,39 @@
  ---
  base_model:
- - lightblue/Karasu-DPO-7B
- - Qwen/Qwen2.5-Coder-7B-Instruct
- library_name: transformers
  tags:
- - mergekit
- - merge
-
  ---
- # dare_ties_merged_0.5
-
- This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

- ## Merge Details
- ### Merge Method

- This model was merged using the [DARE TIES](https://arxiv.org/abs/2311.03099) merge method using [Qwen/Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) as a base.

- ### Models Merged

- The following models were included in the merge:
- * [lightblue/Karasu-DPO-7B](https://huggingface.co/lightblue/Karasu-DPO-7B)

- ### Configuration

- The following YAML configuration was used to produce this model:

- ```yaml
- merge_method: dare_ties
- base_model: Qwen/Qwen2.5-Coder-7B-Instruct
- models:
-   - model: Qwen/Qwen2.5-Coder-7B-Instruct
-     parameters:
-       weight: 0.5
-       density: 0.5
-   - model: lightblue/Karasu-DPO-7B
-     parameters:
-       weight: 0.5
-       density: 0.5
- parameters:
-   int8_mask: true
- dtype: bfloat16
  ```

  ---
+ license: apache-2.0
  base_model:
+ - Qwen/Qwen2.5-Coder-7B-Instruct
+ - lightblue/Karasu-DPO-7B
  tags:
+ - merge
+ - mergekit
+ - dare_ties
+ - japanese
+ - coding
  ---

+ # DARE-TIES Merged Model (Ratio: 0.5)

+ This is a merged model created using the DARE_TIES method with mergekit.

+ ## Base Models
+ - **Qwen/Qwen2.5-Coder-7B-Instruct** (Weight: 0.5)
+ - **lightblue/Karasu-DPO-7B** (Weight: 0.5)

+ ## Merge Method
+ - **Method**: DARE_TIES
+ - **Density**: 0.5
+ - **Data Type**: bfloat16

+ ## Purpose
+ This model aims to enhance Japanese code generation capabilities while maintaining English coding performance.

+ ## Usage
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM

+ tokenizer = AutoTokenizer.from_pretrained("noirchan/DARE-TIES-Qwen2.5-Coder-Karasu-0.5")
+ model = AutoModelForCausalLM.from_pretrained("noirchan/DARE-TIES-Qwen2.5-Coder-Karasu-0.5")
  ```
+
+ ## Evaluation
+ This model is part of a systematic evaluation of different merge ratios to find the optimal balance between Japanese language capabilities and code generation performance.
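
Reviewer note: the weight and density values this README reports for DARE_TIES can be illustrated with a toy sketch. The code below is a hypothetical, simplified illustration on flat Python lists, not mergekit's implementation. The name `dare_ties_merge` and its parameters are assumptions made here. It assumes the DARE step keeps each delta from the base with probability `density` and rescales the survivors by `1 / density`, and that the TIES step sums only the deltas whose sign agrees with the per-parameter majority. Normalization and `int8_mask` handling are left out.

```python
import random


def dare_ties_merge(base, finetuned, weights, density, seed=0):
    """Merge flat weight lists into `base` with a simplified DARE-TIES rule.

    Hypothetical helper for illustration only (not mergekit's API).
    """
    rng = random.Random(seed)

    # DARE step: keep each delta with probability `density`,
    # then rescale the kept deltas by 1 / density.
    deltas = []
    for model, weight in zip(finetuned, weights):
        delta = []
        for b, m in zip(base, model):
            keep = rng.random() < density
            delta.append(weight * (m - b) / density if keep else 0.0)
        deltas.append(delta)

    # TIES step: elect one sign per parameter from the summed deltas,
    # then add only the deltas that agree with that sign.
    merged = []
    for i, b in enumerate(base):
        column = [d[i] for d in deltas]
        sign = 1.0 if sum(column) >= 0 else -1.0
        agreed = sum(v for v in column if v * sign > 0)
        merged.append(b + agreed)
    return merged


# With density=1.0 nothing is dropped, so the result is deterministic.
print(dare_ties_merge([0.0, 1.0], [[1.0, 1.0], [-0.5, 3.0]], [0.5, 0.5], density=1.0))
# prints [0.5, 2.0]
```

In the first parameter the two deltas disagree in sign (+0.5 and -0.25), so only the majority-sign delta is applied. With `density=0.5`, as in this commit, roughly half of each model's deltas would be dropped at random and the rest doubled.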