noirchan/DARE-TIES-Qwen2.5-Coder-Karasu-0.5

@@ -1,44 +1,39 @@
 ---
 base_model:
-- lightblue/Karasu-DPO-7B
-- Qwen/Qwen2.5-Coder-7B-Instruct
-library_name: transformers
 tags:
-- mergekit
-- merge
 ---
-# dare_ties_merged_0.5
-This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
-## Merge Details
-### Merge Method
-This model was merged using the [DARE TIES](https://arxiv.org/abs/2311.03099) merge method using [Qwen/Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) as a base.
-### Models Merged
-The following models were included in the merge:
-* [lightblue/Karasu-DPO-7B](https://huggingface.co/lightblue/Karasu-DPO-7B)
-### Configuration
-The following YAML configuration was used to produce this model:
-```yaml
-merge_method: dare_ties
-base_model: Qwen/Qwen2.5-Coder-7B-Instruct
-models:
-  - model: Qwen/Qwen2.5-Coder-7B-Instruct
-    parameters:
-      weight: 0.5
-      density: 0.5
-  - model: lightblue/Karasu-DPO-7B
-    parameters:
-      weight: 0.5
-      density: 0.5
-parameters:
-  int8_mask: true
-dtype: bfloat16
 ```

 ---
+license: apache-2.0
 base_model:
+  - Qwen/Qwen2.5-Coder-7B-Instruct
+  - lightblue/Karasu-DPO-7B
 tags:
+  - merge
+  - mergekit
+  - dare_ties
+  - japanese
+  - coding
 ---
+# DARE-TIES Merged Model (Ratio: 0.5)
+This model is a merge of Qwen/Qwen2.5-Coder-7B-Instruct and lightblue/Karasu-DPO-7B, created with mergekit using the DARE_TIES method.
+## Base Models
+- **Qwen/Qwen2.5-Coder-7B-Instruct** (Weight: 0.5)
+- **lightblue/Karasu-DPO-7B** (Weight: 0.5)
+## Merge Method
+- **Method**: DARE_TIES
+- **Density**: 0.5
+- **Data Type**: bfloat16
+## Purpose
+This model aims to enhance Japanese code generation capabilities while maintaining English coding performance.
+## Usage
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+# Load the merged checkpoint and its tokenizer from the Hugging Face Hub.
+repo_id = "noirchan/DARE-TIES-Qwen2.5-Coder-Karasu-0.5"
+tokenizer = AutoTokenizer.from_pretrained(repo_id)
+model = AutoModelForCausalLM.from_pretrained(repo_id)
 ```
+## Evaluation
+This model is part of a systematic evaluation of different merge ratios to find the optimal balance between Japanese language capabilities and code generation performance.
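
For readers unfamiliar with the settings listed under Merge Method, the following is a minimal NumPy sketch of what a DARE_TIES-style merge does to a single tensor with density 0.5 and weights of 0.5. It is a simplified illustration, not mergekit's implementation: the function name `dare_ties_merge`, the toy tensors, and the omission of options such as weight normalization and `int8_mask` are all assumptions made for this example.

```python
import numpy as np


def dare_ties_merge(base, finetuned, weights, density, rng):
    """Simplified DARE-TIES merge of one tensor (illustrative, not mergekit's code)."""
    deltas = []
    for tuned, weight in zip(finetuned, weights):
        delta = tuned - base  # task vector relative to the base model
        # DARE: keep each entry with probability `density`, rescale survivors.
        keep = rng.random(delta.shape) < density
        deltas.append(weight * np.where(keep, delta / density, 0.0))
    deltas = np.stack(deltas)

    # TIES: elect a sign per parameter from the summed weighted deltas,
    # then drop contributions that disagree with the elected sign.
    elected = np.sign(deltas.sum(axis=0))
    agree = (np.sign(deltas) == elected) & (elected != 0)
    merged_delta = np.where(agree, deltas, 0.0).sum(axis=0)
    return base + merged_delta


# Toy tensors standing in for one weight matrix of each model (hypothetical data).
rng = np.random.default_rng(0)
base = rng.normal(size=8)
coder = base.copy()  # the base model listed as a merge input has a zero task vector
karasu = base + rng.normal(scale=0.1, size=8)

merged = dare_ties_merge(base, [coder, karasu], weights=[0.5, 0.5], density=0.5, rng=rng)
print(merged - base)  # entries dropped by DARE stay at the base value
```

Because the base model appears in the merge list with a zero task vector, only the Karasu deltas contribute in this sketch; in a real merge the same steps would run over every tensor in the checkpoint.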