DARE-TIES Merged Model (Ratio: 0.7)

This is a merged model created with mergekit using the DARE_TIES method.

Base Models

  • Qwen/Qwen2.5-Coder-7B-Instruct (Weight: 0.3)
  • lightblue/Karasu-DPO-7B (Weight: 0.7)

Merge Method

  • Method: DARE_TIES
  • Density: 0.5
  • Data Type: bfloat16
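
For reference, the merge can be expressed as a mergekit YAML config. The sketch below is reconstructed from the numbers above, not the published config; in particular, the base_model line (the shared ancestor that task vectors are taken against) is an assumption and may differ from what was actually used.

# dare_ties_0.7.yaml -- illustrative reconstruction, not the published config
merge_method: dare_ties
base_model: Qwen/Qwen2.5-7B-Instruct  # assumption: shared ancestor of both models
dtype: bfloat16
models:
  - model: Qwen/Qwen2.5-Coder-7B-Instruct
    parameters:
      weight: 0.3
      density: 0.5
  - model: lightblue/Karasu-DPO-7B
    parameters:
      weight: 0.7
      density: 0.5

A command like mergekit-yaml dare_ties_0.7.yaml ./merged would then write out the merged weights.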

Purpose

This model aims to enhance Japanese code generation capabilities while maintaining English coding performance.

Usage

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "noirchan/DARE-TIES-Qwen2.5-Coder-Karasu-0.7"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# bfloat16 matches the merged checkpoint's tensor type
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
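
To generate a response, the tokenizer's chat template can be applied. This is a minimal sketch assuming the repo ships Qwen2.5's chat template (mergekit normally copies the tokenizer over); the prompt, which asks for a Fibonacci function in Japanese, and the generation settings are only illustrative.

# Illustrative prompt: ask for a Fibonacci function in Japanese
messages = [{"role": "user", "content": "フィボナッチ数列を計算するPython関数を書いてください。"}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))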

Evaluation

This model is one point (Karasu weight 0.7) in a systematic sweep of merge ratios, aimed at finding the best balance between Japanese language capabilities and code generation performance.
