---
base_model:
- CultriX/SeQwence-14B-EvolMerge
- hotmailuser/QwenSlerp2-14B
- qingy2024/Fusion4-14B-Instruct
- djuna/Q2.5-Veltha-14B-0.5
- CultriX/Qwen2.5-14B-Emerged
- allknowingroger/QwenSlerp6-14B
- CultriX/Qwen2.5-14B-Wernickev3
- sometimesanotion/Lamarck-14B-v0.6
library_name: transformers
tags:
- mergekit
- merge
---

# merge

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Merge Details

### Merge Method

This model was merged using the [DARE](https://arxiv.org/abs/2311.03099) [TIES](https://arxiv.org/abs/2306.01708) merge method, with [CultriX/Qwen2.5-14B-Wernickev3](https://huggingface.co/CultriX/Qwen2.5-14B-Wernickev3) as the base model.

### Models Merged

The following models were included in the merge:

* [CultriX/SeQwence-14B-EvolMerge](https://huggingface.co/CultriX/SeQwence-14B-EvolMerge)
* [hotmailuser/QwenSlerp2-14B](https://huggingface.co/hotmailuser/QwenSlerp2-14B)
* [qingy2024/Fusion4-14B-Instruct](https://huggingface.co/qingy2024/Fusion4-14B-Instruct)
* [djuna/Q2.5-Veltha-14B-0.5](https://huggingface.co/djuna/Q2.5-Veltha-14B-0.5)
* [CultriX/Qwen2.5-14B-Emerged](https://huggingface.co/CultriX/Qwen2.5-14B-Emerged)
* [allknowingroger/QwenSlerp6-14B](https://huggingface.co/allknowingroger/QwenSlerp6-14B)
* [sometimesanotion/Lamarck-14B-v0.6](https://huggingface.co/sometimesanotion/Lamarck-14B-v0.6)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
merge_method: dare_ties                      # DARE-TIES merge method.
base_model: CultriX/Qwen2.5-14B-Wernickev3   # Base model used for parameter alignment; a strong multitask performer.
dtype: bfloat16                              # Data type for model weights, for efficient memory use and computation.
out_dtype: bfloat16                          # Output data type, kept consistent with the input.
parameters:
  epsilon: 0.008      # Fine-tunes parameter scaling, improving merge quality and stability.
  lambda: 1.8         # Prioritizes high-impact parameters, useful for reasoning and multitask performance.
  normalize: true     # Normalizes parameters to prevent instability during the merge.
  rescale: true       # Rescales parameters across models, improving compatibility.
  int8_mask: false    # Disables int8 masking, preserving full precision for parameter alignment.
adaptive_merge_parameters:
  task_weights:               # Relative emphasis placed on each benchmark.
    tinyArc: 1.6              # Moderate priority for logical reasoning.
    tinyHellaswag: 1.5        # Medium priority for contextual understanding.
    tinyMMLU: 1.8             # Higher priority for multi-domain knowledge benchmarks.
    tinyTruthfulQA: 1.9       # Higher priority for factual accuracy and QA.
    tinyTruthfulQA_mc1: 1.75  # Slightly lower, but still important, for multiple-choice reasoning.
    tinyWinogrande: 1.75      # Medium priority for more complex contextual reasoning.
    IFEval: 2.30              # High priority for instruction following, often a weak point for merged models.
    BBH: 2.05                 # High priority for BIG-Bench Hard, critical for complex reasoning.
    MATH: 2.70                # Highest priority, for mathematical reasoning.
    GPQA: 2.20                # Balanced priority for graduate-level question answering.
    MUSR: 2.15                # Slightly lower, but still high, priority for multi-step reasoning.
    MMLU-PRO: 2.00            # High priority for domain-specific multitask performance.
  smoothing_factor: 0.03      # Smooths the blending of the different task weights.
gradient_clipping:            # Per-model clipping values, for merge stability.
  CultriX/Qwen2.5-14B-Wernickev3: 0.89      # Base model and core component of the merge; slightly higher stability.
  djuna/Q2.5-Veltha-14B-0.5: 0.92           # Strong performer on reasoning tasks.
  CultriX/SeQwence-14B-EvolMerge: 0.87      # Balanced multitask performer.
  qingy2024/Fusion4-14B-Instruct: 0.93      # Extra stability for mathematical reasoning.
  CultriX/Qwen2.5-14B-Emerged: 0.88         # Provides multitask support.
  sometimesanotion/Lamarck-14B-v0.6: 0.89   # Enhances multi-step reasoning capabilities.
  allknowingroger/QwenSlerp6-14B: 0.90      # Supports nuanced reasoning tasks.
  hotmailuser/QwenSlerp2-14B: 0.91          # Slightly higher stability for logical reasoning.
models:   # Models included in the merge, each with its own weight and density.
  - model: CultriX/Qwen2.5-14B-Wernickev3   # Base model and main backbone; good multitask capabilities.
    parameters:
      weight: 0.32    # Dominant contribution to the final model.
      density: 0.78   # High density to preserve its parameters, as it is a key component of the merge.
  - model: djuna/Q2.5-Veltha-14B-0.5        # Strong performer on factual and reasoning tasks.
    parameters:
      weight: 0.28    # Prioritizes reasoning performance.
      density: 0.77   # Balanced density to retain its reasoning abilities.
  - model: allknowingroger/QwenSlerp6-14B   # Good general performance and reasoning capabilities.
    parameters:
      weight: 0.15    # Moderate contribution to the final model.
      density: 0.72   # Effective parameter integration into the final model.
  - model: CultriX/SeQwence-14B-EvolMerge   # Good multitask contributor.
    parameters:
      weight: 0.12    # Lower weight, for a smaller contribution.
      density: 0.62   # Balanced performance.
  - model: qingy2024/Fusion4-14B-Instruct   # Excels at mathematical reasoning.
    parameters:
      weight: 0.09    # Focused contribution to mathematical reasoning tasks.
      density: 0.75   # Preserves its strengths on mathematical tasks.
  - model: CultriX/Qwen2.5-14B-Emerged      # Good multitask performer.
    parameters:
      weight: 0.08    # Smaller, but still useful, contribution.
      density: 0.69   # Balances its contributions.
  - model: sometimesanotion/Lamarck-14B-v0.6  # Useful for multi-step reasoning.
    parameters:
      weight: 0.06    # Smallest contribution to the merge.
      density: 0.62   # Density tuned for multi-step reasoning tasks.
  - model: hotmailuser/QwenSlerp2-14B       # Strong reasoning and multitask performance.
    parameters:
      weight: 0.12    # Balanced contribution.
      density: 0.66   # Better parameter integration.
```
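
Note that the per-model `weight` values in the configuration sum to 1.22 rather than 1.0; it is the `normalize: true` setting that makes this valid, since mergekit rescales the weights before merging. A minimal sanity-check sketch of that rescaling in plain Python (weights copied from the config above; the normalization-to-unit-sum behavior is an assumption about what `normalize` does, not code from mergekit itself):

```python
# Per-model merge weights as listed in the YAML configuration above.
weights = {
    "CultriX/Qwen2.5-14B-Wernickev3": 0.32,
    "djuna/Q2.5-Veltha-14B-0.5": 0.28,
    "allknowingroger/QwenSlerp6-14B": 0.15,
    "CultriX/SeQwence-14B-EvolMerge": 0.12,
    "hotmailuser/QwenSlerp2-14B": 0.12,
    "qingy2024/Fusion4-14B-Instruct": 0.09,
    "CultriX/Qwen2.5-14B-Emerged": 0.08,
    "sometimesanotion/Lamarck-14B-v0.6": 0.06,
}

# As written, the weights sum to 1.22, not 1.0.
total = sum(weights.values())

# Assuming `normalize: true` divides each weight by the total, the
# effective share of the base model is 0.32 / 1.22, roughly 26%.
normalized = {name: w / total for name, w in weights.items()}
```

This also makes clear why `normalize` should stay enabled if any weight is later tuned: the listed values are relative priorities, not absolute fractions of the final model.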