---
base_model:
  - CultriX/SeQwence-14B-EvolMerge
  - hotmailuser/QwenSlerp2-14B
  - qingy2024/Fusion4-14B-Instruct
  - djuna/Q2.5-Veltha-14B-0.5
  - CultriX/Qwen2.5-14B-Emerged
  - allknowingroger/QwenSlerp6-14B
  - CultriX/Qwen2.5-14B-Wernickev3
  - sometimesanotion/Lamarck-14B-v0.6
library_name: transformers
tags:
  - mergekit
  - merge
---

# merge

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Merge Details

### Merge Method

This model was merged using the [DARE](https://arxiv.org/abs/2311.03099) [TIES](https://arxiv.org/abs/2306.01708) merge method using [CultriX/Qwen2.5-14B-Wernickev3](https://huggingface.co/CultriX/Qwen2.5-14B-Wernickev3) as a base; a sketch of the core update rule follows.
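
For intuition, DARE-TIES combines two ideas: DARE randomly drops entries of each model's delta from the base (keeping roughly a `density` fraction) and rescales the survivors by `1/density`, while TIES resolves sign conflicts across models by majority vote before summing. Below is a toy per-tensor sketch in PyTorch, using simplified handling of the `weight`, `density`, and `lambda` values from the configuration further down; mergekit's actual implementation differs in detail.

```python
import torch

def dare_ties_merge(base: torch.Tensor,
                    finetuned: list[torch.Tensor],
                    weights: list[float],
                    densities: list[float],
                    lam: float = 1.8) -> torch.Tensor:
    """Toy per-tensor DARE-TIES merge; illustrative only, not mergekit's code."""
    deltas = []
    for ft, w, d in zip(finetuned, weights, densities):
        delta = ft - base                              # task vector vs. the base model
        mask = torch.bernoulli(torch.full_like(delta, d))
        deltas.append(w * delta * mask / d)            # DARE: drop, rescale, weight
    stacked = torch.stack(deltas)
    elected = torch.sign(stacked.sum(dim=0))           # TIES: majority sign election
    agree = (torch.sign(stacked) == elected)           # keep only agreeing components
    merged_delta = (stacked * agree).sum(dim=0)
    return base + lam * merged_delta                   # lambda scales the merged delta
```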

### Models Merged

The following models were included in the merge:

* [djuna/Q2.5-Veltha-14B-0.5](https://huggingface.co/djuna/Q2.5-Veltha-14B-0.5)
* [allknowingroger/QwenSlerp6-14B](https://huggingface.co/allknowingroger/QwenSlerp6-14B)
* [CultriX/SeQwence-14B-EvolMerge](https://huggingface.co/CultriX/SeQwence-14B-EvolMerge)
* [qingy2024/Fusion4-14B-Instruct](https://huggingface.co/qingy2024/Fusion4-14B-Instruct)
* [CultriX/Qwen2.5-14B-Emerged](https://huggingface.co/CultriX/Qwen2.5-14B-Emerged)
* [sometimesanotion/Lamarck-14B-v0.6](https://huggingface.co/sometimesanotion/Lamarck-14B-v0.6)
* [hotmailuser/QwenSlerp2-14B](https://huggingface.co/hotmailuser/QwenSlerp2-14B)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
merge_method: dare_ties  # Specifies the merge method as dare_ties, known for its high-performance potential.
base_model: CultriX/Qwen2.5-14B-Wernickev3  # Sets the base model, a strong multitask performer, for parameter alignment.
dtype: bfloat16  # Defines the data type for model weights as bfloat16, for efficient memory and computation.
out_dtype: bfloat16  # Sets the output data type to bfloat16 for consistency with the input.

parameters:
  epsilon: 0.008  # Fine-tunes parameter scaling to improve merge quality and stability.
  lambda: 1.8  # Prioritizes high-impact parameters, useful for reasoning and multitask performance.
  normalize: true  # Normalizes parameters to prevent instability during the merge.
  rescale: true  # Adjusts parameter scales across different models, improving compatibility.
  int8_mask: false  # Disables int8 masking, preserving full precision for better parameter alignment.

adaptive_merge_parameters:
  task_weights:  # Defines per-task weights to emphasize different areas of model performance.
    tinyArc: 1.6  # Sets a moderate priority for logical reasoning tasks.
    tinyHellaswag: 1.5  # Sets a medium priority for contextual understanding tasks.
    tinyMMLU: 1.8  # Gives a higher priority to multi-domain knowledge benchmarks.
    tinyTruthfulQA: 1.9  # Gives a higher priority to factual accuracy and QA tasks.
    tinyTruthfulQA_mc1: 1.75  # Slightly reduced priority, but still important, for multiple-choice reasoning.
    tinyWinogrande: 1.75  # Sets a medium priority for more complex contextual reasoning tasks.
    IFEval: 2.30  # Sets a high priority for instruction-following evaluation, as it is often a weak point for models.
    BBH: 2.05  # Gives a high priority to BIG-Bench Hard, critical for complex reasoning.
    MATH: 2.70  # Sets the highest priority for mathematical reasoning tasks.
    GPQA: 2.20  # Gives a balanced priority to graduate-level question-answering tasks.
    MUSR: 2.15  # Gives a slightly lower, but still high, priority to multi-step reasoning tasks.
    MMLU-PRO: 2.00 # Gives a high priority to domain-specific multitask benchmark performance.
  smoothing_factor: 0.03  # Sets the smoothing factor, blending the per-task influences more evenly.

gradient_clipping:  # Defines per-model gradient clipping values, for merge stability.
  CultriX/Qwen2.5-14B-Wernickev3: 0.89  # Sets the clipping value for the base model, a core component of the merge, for added stability.
  djuna/Q2.5-Veltha-14B-0.5: 0.92  # Sets the clipping value for the djuna model, a strong performer in reasoning tasks.
  CultriX/SeQwence-14B-EvolMerge: 0.87  # Sets the clipping value for this model, which is a balanced multi-task performer.
  qingy2024/Fusion4-14B-Instruct: 0.93  # Sets the clipping value for this model, emphasizing stability for mathematical reasoning.
  CultriX/Qwen2.5-14B-Emerged: 0.88  # Sets the clipping value for this model, which provides multi-task support.
  sometimesanotion/Lamarck-14B-v0.6: 0.89  # Sets the clipping value for this model, to enhance the multi-step reasoning capabilities.
  allknowingroger/QwenSlerp6-14B: 0.90  # Sets the clipping value for this model, which supports nuanced reasoning tasks.
  hotmailuser/QwenSlerp2-14B: 0.91  # Sets the clipping value for this model, with slightly increased stability for logical reasoning tasks.

models:  # Defines all the models that are going to be included in the merge.
  - model: CultriX/Qwen2.5-14B-Wernickev3  # The base model and main backbone of the merge, which offers good multi-task capabilities.
    parameters: # Defines the weight and density that will be used for the model.
      weight: 0.32  # Sets the weight of the model to 0.32, which is the dominant contribution to the final model.
      density: 0.78  # Sets a high density of 0.78, to preserve its parameters, as this is a key component of the merge.

  - model: djuna/Q2.5-Veltha-14B-0.5  # Defines the djuna model, a strong performer in factual and reasoning tasks.
    parameters:  # Defines the weight and density that will be used for the model.
      weight: 0.28  # Sets the weight of the model to 0.28, to prioritize reasoning performance.
      density: 0.77  # Sets a balanced density of 0.77, to enhance its reasoning abilities.

  - model: allknowingroger/QwenSlerp6-14B  # Defines the allknowingroger model, which has good performance and reasoning capabilities.
    parameters:  # Defines the weight and density that will be used for the model.
      weight: 0.15  # Sets the weight of the model to 0.15, which has a moderate contribution to the final model.
      density: 0.72  # Sets a density of 0.72, for an effective parameter integration into the final model.

  - model: CultriX/SeQwence-14B-EvolMerge # Defines the CultriX/SeQwence model, which is a good multi-task contributor.
    parameters:  # Defines the weight and density that will be used for the model.
      weight: 0.12  # Sets the weight of the model to 0.12, a lower weight for its contribution.
      density: 0.62  # Sets a density of 0.62, for balanced performance.

  - model: qingy2024/Fusion4-14B-Instruct  # Defines the qingy model, which excels at mathematical reasoning.
    parameters:  # Defines the weight and density that will be used for the model.
      weight: 0.09  # Sets the weight of the model to 0.09, for a specific focus on mathematical reasoning tasks.
      density: 0.75  # Sets a density of 0.75, for preserving its strengths in mathematical tasks.

  - model: CultriX/Qwen2.5-14B-Emerged  # Defines the CultriX/Qwen2.5-14B-Emerged model, a good multi-task performer.
    parameters:  # Defines the weight and density that will be used for the model.
      weight: 0.08 # Sets the weight of the model to 0.08, for a smaller, but still useful, contribution.
      density: 0.69 # Sets a density of 0.69, to balance its contributions.

  - model: sometimesanotion/Lamarck-14B-v0.6 # Defines the sometimesanotion/Lamarck model, which is useful for multi-step reasoning.
    parameters:  # Defines the weight and density that will be used for the model.
      weight: 0.06  # Sets the weight of the model to 0.06, a smaller contribution focused on multi-step reasoning.
      density: 0.62  # Sets the density to 0.62 for multi-step reasoning tasks.

  - model: hotmailuser/QwenSlerp2-14B  # Defines the hotmailuser model, with strong performance in reasoning and multi-task performance.
    parameters:  # Defines the weight and density that will be used for the model.
      weight: 0.12  # Sets the weight of the model to 0.12, for its balanced contributions.
      density: 0.66  # Sets the density of the model to 0.66, for better parameter integration.
```
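
To reproduce the merge, save the configuration above to a file (e.g. `config.yaml`, an arbitrary name) and run mergekit's `mergekit-yaml config.yaml ./output-model-directory`. The resulting model loads like any other `transformers` causal LM. A minimal usage sketch, assuming the published repository id `CultriX/Qwen2.5-14B-Hyper` and a bfloat16-capable GPU:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id assumed from this model card; point at your local
# output directory instead if you ran the merge yourself.
model_id = "CultriX/Qwen2.5-14B-Hyper"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the dtype/out_dtype in the merge config
    device_map="auto",
)

# Qwen2.5 models ship a chat template; apply it before generating.
messages = [{"role": "user", "content": "Briefly explain the TIES merge method."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Loading in `torch.bfloat16` matches the `dtype`/`out_dtype` set in the configuration, so no extra conversion pass is needed.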