# merge
This is a merge of pre-trained language models created using mergekit.
## Merge Details

### Merge Method
This model was merged using the DARE TIES merge method, with CultriX/Qwen2.5-14B-Wernickev3 as the base model.
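Conceptually, DARE TIES builds a task vector (fine-tuned weights minus base weights) for each source model, randomly drops a fraction of each task vector's entries, rescales the survivors, resolves sign conflicts TIES-style, and adds the weighted result back onto the base. The NumPy sketch below illustrates only the drop-and-rescale (DARE) step combined with a weighted sum; it omits TIES sign election, is not mergekit's implementation, and all names in it are made up for the example.

```python
# Illustrative sketch of the DARE drop-and-rescale step; TIES sign election is
# omitted. Not mergekit's implementation -- names are hypothetical.
import numpy as np

def dare_task_vector(finetuned, base, density, rng):
    """Keep ~`density` of the delta's entries at random and rescale survivors."""
    delta = finetuned - base                     # task vector for one source model
    keep = rng.random(delta.shape) < density     # Bernoulli keep-mask
    return np.where(keep, delta / density, 0.0)  # rescale kept entries by 1/density

def dare_merge(base, finetuned_models, weights, densities, seed=0):
    """Weighted sum of DARE-processed task vectors, added back onto the base."""
    rng = np.random.default_rng(seed)
    merged = base.copy()
    for ft, w, d in zip(finetuned_models, weights, densities):
        merged += w * dare_task_vector(ft, base, d, rng)
    return merged

# Toy usage with random arrays standing in for real model parameters:
base = np.random.randn(4, 4)
fts = [base + 0.1 * np.random.randn(4, 4) for _ in range(3)]
merged = dare_merge(base, fts, weights=[0.5, 0.3, 0.2], densities=[0.8, 0.7, 0.6])
```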
### Models Merged
The following models were included in the merge:
- CultriX/SeQwence-14B-EvolMerge
- hotmailuser/QwenSlerp2-14B
- qingy2024/Fusion4-14B-Instruct
- djuna/Q2.5-Veltha-14B-0.5
- CultriX/Qwen2.5-14B-Emerged
- allknowingroger/QwenSlerp6-14B
- sometimesanotion/Lamarck-14B-v0.6
### Configuration
The following YAML configuration was used to produce this model:
```yaml
merge_method: dare_ties                     # DARE TIES merge method.
base_model: CultriX/Qwen2.5-14B-Wernickev3  # Base model used for parameter alignment; a strong multitask performer.
dtype: bfloat16                             # Weight data type, for memory- and compute-efficient merging.
out_dtype: bfloat16                         # Output data type, kept consistent with the input dtype.
parameters:
  epsilon: 0.008    # Fine-tunes parameter scaling, improving merge quality and stability.
  lambda: 1.8       # Prioritizes high-impact parameters, useful for reasoning and multitask performance.
  normalize: true   # Normalizes parameters to prevent instability during the merge.
  rescale: true     # Adjusts parameter scales across models, improving compatibility.
  int8_mask: false  # Disables int8 masking, preserving full precision for parameter alignment.
adaptive_merge_parameters:
  task_weights:               # Task weights emphasizing different areas of model performance.
    tinyArc: 1.6              # Moderate priority for logical reasoning tasks.
    tinyHellaswag: 1.5        # Medium priority for contextual-understanding tasks.
    tinyMMLU: 1.8             # Higher priority for multi-domain knowledge benchmarks.
    tinyTruthfulQA: 1.9       # Higher priority for factual accuracy and QA tasks.
    tinyTruthfulQA_mc1: 1.75  # Slightly reduced, but still important, priority for multiple-choice reasoning.
    tinyWinogrande: 1.75      # Medium priority for more complex contextual reasoning tasks.
    IFEval: 2.30              # High priority for instruction-following evaluation, often a weak point for models.
    BBH: 2.05                 # High priority for BIG-Bench Hard, critical for complex reasoning.
    MATH: 2.70                # Highest priority, for mathematical reasoning tasks.
    GPQA: 2.20                # Balanced priority for graduate-level question answering.
    MUSR: 2.15                # Slightly lower, but still high, priority for multi-step reasoning tasks.
    MMLU-PRO: 2.00            # High priority for domain-specific multitask benchmark performance.
  smoothing_factor: 0.03      # Smoothing factor for better blending of task performance across models.
gradient_clipping:            # Per-model clipping values, for merge stability.
  CultriX/Qwen2.5-14B-Wernickev3: 0.89     # Base model and core component of the merge; higher stability.
  djuna/Q2.5-Veltha-14B-0.5: 0.92          # Strong performer on reasoning tasks.
  CultriX/SeQwence-14B-EvolMerge: 0.87     # Balanced multi-task performer.
  qingy2024/Fusion4-14B-Instruct: 0.93     # Emphasizes stability for mathematical reasoning.
  CultriX/Qwen2.5-14B-Emerged: 0.88        # Provides multi-task support.
  sometimesanotion/Lamarck-14B-v0.6: 0.89  # Enhances multi-step reasoning capabilities.
  allknowingroger/QwenSlerp6-14B: 0.90     # Supports nuanced reasoning tasks.
  hotmailuser/QwenSlerp2-14B: 0.91         # Slightly increased stability for logical reasoning tasks.
models:  # Models included in the merge, each with its own weight and density.
  - model: CultriX/Qwen2.5-14B-Wernickev3  # Base model and main backbone, with good multi-task capabilities.
    parameters:
      weight: 0.32   # Dominant contribution to the final model.
      density: 0.78  # High density to preserve its parameters, as a key component of the merge.
  - model: djuna/Q2.5-Veltha-14B-0.5       # Strong performer on factual and reasoning tasks.
    parameters:
      weight: 0.28   # Prioritizes reasoning performance.
      density: 0.77  # Balanced density to retain its reasoning abilities.
  - model: allknowingroger/QwenSlerp6-14B  # Good general performance and reasoning capabilities.
    parameters:
      weight: 0.15   # Moderate contribution to the final model.
      density: 0.72  # Effective parameter integration into the final model.
  - model: CultriX/SeQwence-14B-EvolMerge  # Good multi-task contributor.
    parameters:
      weight: 0.12   # Lower-weight contribution.
      density: 0.62  # Balanced performance.
  - model: qingy2024/Fusion4-14B-Instruct  # Excels at mathematical reasoning.
    parameters:
      weight: 0.09   # Focused contribution for mathematical reasoning tasks.
      density: 0.75  # Preserves its strengths on mathematical tasks.
  - model: CultriX/Qwen2.5-14B-Emerged     # Good multi-task performer.
    parameters:
      weight: 0.08   # Smaller, but still useful, contribution.
      density: 0.69  # Balances its contribution.
  - model: sometimesanotion/Lamarck-14B-v0.6  # Useful for multi-step reasoning.
    parameters:
      weight: 0.06   # Small contribution.
      density: 0.62  # Density chosen for multi-step reasoning tasks.
  - model: hotmailuser/QwenSlerp2-14B      # Strong reasoning and multi-task performance.
    parameters:
      weight: 0.12   # Balanced contribution.
      density: 0.66  # Better parameter integration.
```
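To reproduce the merge, the configuration above can be saved to a YAML file and passed to mergekit. Below is a minimal sketch using mergekit's Python API (`MergeConfiguration`, `run_merge`, `MergeOptions`, as used in mergekit's example notebook; exact names and options may differ between versions), assuming the config has been saved as `merge-config.yaml`:

```python
# Hedged sketch: run the merge described by the YAML above with mergekit.
# The Python API shown here follows mergekit's example notebook and may
# differ between mergekit versions.
import yaml
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("merge-config.yaml", "r", encoding="utf-8") as f:  # the YAML shown above
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(f))

run_merge(
    merge_config,
    out_path="./Qwen2.5-14B-Hyper",  # output directory for the merged weights
    options=MergeOptions(
        cuda=True,            # set False to merge on CPU
        copy_tokenizer=True,  # copy the base model's tokenizer into the output
        lazy_unpickle=False,
        low_cpu_memory=False,
    ),
)
```

The command-line equivalent is `mergekit-yaml merge-config.yaml ./Qwen2.5-14B-Hyper --cuda`.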
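Once merged (or downloaded from the Hub), the model can be used like any other Qwen2.5 causal LM. A brief usage sketch with Hugging Face Transformers, assuming the merged model inherits the standard Qwen2.5 chat template from its base:

```python
# Minimal inference sketch; assumes the merged model ships a Qwen2.5-style chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CultriX/Qwen2.5-14B-Hyper"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain the difference between TIES and DARE merging in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```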