Magnolia-v3-medis-remix-12B

This is a merge of pre-trained language models created using mergekit. Nemo Instruct is the major component; a medical fine-tune was also incorporated, at a very small weight, as a "noise" component.

Chat Template

The underlying Mistral Nemo 2407 model was tuned to work with Mistral's Tekken Instruct Chat Template, which is close to the template for their Tokenizer V3.

{{ bos_token }}
{% for message in messages %}
    {% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}
        {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}
    {% endif %}
    {% if message['role'] == 'user' %}
        {{ '[INST]' + message['content'] + '[/INST]' }}
    {% elif message['role'] == 'assistant' %}
        {{ message['content'] + eos_token }}
    {% else %}
        {{ raise_exception('Only user and assistant roles are supported!') }}
    {% endif %}
{% endfor %}

Refer to Demystifying Mistral's Instruct Tokenization & Chat Templates for more details.
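
As a rough illustration of what the template above produces, here is a minimal Python sketch that mirrors its logic without Jinja. The function name `render_prompt` is hypothetical, and the default `<s>` and `</s>` token strings are assumptions; the real values come from the model's tokenizer configuration.

```python
def render_prompt(messages, bos_token="<s>", eos_token="</s>"):
    """Mirror the chat template: BOS, then alternating user/assistant turns.

    The bos_token/eos_token defaults are assumed placeholders, not values
    taken from this model's tokenizer.
    """
    parts = [bos_token]
    for index, message in enumerate(messages):
        role = message["role"]
        # Even positions must be user turns, odd positions must not be.
        if (role == "user") != (index % 2 == 0):
            raise ValueError(
                "Conversation roles must alternate user/assistant/user/assistant/..."
            )
        if role == "user":
            parts.append("[INST]" + message["content"] + "[/INST]")
        elif role == "assistant":
            parts.append(message["content"] + eos_token)
        else:
            raise ValueError("Only user and assistant roles are supported!")
    return "".join(parts)


conversation = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello"},
    {"role": "user", "content": "Bye"},
]
print(render_prompt(conversation))
```

Note that the template has no branch for a system role, so a system prompt would have to be folded into the first user message.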

Merge Details

Merge Method

This model was merged using the Task Arithmetic merge method, with grimjim/mistralai-Mistral-Nemo-Base-2407 as the base.
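
For readers unfamiliar with the method, the sketch below shows the general idea of task arithmetic on toy parameter vectors: each fine-tuned model contributes its difference from the base (a "task vector"), the differences are combined with per-model weights, and the result is added back to the base. The function `task_arithmetic` is hypothetical, and the assumption that `normalize` divides the combined delta by the sum of the weights is a reading of the option, not a statement about mergekit's internals.

```python
def task_arithmetic(base, tuned_models, weights, normalize=True):
    """Toy task-arithmetic merge over flat lists of floats.

    Assumption: normalize=True rescales the weighted sum of task vectors
    by the total weight. Real merges operate tensor by tensor.
    """
    # Task vector = fine-tuned parameters minus base parameters.
    deltas = [[t - b for t, b in zip(model, base)] for model in tuned_models]
    scale = 1.0 / sum(weights) if normalize else 1.0
    merged = []
    for i, base_value in enumerate(base):
        mixed = sum(w * delta[i] for w, delta in zip(weights, deltas))
        merged.append(base_value + scale * mixed)
    return merged


base = [0.0, 1.0]
tuned_models = [[1.0, 1.0], [0.0, 3.0]]
merged = task_arithmetic(base, tuned_models, weights=[0.9, 0.1])
print(merged)
```

With this formulation, a model given a tiny weight nudges the result only slightly, which is how a component can act as low-level "noise" rather than a dominant influence.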

Models Merged

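The following models were included in the merge:

- grimjim/mistralai-Mistral-Nemo-Instruct-2407
- grimjim/magnum-consolidatum-v1-12b
- grimjim/magnum-twilight-12b
- exafluence/EXF-Medistral-Nemo-12B
- nbeerbower/Mistral-Nemo-Prism-12B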

Configuration

The following YAML configuration was used to produce this model:

base_model: grimjim/mistralai-Mistral-Nemo-Base-2407
dtype: bfloat16
merge_method: task_arithmetic
parameters:
  normalize: true
slices:
- sources:
  - layer_range: [0, 40]
    model: grimjim/mistralai-Mistral-Nemo-Base-2407
  - layer_range: [0, 40]
    model: grimjim/mistralai-Mistral-Nemo-Instruct-2407
    parameters:
      weight: 0.9
  - layer_range: [0, 40]
    model: grimjim/magnum-consolidatum-v1-12b
    parameters:
      weight: 0.1
  - layer_range: [0, 40]
    model: grimjim/magnum-twilight-12b
    parameters:
      weight: 0.001
  - layer_range: [0, 40]
    model: exafluence/EXF-Medistral-Nemo-12B
    parameters:
      weight: 0.000001
  - layer_range: [0, 40]
    model: nbeerbower/Mistral-Nemo-Prism-12B
    parameters:
      weight: 0.05
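
To get a feel for the relative influence of each component, the short sketch below computes each model's share of the total weight in the configuration above. It assumes that `normalize: true` rescales the weights by their sum; the resulting shares are illustrative arithmetic only, not a measurement of each model's effect on outputs.

```python
# Weights copied from the YAML configuration above.
weights = {
    "grimjim/mistralai-Mistral-Nemo-Instruct-2407": 0.9,
    "grimjim/magnum-consolidatum-v1-12b": 0.1,
    "grimjim/magnum-twilight-12b": 0.001,
    "exafluence/EXF-Medistral-Nemo-12B": 0.000001,
    "nbeerbower/Mistral-Nemo-Prism-12B": 0.05,
}

# Assumption: normalization divides each weight by the sum of all weights.
total = sum(weights.values())
shares = {name: weight / total for name, weight in weights.items()}

for name, share in shares.items():
    print(f"{name}: {share:.6%}")
```

Under that assumption, Nemo Instruct accounts for roughly 86% of the combined task vector, while the medical fine-tune contributes on the order of one part in a million.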