About
Successor to, and alternative to, SmarTricks V1.01. Smart, produces long answers, with an agreeable prose quite different from v1.01's.
The lead model is once again Drummer's Fallen Llama R1, and the side model is Nemotron. The base is Llama 3.3 abliterated, with Tulu Abliterated and Hitachi FLDx2 as additions.
Comment
Why do I merge so many quite similar models? For a simple reason: I want the prose quality of an instruct model with the perplexity of a base model. It seems counter-intuitive, because the higher the "instruct level", the further we drift away from the pile (the pretraining data). I noticed quite clearly the difference between finetunes made on a base Llama and those made on an instruct Llama.
The PPL 3.5+ zone usually allows great prose quality without excessive "flatness" in the produced text. The 3.10-3.50 zone is where the margin of progress currently lies, from my experiments. The 2.80-3.10 zone is close to base, and there I observe enumerative patterns with a high degree of repetition; I'd like to break into that zone with models viable for any kind of prose.
Benchmarks
- PPL 512 Wikitext (English): 3.46
- ARC-C: 60.55
- ARC-E: 82.10
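The card does not state which tool produced these figures; the ARC scores would typically come from an evaluation harness such as EleutherAI's lm-evaluation-harness. As a rough illustration, here is a minimal Python sketch (the model path is a placeholder) that measures Wikitext-2 perplexity with non-overlapping 512-token windows, matching the "PPL 512" setting, using transformers and datasets:

import math
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/merged-model"  # placeholder: local path or HF repo id of the merge
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()

# Concatenate the Wikitext-2 test split into one long token stream.
text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
ids = tok(text, return_tensors="pt").input_ids

ctx = 512  # window size matching the "PPL 512" setting above
nll_sum, n_tokens = 0.0, 0
for start in range(0, ids.size(1), ctx):
    window = ids[:, start : start + ctx].to(model.device)
    if window.size(1) < 2:
        break  # a 0- or 1-token tail has nothing to predict
    with torch.no_grad():
        out = model(window, labels=window)  # loss = mean NLL over shifted tokens
    nll_sum += out.loss.item() * (window.size(1) - 1)
    n_tokens += window.size(1) - 1

print(f"Wikitext-2 PPL @ {ctx} tokens: {math.exp(nll_sum / n_tokens):.2f}")

Different tools (sliding vs. non-overlapping windows, tokenizer and Wikitext variant) will give slightly different numbers, so treat the figures listed above as the reference.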
Merge
This is a merge of pre-trained language models created using mergekit.
Merge Details
Merge Method
This model was merged using the Model Stock merge method, with Nexesenex/Llama_3.x_70b_SmarTricks_0.11 as the base.
Models Merged
The following models were included in the merge:
- NexesMess/Llama_3.x_70b_SmarTricks_0.21_NMT
- Nexesenex/Llama_3.x_70b_SmarTricks_0.41_R1
Configuration
The following YAML configuration was used to produce this model:
merge_method: model_stock
models:
  - model: NexesMess/Llama_3.x_70b_SmarTricks_0.21_NMT
    parameters:
      weight: 1.0
  - model: Nexesenex/Llama_3.x_70b_SmarTricks_0.41_R1
    parameters:
      weight: 1.0
base_model: Nexesenex/Llama_3.x_70b_SmarTricks_0.11
dtype: bfloat16
out_dtype: bfloat16
parameters:
  int8_mask: true
  normalize: true
  rescale: false
  filter_wise: false
  smooth: false
  allow_negative_weights: false
chat_template: auto
tokenizer:
  source: union
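To reproduce the merge, the YAML above can be fed to mergekit. A minimal sketch, assuming the configuration is saved as smartricks_v1_02.yaml (the filename and output path are placeholders), calling mergekit's mergekit-yaml command from Python:

import subprocess

subprocess.run(
    [
        "mergekit-yaml",
        "smartricks_v1_02.yaml",  # the model_stock configuration shown above
        "./SmarTricks_merged",    # output directory for the merged checkpoint
        "--cuda",                 # run the merge on GPU when available
    ],
    check=True,
)

The same command can be run directly from a shell; model_stock treats base_model as the anchor, combines the two weighted models against it, and writes the result (with the union tokenizer) to the output directory.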