About
Successor to, and alternative to, SmarTricks V1.01. Smart, produces long answers, with an agreeable prose quite different from v1.01's.
The lead model is once again Drummer's Fallen Llama R1, and the side model is Nemotron. The base is Llama 3.3 abliterated, with Tulu Abliterated and Hitachi FLDx2 as additions.
Comment
Why do I merge so many quite similar models? For a simple reason: I want the prose quality of an instruct model with the perplexity of a base model. It seems counter-intuitive, because the higher the "instruct level", the further we drift away from the pile (the pretraining data). I noticed quite clearly the difference between finetunes made on a base Llama and those made on an instruct Llama.
The PPL 3.5+ zone usually allows great prose quality without excessive "flatness" in the produced text. The 3.10-3.50 zone is where the margin of progress currently lies, from my experiments. The 2.80-3.10 zone is close to base, and there I observe enumerative patterns with a high degree of repetition; I'd like to break into that zone with models viable for any kind of prose.
Benchmarks
- PPL 512 Wikitext (English): 3.46
- ARC-C: 60.55
- ARC-E: 82.10
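The card does not state which tool produced these figures; the ARC scores would typically come from an evaluation harness such as EleutherAI's lm-evaluation-harness. As a rough illustration, here is a minimal Python sketch (the model path is a placeholder) that measures Wikitext-2 perplexity with non-overlapping 512-token windows, matching the "PPL 512" setting, using transformers and datasets:

import math
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/merged-model"  # placeholder: local path or HF repo id of the merge
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()

# Concatenate the Wikitext-2 test split into one long token stream.
text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
ids = tok(text, return_tensors="pt").input_ids

ctx = 512  # window size matching the "PPL 512" setting above
nll_sum, n_tokens = 0.0, 0
for start in range(0, ids.size(1), ctx):
    window = ids[:, start : start + ctx].to(model.device)
    if window.size(1) < 2:
        break  # a 0- or 1-token tail has nothing to predict
    with torch.no_grad():
        out = model(window, labels=window)  # loss = mean NLL over shifted tokens
    nll_sum += out.loss.item() * (window.size(1) - 1)
    n_tokens += window.size(1) - 1

print(f"Wikitext-2 PPL @ {ctx} tokens: {math.exp(nll_sum / n_tokens):.2f}")

Different tools (sliding vs. non-overlapping windows, tokenizer and Wikitext variant) will give slightly different numbers, so treat the figures listed above as the reference.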
Merge
This is a merge of pre-trained language models created using mergekit.
Merge Details
Merge Method
This model was merged using the Model Stock merge method, with Nexesenex/Llama_3.x_70b_SmarTricks_0.11 as the base.
Models Merged
The following models were included in the merge:
- NexesMess/Llama_3.x_70b_SmarTricks_0.21_NMT
- Nexesenex/Llama_3.x_70b_SmarTricks_0.41_R1
Configuration
The following YAML configuration was used to produce this model:
merge_method: model_stock
models:
  - model: NexesMess/Llama_3.x_70b_SmarTricks_0.21_NMT
    parameters:
      weight: 1.0
  - model: Nexesenex/Llama_3.x_70b_SmarTricks_0.41_R1
    parameters:
      weight: 1.0
base_model: Nexesenex/Llama_3.x_70b_SmarTricks_0.11
dtype: bfloat16
out_dtype: bfloat16
parameters:
  int8_mask: true
  normalize: true
  rescale: false
  filter_wise: false
  smooth: false
  allow_negative_weights: false
chat_template: auto
tokenizer:
  source: union
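To reproduce the merge, the YAML above can be fed to mergekit. A minimal sketch, assuming the configuration is saved as smartricks_v1_02.yaml (the filename and output path are placeholders), calling mergekit's mergekit-yaml command from Python:

import subprocess

subprocess.run(
    [
        "mergekit-yaml",
        "smartricks_v1_02.yaml",  # the model_stock configuration shown above
        "./SmarTricks_merged",    # output directory for the merged checkpoint
        "--cuda",                 # run the merge on GPU when available
    ],
    check=True,
)

The same command can be run directly from a shell; model_stock treats base_model as the anchor, combines the two weighted models against it, and writes the result (with the union tokenizer) to the output directory.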