Spaetzle-v12-7b

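Spaetzle-v12-7b is a merge of the following models using LazyMergekit:

- mayflowergmbh/Wiedervereinigung-7b-dpo-laser (base model)
- flemmingmiguel/NeuDist-Ro-7B
- Blizado/discolm-mfto-7b-german-v0.1
- ResplendentAI/Flora_DPO_7B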

As expected, this model is slightly worse than cstr/spaetzle-v8-7b on general English tasks, but marginally better on at least some German tasks: for example, it reaches an EQ-Bench (de) score of 64.81, but only

| Metric | Value |
|---|---|
| Avg. | 69.36 |
| AI2 Reasoning Challenge (25-Shot) | 65.96 |
| HellaSwag (10-Shot) | 86.16 |
| MMLU (5-Shot) | 63.48 |
| TruthfulQA (0-shot) | 57.84 |
| Winogrande (5-shot) | 80.03 |
| GSM8k (5-shot) | 62.70 |

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| Spaetzle-v12-7b | 42.64 | 74.3 | 58.44 | 44.44 | 54.95 |
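
The "Avg." row in the first table is consistent with an unweighted mean of the six benchmark scores listed below it. A minimal sketch of that arithmetic, assuming plain averaging (the helper name `mean_score` is only illustrative):

```python
# Scores copied from the first table above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 65.96,
    "HellaSwag (10-Shot)": 86.16,
    "MMLU (5-Shot)": 63.48,
    "TruthfulQA (0-shot)": 57.84,
    "Winogrande (5-shot)": 80.03,
    "GSM8k (5-shot)": 62.70,
}


def mean_score(values):
    """Return the unweighted arithmetic mean of an iterable of scores."""
    values = list(values)
    return sum(values) / len(values)


leaderboard_avg = mean_score(scores.values())
print(f"{leaderboard_avg:.2f}")  # prints 69.36
```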

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 24.02 | ± 2.69 |
| | | acc_norm | 21.65 | ± 2.59 |
| agieval_logiqa_en | 0 | acc | 36.10 | ± 1.88 |
| | | acc_norm | 37.63 | ± 1.90 |
| agieval_lsat_ar | 0 | acc | 24.35 | ± 2.84 |
| | | acc_norm | 23.04 | ± 2.78 |
| agieval_lsat_lr | 0 | acc | 48.82 | ± 2.22 |
| | | acc_norm | 47.25 | ± 2.21 |
| agieval_lsat_rc | 0 | acc | 60.59 | ± 2.98 |
| | | acc_norm | 57.99 | ± 3.01 |
| agieval_sat_en | 0 | acc | 76.21 | ± 2.97 |
| | | acc_norm | 74.76 | ± 3.03 |
| agieval_sat_en_without_passage | 0 | acc | 46.60 | ± 3.48 |
| | | acc_norm | 45.63 | ± 3.48 |
| agieval_sat_math | 0 | acc | 37.27 | ± 3.27 |
| | | acc_norm | 33.18 | ± 3.18 |

Average: 42.64%

GPT4All

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| arc_challenge | 0 | acc | 59.13 | ± 1.44 |
| | | acc_norm | 61.26 | ± 1.42 |
| arc_easy | 0 | acc | 83.67 | ± 0.76 |
| | | acc_norm | 80.89 | ± 0.81 |
| boolq | 1 | acc | 87.83 | ± 0.57 |
| hellaswag | 0 | acc | 66.45 | ± 0.47 |
| | | acc_norm | 84.63 | ± 0.36 |
| openbookqa | 0 | acc | 37.40 | ± 2.17 |
| | | acc_norm | 45.80 | ± 2.23 |
| piqa | 0 | acc | 82.15 | ± 0.89 |
| | | acc_norm | 83.13 | ± 0.87 |
| winogrande | 0 | acc | 76.56 | ± 1.19 |

Average: 74.3%
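
The GPT4All average above can be reproduced from the table if each task contributes its acc_norm value when one is reported and its acc value otherwise. That selection rule is an assumption inferred from the numbers, not something stated here, and the helper names are illustrative:

```python
# (acc, acc_norm) per task, copied from the GPT4All table; None means not reported.
gpt4all_results = {
    "arc_challenge": (59.13, 61.26),
    "arc_easy": (83.67, 80.89),
    "boolq": (87.83, None),
    "hellaswag": (66.45, 84.63),
    "openbookqa": (37.40, 45.80),
    "piqa": (82.15, 83.13),
    "winogrande": (76.56, None),
}


def preferred_metric(acc, acc_norm):
    """Assumed rule: use acc_norm when it exists, otherwise fall back to acc."""
    return acc_norm if acc_norm is not None else acc


def suite_average(results):
    """Unweighted mean of the preferred metric over all tasks in a suite."""
    picked = [preferred_metric(acc, acc_norm) for acc, acc_norm in results.values()]
    return sum(picked) / len(picked)


gpt4all_avg = suite_average(gpt4all_results)
print(f"{gpt4all_avg:.2f}")  # prints 74.30
```

Averaging the acc_norm column of the AGIEval table in the same way gives 42.64, which matches the figure reported for that suite.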

TruthfulQA

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| truthfulqa_mc | 1 | mc1 | 42.59 | ± 1.73 |
| | | mc2 | 58.44 | ± 1.58 |

Average: 58.44%

Bigbench

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| bigbench_causal_judgement | 0 | multiple_choice_grade | 55.26 | ± 3.62 |
| bigbench_date_understanding | 0 | multiple_choice_grade | 64.77 | ± 2.49 |
| bigbench_disambiguation_qa | 0 | multiple_choice_grade | 37.60 | ± 3.02 |
| bigbench_geometric_shapes | 0 | multiple_choice_grade | 32.31 | ± 2.47 |
| | | exact_str_match | 21.45 | ± 2.17 |
| bigbench_logical_deduction_five_objects | 0 | multiple_choice_grade | 31.00 | ± 2.07 |
| bigbench_logical_deduction_seven_objects | 0 | multiple_choice_grade | 22.43 | ± 1.58 |
| bigbench_logical_deduction_three_objects | 0 | multiple_choice_grade | 53.00 | ± 2.89 |
| bigbench_movie_recommendation | 0 | multiple_choice_grade | 40.40 | ± 2.20 |
| bigbench_navigate | 0 | multiple_choice_grade | 51.30 | ± 1.58 |
| bigbench_reasoning_about_colored_objects | 0 | multiple_choice_grade | 68.50 | ± 1.04 |
| bigbench_ruin_names | 0 | multiple_choice_grade | 48.66 | ± 2.36 |
| bigbench_salient_translation_error_detection | 0 | multiple_choice_grade | 30.36 | ± 1.46 |
| bigbench_snarks | 0 | multiple_choice_grade | 70.17 | ± 3.41 |
| bigbench_sports_understanding | 0 | multiple_choice_grade | 70.39 | ± 1.45 |
| bigbench_temporal_sequences | 0 | multiple_choice_grade | 31.00 | ± 1.46 |
| bigbench_tracking_shuffled_objects_five_objects | 0 | multiple_choice_grade | 21.44 | ± 1.16 |
| bigbench_tracking_shuffled_objects_seven_objects | 0 | multiple_choice_grade | 18.29 | ± 0.92 |
| bigbench_tracking_shuffled_objects_three_objects | 0 | multiple_choice_grade | 53.00 | ± 2.89 |

Average: 44.44%

Average score: 54.95%
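
The overall average score appears to be the unweighted mean of the four per-suite averages. A small sketch under that assumption follows. Because the inputs are already rounded to two decimals, the recomputed value may differ from the reported 54.95 in the last digit:

```python
# Per-suite averages as reported in the sections above.
suite_averages = {
    "AGIEval": 42.64,
    "GPT4All": 74.3,
    "TruthfulQA": 58.44,
    "Bigbench": 44.44,
}

# Assumption: every suite counts equally, regardless of how many tasks it has.
overall_avg = sum(suite_averages.values()) / len(suite_averages)
print(f"recomputed overall average: {overall_avg:.3f}")
```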

Elapsed time: 02:50:51

🧩 Configuration

models:
  - model: mayflowergmbh/Wiedervereinigung-7b-dpo-laser
    # no parameters necessary for base model
  - model: flemmingmiguel/NeuDist-Ro-7B
    parameters:
      density: 0.60
      weight: 0.30
  - model: Blizado/discolm-mfto-7b-german-v0.1
    parameters:
      density: 0.65
      weight: 0.40
  - model: ResplendentAI/Flora_DPO_7B
    parameters:
      density: 0.6
      weight: 0.3
merge_method: dare_ties
base_model: mayflowergmbh/Wiedervereinigung-7b-dpo-laser
parameters:
  int8_mask: true
dtype: bfloat16
random_seed: 0
tokenizer_source: base
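
For intuition about the `density` values in this configuration: in DARE-style merging, each fine-tuned model's difference from the base weights is randomly sparsified, and the surviving entries are rescaled by 1/density so their expected value stays the same. The toy sketch below shows only that drop-and-rescale step on a made-up list of numbers. It is not mergekit's implementation, the function name and data are hypothetical, and the TIES sign election and the `weight`-based combination are left out.

```python
import random


def dare_sparsify(delta, density, rng):
    """Keep each entry with probability `density` and rescale kept entries
    by 1/density; dropped entries become 0.0 (toy drop-and-rescale step)."""
    if not 0.0 < density <= 1.0:
        raise ValueError("density must be in (0, 1]")
    return [d / density if rng.random() < density else 0.0 for d in delta]


# Hypothetical toy "delta" (fine-tuned weights minus base weights).
toy_delta = [0.10, -0.20, 0.05, 0.30, -0.15, 0.25, -0.05, 0.40]

# A fixed seed makes the sketch repeatable; it does not reproduce mergekit's
# random stream for random_seed: 0.
rng = random.Random(0)
sparse_delta = dare_sparsify(toy_delta, density=0.60, rng=rng)

for original, kept in zip(toy_delta, sparse_delta):
    print(f"{original:+.2f} -> {kept:+.4f}")
```

With density 0.60, roughly 60% of the entries survive on average and each survivor is scaled by 1/0.60, so lower densities keep fewer but larger deltas.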

💻 Usage

!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "cstr/Spaetzle-v12-7b"
messages = [{"role": "user", "content": "What is a large language model?"}]

tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])