---
tags:
- merge
- mergekit
- lazymergekit
- flemmingmiguel/NeuDist-Ro-7B
- johannhartmann/Brezn3
- ResplendentAI/Flora_DPO_7B
base_model:
- flemmingmiguel/NeuDist-Ro-7B
- johannhartmann/Brezn3
- ResplendentAI/Flora_DPO_7B
language:
- de
- en
---

# Spaetzle-v8-7b
					
						
This model is intended to perform adequately in German and English across a range of tasks while mostly behaving well: it should not ramble on or mix tokens from the different chat templates seen during training and adaptation.

It is mostly a quick test and is considerably weaker in German grammar and orthography than, for example, DiscoLM. For use cases where that matters less than instruction following or reasoning, it may actually be slightly preferable.
					
						
It is a merge of the following models, made with [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing) on the basis of [mayflowergmbh/Wiedervereinigung-7b-dpo-laser](https://huggingface.co/mayflowergmbh/Wiedervereinigung-7b-dpo-laser):
* [flemmingmiguel/NeuDist-Ro-7B](https://huggingface.co/flemmingmiguel/NeuDist-Ro-7B)
* [johannhartmann/Brezn3](https://huggingface.co/johannhartmann/Brezn3)
* [ResplendentAI/Flora_DPO_7B](https://huggingface.co/ResplendentAI/Flora_DPO_7B)

All credit is due to the creators of those original models and of the training datasets involved.

For a suitable quantized version, try [cstr/Spaetzle-v8-7b-GGUF](https://huggingface.co/cstr/Spaetzle-v8-7b-GGUF).
					
						
## Evaluation

[Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_cstr__Spaetzle-v8-7b).

|             Metric              |Value|
|---------------------------------|----:|
|Avg.                             |72.27|
|AI2 Reasoning Challenge (25-Shot)|68.69|
|HellaSwag (10-Shot)              |86.68|
|MMLU (5-Shot)                    |64.60|
|TruthfulQA (0-shot)              |64.05|
|Winogrande (5-shot)              |81.45|
|GSM8k (5-shot)                   |68.16|
					
						
EQ-Bench: 61.04 (v2_de, German) / 78.3 (v2, English)

|                           Model                            |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
|------------------------------------------------------------|------:|------:|---------:|-------:|------:|
|[Spaetzle-v8-7b](https://huggingface.co/cstr/Spaetzle-v8-7b)|  45.31|  75.69|     63.94|   45.57|  57.63|
					
						
### AGIEval
|             Task             |Version| Metric |Value|   |Stderr|
|------------------------------|------:|--------|----:|---|-----:|
|agieval_aqua_rat              |      0|acc     |25.59|±  |  2.74|
|                              |       |acc_norm|24.80|±  |  2.72|
|agieval_logiqa_en             |      0|acc     |39.63|±  |  1.92|
|                              |       |acc_norm|39.78|±  |  1.92|
|agieval_lsat_ar               |      0|acc     |23.48|±  |  2.80|
|                              |       |acc_norm|24.35|±  |  2.84|
|agieval_lsat_lr               |      0|acc     |50.98|±  |  2.22|
|                              |       |acc_norm|51.96|±  |  2.21|
|agieval_lsat_rc               |      0|acc     |62.08|±  |  2.96|
|                              |       |acc_norm|62.83|±  |  2.95|
|agieval_sat_en                |      0|acc     |78.64|±  |  2.86|
|                              |       |acc_norm|79.13|±  |  2.84|
|agieval_sat_en_without_passage|      0|acc     |44.66|±  |  3.47|
|                              |       |acc_norm|44.66|±  |  3.47|
|agieval_sat_math              |      0|acc     |37.27|±  |  3.27|
|                              |       |acc_norm|35.00|±  |  3.22|

Average: 45.31%
					
						
### GPT4All
|    Task     |Version| Metric |Value|   |Stderr|
|-------------|------:|--------|----:|---|-----:|
|arc_challenge|      0|acc     |63.14|±  |  1.41|
|             |       |acc_norm|64.51|±  |  1.40|
|arc_easy     |      0|acc     |85.98|±  |  0.71|
|             |       |acc_norm|82.49|±  |  0.78|
|boolq        |      1|acc     |88.10|±  |  0.57|
|hellaswag    |      0|acc     |66.31|±  |  0.47|
|             |       |acc_norm|85.17|±  |  0.35|
|openbookqa   |      0|acc     |38.00|±  |  2.17|
|             |       |acc_norm|47.20|±  |  2.23|
|piqa         |      0|acc     |83.35|±  |  0.87|
|             |       |acc_norm|84.17|±  |  0.85|
|winogrande   |      0|acc     |78.22|±  |  1.16|

Average: 75.69%
					
						
### TruthfulQA
|    Task     |Version|Metric|Value|   |Stderr|
|-------------|------:|------|----:|---|-----:|
|truthfulqa_mc|      1|mc1   |47.74|±  |  1.75|
|             |       |mc2   |63.94|±  |  1.53|

Average: 63.94%
					
						
### Bigbench
|                      Task                      |Version|       Metric        |Value|   |Stderr|
|------------------------------------------------|------:|---------------------|----:|---|-----:|
|bigbench_causal_judgement                       |      0|multiple_choice_grade|56.84|±  |  3.60|
|bigbench_date_understanding                     |      0|multiple_choice_grade|66.12|±  |  2.47|
|bigbench_disambiguation_qa                      |      0|multiple_choice_grade|41.47|±  |  3.07|
|bigbench_geometric_shapes                       |      0|multiple_choice_grade|22.01|±  |  2.19|
|                                                |       |exact_str_match      | 0.00|±  |  0.00|
|bigbench_logical_deduction_five_objects         |      0|multiple_choice_grade|31.40|±  |  2.08|
|bigbench_logical_deduction_seven_objects        |      0|multiple_choice_grade|23.14|±  |  1.60|
|bigbench_logical_deduction_three_objects        |      0|multiple_choice_grade|56.00|±  |  2.87|
|bigbench_movie_recommendation                   |      0|multiple_choice_grade|45.00|±  |  2.23|
|bigbench_navigate                               |      0|multiple_choice_grade|50.70|±  |  1.58|
|bigbench_reasoning_about_colored_objects        |      0|multiple_choice_grade|70.05|±  |  1.02|
|bigbench_ruin_names                             |      0|multiple_choice_grade|45.54|±  |  2.36|
|bigbench_salient_translation_error_detection    |      0|multiple_choice_grade|26.05|±  |  1.39|
|bigbench_snarks                                 |      0|multiple_choice_grade|71.82|±  |  3.35|
|bigbench_sports_understanding                   |      0|multiple_choice_grade|72.92|±  |  1.42|
|bigbench_temporal_sequences                     |      0|multiple_choice_grade|44.20|±  |  1.57|
|bigbench_tracking_shuffled_objects_five_objects |      0|multiple_choice_grade|22.80|±  |  1.19|
|bigbench_tracking_shuffled_objects_seven_objects|      0|multiple_choice_grade|18.23|±  |  0.92|
|bigbench_tracking_shuffled_objects_three_objects|      0|multiple_choice_grade|56.00|±  |  2.87|

Average: 45.57%

Average score: 57.63%
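The averages reported in this section can be re-derived from the tables with plain arithmetic. The sketch below recomputes three of them from the published values. For GPT4All it assumes that `acc_norm` is used where a task reports it and `acc` otherwise; that selection rule is inferred from the numbers, not stated in this card. The variable and function names are only illustrative.

```python
# Scores copied from the tables in this section.
leaderboard = {
    "ARC": 68.69, "HellaSwag": 86.68, "MMLU": 64.60,
    "TruthfulQA": 64.05, "Winogrande": 81.45, "GSM8k": 68.16,
}
suites = {"AGIEval": 45.31, "GPT4All": 75.69, "TruthfulQA": 63.94, "Bigbench": 45.57}

# GPT4All: acc_norm where reported, plain acc for boolq and winogrande
# (assumed selection rule, inferred from the reported average).
gpt4all = {
    "arc_challenge": 64.51, "arc_easy": 82.49, "boolq": 88.10,
    "hellaswag": 85.17, "openbookqa": 47.20, "piqa": 84.17, "winogrande": 78.22,
}


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


print(f"Open LLM Leaderboard average: {mean(leaderboard.values()):.3f}")
print(f"GPT4All suite average:        {mean(gpt4all.values()):.3f}")
print(f"Average over the four suites: {mean(suites.values()):.4f}")
```

Each printed value agrees with the corresponding figure in the tables once rounded to two decimals.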
					
						
## 💻 Usage

```python
# In a notebook, install the dependencies first:
# !pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "cstr/Spaetzle-v8-7b"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Build the prompt with the chat template shipped with the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Sample up to 256 new tokens and print the prompt plus the completion
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
					
						
## 🧩 Configuration

The model uses the ChatML prompt format and should work well with it, since most of the merged models saw ChatML templates during training.
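For readers unfamiliar with ChatML, here is a minimal sketch of how a conversation is laid out in that format. The helper `to_chatml` is hypothetical and written by hand for illustration. The chat template shipped with the tokenizer may differ in details such as special tokens or whitespace, so `tokenizer.apply_chat_template` remains the safer choice in practice.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string.

    Hypothetical helper for illustration; not taken from the model repository.
    """
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "Du bist ein hilfreicher Assistent."},
    {"role": "user", "content": "Was ist ein großes Sprachmodell?"},
]
prompt = to_chatml(messages)
print(prompt)
```

Each turn is wrapped in `<|im_start|>` and `<|im_end|>` markers, and the trailing open assistant turn signals where generation should begin.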
					
						
Merge configuration:

```yaml
models:
  - model: mayflowergmbh/Wiedervereinigung-7b-dpo-laser
    # no parameters necessary for base model
  - model: flemmingmiguel/NeuDist-Ro-7B
    parameters:
      density: 0.60
      weight: 0.30
  - model: johannhartmann/Brezn3
    parameters:
      density: 0.65
      weight: 0.40
  - model: ResplendentAI/Flora_DPO_7B
    parameters:
      density: 0.6
      weight: 0.3
merge_method: dare_ties
base_model: mayflowergmbh/Wiedervereinigung-7b-dpo-laser
parameters:
  int8_mask: true
dtype: bfloat16
random_seed: 0
tokenizer_source: base
```
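To give an intuition for the `density` and `weight` parameters, the `dare_ties` method can be thought of in two steps: DARE randomly drops part of each fine-tuned model's difference from the base model (keeping roughly a `density` fraction) and rescales what remains, and a TIES-style sign election then discards contributions that disagree with the dominant direction before the weighted deltas are added to the base. The following is a toy sketch on plain Python lists under those assumptions. It is not mergekit's implementation, the function names are hypothetical, and it ignores options such as `int8_mask` and normalization.

```python
import random


def dare_delta(base, finetuned, density, rng):
    """Keep each delta entry with probability `density`; rescale kept entries by 1/density."""
    out = []
    for b, f in zip(base, finetuned):
        keep = rng.random() < density
        out.append((f - b) / density if keep else 0.0)
    return out


def dare_ties_merge(base, models, seed=0):
    """Merge toy parameter vectors.

    `models` is a list of (parameters, density, weight) tuples.
    """
    rng = random.Random(seed)
    deltas = [
        [weight * d for d in dare_delta(base, params, density, rng)]
        for params, density, weight in models
    ]
    merged = []
    for i, b in enumerate(base):
        column = [d[i] for d in deltas]
        # Sign election: the direction with the larger total mass wins
        sign = 1.0 if sum(column) >= 0 else -1.0
        agreed = sum(v for v in column if v * sign > 0)
        merged.append(b + agreed)
    return merged


# Toy vectors standing in for model weights; densities and weights follow the config above.
base = [0.0, 1.0, -1.0, 0.5]
toy_models = [
    ([0.2, 1.1, -1.3, 0.5], 0.60, 0.30),
    ([0.1, 0.8, -1.2, 0.9], 0.65, 0.40),
    ([-0.3, 1.2, -0.9, 0.6], 0.60, 0.30),
]
print(dare_ties_merge(base, toy_models, seed=0))
```

Rescaling by `1/density` keeps the expected size of each delta unchanged despite the random dropping, which is why fairly aggressive pruning can still preserve much of a fine-tuned model's behaviour.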