---
license: apache-2.0
language:
- en
- zh
base_model:
- mistralai/Mistral-Small-24B-Base-2501
tags:
- axolotl
datasets:
- allenai/tulu-3-sft-personas-instruction-following
- simplescaling/s1K-1.1
- simplescaling/s1K-claude-3-7-sonnet
- reedmayhew/medical-o1-reasoning-SFT-jsonl
- OpenCoder-LLM/opc-sft-stage1
- PocketDoc/Dans-Kinomaxx-VanillaBackrooms
- cognitivecomputations/SystemChat-2.0
- anthracite-org/kalo-opus-instruct-22k-no-refusal
- allura-org/scienceqa_sharegpt
---
# Sisyphus 24b
Hundreds of dollars later.
Dozens of failed finetunes.
Sisyphus has balanced his rock on the summit.
One must have imagined him happy while pushing. Now, he is ecstatic.
## About
This is a pretty generic finetune of the 24B base model for multi-turn instruct RP. It stays coherent across a range of temperatures, assuming you use a truncation sampler like min-p or top-p. It also supports reasoning blocks.
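As a rough illustration only (the numbers below are assumptions, not values tested or recommended for this model), a sampler configuration in the spirit of the above might look like:

```python
# Illustrative sampler settings; the exact numbers are assumptions, not tuned
# recommendations from the model author.
sampler_settings = {
    "temperature": 1.0,  # the model stays coherent across a range of temps
    "min_p": 0.05,       # min-p truncation, as suggested above...
    # "top_p": 0.95,     # ...or top-p, if your backend doesn't support min-p
}
```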
## System Prompts
I tested with the following Claude-like system prompts; however, they were not trained in, and any similar prompt can likely be used:
### Non-Reasoning
```
You are Claude, a helpful and harmless AI assistant created by Anthropic.
```
### Reasoning
```
You are Claude, a helpful and harmless AI assistant created by Anthropic. Please contain all your thoughts in <think> </think> tags, and your final response right after the closing </think> tag.
```
For reasoning, it's recommended to force the thinking by prefilling `<think>\n` at the start of the newest assistant response, and to not include previous thought blocks in new requests.
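A minimal sketch of both recommendations, assuming an OpenAI-style message list; the regex and the `format_v7_tekken` helper referenced at the end are illustrative stand-ins, not part of this model's tooling:

```python
import re

# Matches a complete <think>...</think> block plus any trailing whitespace.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)

def strip_previous_thoughts(messages: list[dict]) -> list[dict]:
    """Remove old <think> blocks from prior assistant turns so they aren't
    re-sent as context in the next request."""
    cleaned = []
    for msg in messages:
        if msg["role"] == "assistant":
            msg = {**msg, "content": THINK_BLOCK.sub("", msg["content"]).strip()}
        cleaned.append(msg)
    return cleaned

# To force thinking, prefill "<think>\n" at the start of the newest assistant
# response and let the model continue from there. With a raw text-completion
# endpoint, that just means appending the prefill to the formatted prompt
# (format_v7_tekken is a hypothetical helper, sketched in the next section):
# prompt = format_v7_tekken(strip_previous_thoughts(history)) + "<think>\n"
```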
## Instruct Template
v7-Tekken, same as the original instruct model.
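For illustration, here is a hand-rolled rendering of that layout, based on the format shown in the Mistral-Small-24B-Instruct-2501 card; in practice, use your frontend's built-in Mistral V7 (Tekken) preset or the tokenizer's chat template rather than this sketch:

```python
# Sketch of the v7-Tekken layout as documented for Mistral-Small-24B-Instruct-2501.
# Prefer your frontend's built-in Mistral V7 (Tekken) preset or the tokenizer's
# chat template over hand-rolling the string.
def format_v7_tekken(messages: list[dict]) -> str:
    """Render OpenAI-style messages as:
    <s>[SYSTEM_PROMPT]...[/SYSTEM_PROMPT][INST]...[/INST]<response></s>[INST]...[/INST]
    """
    prompt = "<s>"
    for msg in messages:
        if msg["role"] == "system":
            prompt += f"[SYSTEM_PROMPT]{msg['content']}[/SYSTEM_PROMPT]"
        elif msg["role"] == "user":
            prompt += f"[INST]{msg['content']}[/INST]"
        else:  # assistant
            prompt += f"{msg['content']}</s>"
    return prompt
```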
## Dataset
This model was trained on allura-org/inkstructmix-v0.1.
Okay. So.
It was supposed to be trained on the above dataset.
However, my dumb ass. Put. The roleplay data. Into the config. And didn't notice.
So. Um. I guess this is supposed to be a roleplaying model? It sure doesn't act the part.