---
license: apache-2.0
language:
  - en
  - zh
base_model:
  - mistralai/Mistral-Small-24B-Base-2501
tags:
  - axolotl
datasets:
  - allenai/tulu-3-sft-personas-instruction-following
  - simplescaling/s1K-1.1
  - simplescaling/s1K-claude-3-7-sonnet
  - reedmayhew/medical-o1-reasoning-SFT-jsonl
  - OpenCoder-LLM/opc-sft-stage1
  - PocketDoc/Dans-Kinomaxx-VanillaBackrooms
  - cognitivecomputations/SystemChat-2.0
  - anthracite-org/kalo-opus-instruct-22k-no-refusal
  - allura-org/scienceqa_sharegpt
---

Sisyphus 24b

Hundreds of dollars later.
Dozens of failed finetunes.
Sisyphus has balanced his rock on the summit.
One must have imagined him happy while pushing. Now, he is ecstatic.


About

This is a pretty generic finetune of the 24B base model for multi-turn instruct and RP. It stays coherent across a range of temperatures, assuming you use a sampler like min-p or top-p. It also supports reasoning blocks.
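
For illustration, a minimal sampling setup along those lines might look like the sketch below. The values are examples, not tuned recommendations, and min-p availability depends on your backend.

```python
# Illustrative sampler settings only -- not tuned recommendations.
# min-p (or top-p) trims the low-probability tail so higher temperatures stay coherent.
sampling_params = dict(
    do_sample=True,
    temperature=1.0,   # example value
    min_p=0.05,        # example value; alternatively omit this and set top_p=0.9
    max_new_tokens=512,
)

# e.g. with transformers: model.generate(**inputs, **sampling_params)
```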

System Prompts

I tested with the following Claude-like system prompts; however, they were not trained in, and any similar prompt can likely be used:

Non-Reasoning

You are Claude, a helpful and harmless AI assistant created by Anthropic.

Reasoning

You are Claude, a helpful and harmless AI assistant created by Anthropic. Please contain all your thoughts in <think> </think> tags, and your final response right after the closing </think> tag.

For reasoning, it's recommended to force thinking by prefilling <think>\n on the newest assistant response, and to leave previous thought blocks out of new requests.
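
A rough sketch of that flow, assuming you build prompts with Hugging Face transformers' chat template (the repo id below is a placeholder, not this model's actual id):

```python
import re
from transformers import AutoTokenizer

MODEL_ID = "your-org/sisyphus-24b"  # placeholder: substitute this repo's actual id
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

REASONING_SYSTEM_PROMPT = (
    "You are Claude, a helpful and harmless AI assistant created by Anthropic. "
    "Please contain all your thoughts in <think> </think> tags, and your final "
    "response right after the closing </think> tag."
)

def strip_thoughts(text: str) -> str:
    """Remove earlier <think>...</think> blocks before re-sending chat history."""
    return re.sub(r"<think>.*?</think>\s*", "", text, flags=re.DOTALL)

def build_prompt(history: list[dict]) -> str:
    """Render the chat template and prefill an opening think tag on the new assistant turn."""
    messages = [{"role": "system", "content": REASONING_SYSTEM_PROMPT}]
    for msg in history:
        content = strip_thoughts(msg["content"]) if msg["role"] == "assistant" else msg["content"]
        messages.append({"role": msg["role"], "content": content})
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    return prompt + "<think>\n"  # force the model to open a reasoning block
```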

Instruct Template

v7-Tekken, same as the original instruct model.
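
If you'd rather not hand-write the format, the tokenizer's chat template will render it for you; a quick way to inspect it (repo id again a placeholder):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("your-org/sisyphus-24b")  # placeholder repo id
print(tok.apply_chat_template(
    [{"role": "system", "content": "You are a helpful assistant."},
     {"role": "user", "content": "Hello!"}],
    tokenize=False,
    add_generation_prompt=True,
))
```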

Dataset

This model was trained on allura-org/inkstructmix-v0.1.

Okay. So.

It was supposed to be trained on the above dataset.

However, my dumb ass. Put. The roleplay data. Into the config. And didn't notice.


So. Um. I guess this is supposed to be a roleplaying model? It sure doesn't act the part.