A much further trained version, this time done with full finetuning instead of DoRA. Similar ~50/50 mix of completion and instruct data.

Note: This likely has refusals like PJMixers-Dev/LLaMa-3.2-Instruct-JankMix-v0.1-SFT-3B since no focus was put on removing refusals. I'm working on a KTO DoRA to solve this, and possibly improve roleplay performance.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	21.50
IFEval (0-Shot)	62.92
BBH (3-Shot)	23.34
MATH Lvl 5 (4-Shot)	11.33
GPQA (0-shot)	3.02
MuSR (0-shot)	4.87
MMLU-PRO (5-shot)	23.50

Downloads last month: 8

Safetensors

Model size

3.21B params

Tensor type

BF16

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for PJMixers-Dev/LLaMa-3.2-Instruct-JankMix-v0.2-SFT-3B

Base model

meta-llama/Llama-3.2-3B-Instruct

Finetuned

unsloth/Llama-3.2-3B-Instruct

Finetuned

(342)

this model

Finetunes

1 model

Quantizations

2 models

Evaluation results

strict accuracy on IFEval (0-Shot)
Open LLM Leaderboard

62.920
normalized accuracy on BBH (3-Shot)
Open LLM Leaderboard

23.340
exact match on MATH Lvl 5 (4-Shot)
Open LLM Leaderboard

11.330
acc_norm on GPQA (0-shot)
Open LLM Leaderboard

3.020
acc_norm on MuSR (0-shot)
Open LLM Leaderboard

4.870
accuracy on MMLU-PRO (5-shot)
test set Open LLM Leaderboard

23.500

View on Papers With Code