# DevQuasar-R1-Uncensored-Llama-8B

## Eval results

```
hf (pretrained=DevQuasar/DevQuasar-R1-Uncensored-Llama-8B,parallelize=True,dtype=float16), gen_kwargs: (temperature=0.6,top_p=0.95,do_sample=True), limit: None, num_fewshot: None, batch_size: auto:4 (1,16,64,64)
```
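
The header above is raw lm-evaluation-harness output. As a rough guide, the following Python sketch should approximate the same run through the harness's `simple_evaluate` API; the task selection shown is an assumption inferred from the results table below, not a confirmed record of the exact invocation:

```python
# Approximate reproduction of the eval run above using lm-evaluation-harness
# (`pip install lm-eval`). The task list is an assumption based on the results
# table; adjust it to match the run you want to reproduce.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args=(
        "pretrained=DevQuasar/DevQuasar-R1-Uncensored-Llama-8B,"
        "parallelize=True,dtype=float16"
    ),
    tasks=["hellaswag", "leaderboard"],  # "leaderboard" groups bbh/gpqa/ifeval/math_hard/mmlu_pro/musr
    gen_kwargs="temperature=0.6,top_p=0.95,do_sample=True",
    batch_size="auto:4",
)

# Print per-task metrics (acc, acc_norm, exact_match, ...) as in the table below.
for task, metrics in results["results"].items():
    print(task, metrics)
```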

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| hellaswag | 1 | none | 0 | acc | 0.6052 | ± 0.0049 |
| | | none | 0 | acc_norm | 0.8021 | ± 0.0040 |
| leaderboard_bbh | N/A | | | | | |
| - leaderboard_bbh_boolean_expressions | 1 | none | 3 | acc_norm | 0.8360 | ± 0.0235 |
| - leaderboard_bbh_causal_judgement | 1 | none | 3 | acc_norm | 0.6043 | ± 0.0359 |
| - leaderboard_bbh_date_understanding | 1 | none | 3 | acc_norm | 0.4840 | ± 0.0317 |
| - leaderboard_bbh_disambiguation_qa | 1 | none | 3 | acc_norm | 0.6360 | ± 0.0305 |
| - leaderboard_bbh_formal_fallacies | 1 | none | 3 | acc_norm | 0.5680 | ± 0.0314 |
| - leaderboard_bbh_geometric_shapes | 1 | none | 3 | acc_norm | 0.2760 | ± 0.0283 |
| - leaderboard_bbh_hyperbaton | 1 | none | 3 | acc_norm | 0.5440 | ± 0.0316 |
| - leaderboard_bbh_logical_deduction_five_objects | 1 | none | 3 | acc_norm | 0.4320 | ± 0.0314 |
| - leaderboard_bbh_logical_deduction_seven_objects | 1 | none | 3 | acc_norm | 0.4640 | ± 0.0316 |
| - leaderboard_bbh_logical_deduction_three_objects | 1 | none | 3 | acc_norm | 0.6440 | ± 0.0303 |
| - leaderboard_bbh_movie_recommendation | 1 | none | 3 | acc_norm | 0.7600 | ± 0.0271 |
| - leaderboard_bbh_navigate | 1 | none | 3 | acc_norm | 0.6240 | ± 0.0307 |
| - leaderboard_bbh_object_counting | 1 | none | 3 | acc_norm | 0.5440 | ± 0.0316 |
| - leaderboard_bbh_penguins_in_a_table | 1 | none | 3 | acc_norm | 0.4658 | ± 0.0414 |
| - leaderboard_bbh_reasoning_about_colored_objects | 1 | none | 3 | acc_norm | 0.5640 | ± 0.0314 |
| - leaderboard_bbh_ruin_names | 1 | none | 3 | acc_norm | 0.7160 | ± 0.0286 |
| - leaderboard_bbh_salient_translation_error_detection | 1 | none | 3 | acc_norm | 0.4920 | ± 0.0317 |
| - leaderboard_bbh_snarks | 1 | none | 3 | acc_norm | 0.5899 | ± 0.0370 |
| - leaderboard_bbh_sports_understanding | 1 | none | 3 | acc_norm | 0.6880 | ± 0.0294 |
| - leaderboard_bbh_temporal_sequences | 1 | none | 3 | acc_norm | 0.2200 | ± 0.0263 |
| - leaderboard_bbh_tracking_shuffled_objects_five_objects | 1 | none | 3 | acc_norm | 0.1880 | ± 0.0248 |
| - leaderboard_bbh_tracking_shuffled_objects_seven_objects | 1 | none | 3 | acc_norm | 0.1320 | ± 0.0215 |
| - leaderboard_bbh_tracking_shuffled_objects_three_objects | 1 | none | 3 | acc_norm | 0.3040 | ± 0.0292 |
| - leaderboard_bbh_web_of_lies | 1 | none | 3 | acc_norm | 0.4760 | ± 0.0316 |
| leaderboard_gpqa | N/A | | | | | |
| - leaderboard_gpqa_diamond | 1 | none | 0 | acc_norm | 0.3232 | ± 0.0333 |
| - leaderboard_gpqa_extended | 1 | none | 0 | acc_norm | 0.3498 | ± 0.0204 |
| - leaderboard_gpqa_main | 1 | none | 0 | acc_norm | 0.3527 | ± 0.0226 |
| leaderboard_ifeval | 3 | none | 0 | inst_level_loose_acc | 0.4628 | ± N/A |
| | | none | 0 | inst_level_strict_acc | 0.4365 | ± N/A |
| | | none | 0 | prompt_level_loose_acc | 0.3216 | ± 0.0201 |
| | | none | 0 | prompt_level_strict_acc | 0.2902 | ± 0.0195 |
| leaderboard_math_hard | N/A | | | | | |
| - leaderboard_math_algebra_hard | 2 | none | 4 | exact_match | 0.5798 | ± 0.0282 |
| - leaderboard_math_counting_and_prob_hard | 2 | none | 4 | exact_match | 0.2276 | ± 0.0380 |
| - leaderboard_math_geometry_hard | 2 | none | 4 | exact_match | 0.1970 | ± 0.0347 |
| - leaderboard_math_intermediate_algebra_hard | 2 | none | 4 | exact_match | 0.1036 | ± 0.0182 |
| - leaderboard_math_num_theory_hard | 2 | none | 4 | exact_match | 0.3377 | ± 0.0382 |
| - leaderboard_math_prealgebra_hard | 2 | none | 4 | exact_match | 0.4715 | ± 0.0360 |
| - leaderboard_math_precalculus_hard | 2 | none | 4 | exact_match | 0.1111 | ± 0.0271 |
| leaderboard_mmlu_pro | 0.1 | none | 5 | acc | 0.3608 | ± 0.0044 |
| leaderboard_musr | N/A | | | | | |
| - leaderboard_musr_murder_mysteries | 1 | none | 0 | acc_norm | 0.5920 | ± 0.0311 |
| - leaderboard_musr_object_placements | 1 | none | 0 | acc_norm | 0.3867 | ± 0.0305 |
| - leaderboard_musr_team_allocation | 1 | none | 0 | acc_norm | 0.3560 | ± 0.0303 |

## Comparison to the base DeepSeek-R1-Distill-Llama-8B

The model shows improvements over the base in most of these tests:

*(comparison chart: DevQuasar-R1-Uncensored-Llama-8B vs. DeepSeek-R1-Distill-Llama-8B)*

## Links to eval results

- DevQuasar-R1-Uncensored-Llama-8B
- DeepSeek-R1-Distill-Llama-8B

## GGUF details

- Model size: 8.03B params
- Architecture: llama
- Available quantizations: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit, 16-bit
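
As a usage sketch, one way to run a quantization from this repo locally is via llama-cpp-python. The filename pattern below is hypothetical; check the repo's file list for the exact name of the quant you want:

```python
# Minimal sketch of local inference with llama-cpp-python
# (`pip install llama-cpp-python huggingface_hub`).
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="DevQuasar/DevQuasar-R1-Uncensored-Llama-8B-GGUF",
    filename="*Q4_K_M.gguf",  # hypothetical pattern; pick one of the quants listed above
    n_ctx=4096,
)

out = llm(
    "Explain the difference between acc and acc_norm in one sentence.",
    max_tokens=128,
    temperature=0.6,  # sampling settings matching the eval config above
    top_p=0.95,
)
print(out["choices"][0]["text"])
```

The temperature and top_p values mirror the gen_kwargs used in the eval run above; they are a starting point, not a requirement.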
