68.4 on Aider Polyglot with reasoning_effort: high
#122 by Fernanda24
More details here: https://www.reddit.com/r/LocalLLaMA/comments/1mnxwmw/unsloth_fixes_chat_template_again_gptoss120high/
Pull request to update the aider leaderboard: https://github.com/Aider-AI/aider/pull/4444
Subsequent runs that support the new score, plus more discussion, are in the aider Discord: https://discord.gg/Y7X7bhMQFV
*The 68.4 score is for the 120b model.
Can you also test the 20B model?
Test by @tan in the aider Discord server, using the whole edit format with the smaller 20b model:
test_cases: 225
model: openai/gpt-oss-20b
edit_format: whole
commit_hash: 32faf82
reasoning_effort: high
pass_rate_1: 18.2
pass_rate_2: 55.6
pass_num_1: 41
pass_num_2: 125
percent_cases_well_formed: 99.6
error_outputs: 1
num_malformed_responses: 1
num_with_malformed_responses: 1
user_asks: 121
lazy_comments: 0
syntax_errors: 0
indentation_errors: 0
exhausted_context_windows: 0
prompt_tokens: 2124190
completion_tokens: 6713723
test_timeouts: 0
total_tests: 225
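As a quick sanity check on the whole-format numbers above, the percentages can be recomputed from the raw counts. This is only a sketch: the helper name `pct` is made up for illustration, and the formula for `percent_cases_well_formed` is an assumption that happens to match the reported value, not something confirmed in this thread.

```python
# Counts copied from the whole-format run above.
TEST_CASES = 225
PASS_NUM_1 = 41
PASS_NUM_2 = 125
NUM_WITH_MALFORMED = 1


def pct(count: int, total: int) -> float:
    """Hypothetical helper: percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


pass_rate_1 = pct(PASS_NUM_1, TEST_CASES)
pass_rate_2 = pct(PASS_NUM_2, TEST_CASES)
# Assumption: well-formed % is the share of cases with no malformed response.
well_formed = pct(TEST_CASES - NUM_WITH_MALFORMED, TEST_CASES)

print(pass_rate_1, pass_rate_2, well_formed)
```

The recomputed values line up with the reported `pass_rate_1`, `pass_rate_2`, and `percent_cases_well_formed`.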
And for the diff edit format, also with the smaller 20b model:
model: openai/gpt-oss-20b
edit_format: diff
commit_hash: da45632-dirty
reasoning_effort: high
pass_rate_1: 13.3
pass_rate_2: 45.3
pass_num_1: 30
pass_num_2: 102
percent_cases_well_formed: 77.3
error_outputs: 199
num_malformed_responses: 199
num_with_malformed_responses: 51
user_asks: 115
lazy_comments: 0
syntax_errors: 0
indentation_errors: 0
exhausted_context_windows: 0
prompt_tokens: 3785993
completion_tokens: 7482667
test_timeouts: 1
total_tests: 225
command: aider --model openai/gpt-oss-20b
date: 2025-08-12
versions: 0.86.1.dev
seconds_per_case: 589.3
total_cost: 0.0000
Language   | pass_rate_1 | pass_rate_2 | pass_num_1 | pass_num_2 | Cases | Asks | Malformed
-----------|-------------|-------------|------------|------------|-------|------|-----------
python     | 11.8        | 67.6        | 4          | 23         | 34    | 0    | 4
javascript | 8.2         | 55.1        | 4          | 27         | 49    | 1    | 2
java       | 10.6        | 36.2        | 5          | 17         | 47    | 6    | 8
cpp        | 3.8         | 46.2        | 1          | 12         | 26    | 105  | 11
go         | 12.8        | 20.5        | 5          | 8          | 39    | 3    | 166
rust       | 36.7        | 50.0        | 11         | 15         | 30    | 0    | 8
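The per-language table can be cross-checked against the diff-format totals reported above. The sketch below copies the counts by hand (the dictionary layout and variable names are illustrative, not part of aider's output) and sums each column. It also recomputes the second-pass rate per language from the pass counts and case counts.

```python
# Per-language counts from the table: (pass_num_1, pass_num_2, cases, asks, malformed).
rows = {
    "python": (4, 23, 34, 0, 4),
    "javascript": (4, 27, 49, 1, 2),
    "java": (5, 17, 47, 6, 8),
    "cpp": (1, 12, 26, 105, 11),
    "go": (5, 8, 39, 3, 166),
    "rust": (11, 15, 30, 0, 8),
}

# Column sums, to compare with pass_num_1, pass_num_2, total_tests,
# user_asks and num_malformed_responses of the diff run.
totals = [sum(col) for col in zip(*rows.values())]

# Second-pass rate per language, rounded like the table.
pass_rate_2 = {
    lang: round(100 * counts[1] / counts[2], 1) for lang, counts in rows.items()
}

print(totals)
print(pass_rate_2)
```

The column sums agree with the diff run's totals (30, 102, 225, 115, 199). They also show where the trouble is concentrated: go accounts for 166 of the 199 malformed responses and cpp for 105 of the 115 user asks.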