AIME24, AIME25 and GPQA-Diamond results
#5, opened by ID0M
Hi,
I see you mentioned that these were tested, but I don't see individual scores on these benchmarks for R1T2.
You are right. We've averaged the individual scores to compose the intelligence score, which is plotted. We'll consider updating the model card to include the individual scores.
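As a rough illustration of the averaging described here, the sketch below assumes an unweighted arithmetic mean over the three benchmarks named in the thread title; the thread does not state the weighting. The function name `intelligence_score` and the numbers are hypothetical placeholders, not actual R1T2 results.

```python
from statistics import mean

# Placeholder values for illustration only; these are NOT real R1T2 scores.
example_scores = {
    "AIME24": 80.0,
    "AIME25": 70.0,
    "GPQA-Diamond": 60.0,
}


def intelligence_score(scores):
    """Return the unweighted mean of per-benchmark scores (assumed weighting)."""
    return mean(scores.values())


print(intelligence_score(example_scores))
```

A single averaged number like this hides how each benchmark contributes, which is why the individual scores are worth listing alongside it.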
We updated the model card.
rbrt changed discussion status to closed