AIME24, AIME25 and GPQA-Diamond results

#5
by ID0M - opened

Hi,
I see you mentioned that these benchmarks were tested, but I don't see individual scores on them for R1T2.

TNG Technology Consulting GmbH org

You are right. We averaged the individual scores to compose the intelligence score, and that average is what the plot shows. We'll consider updating the model card to include the individual scores.
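
For anyone who wants to see the arithmetic, here is a minimal sketch, assuming the intelligence score is a plain unweighted mean of the AIME24, AIME25 and GPQA-Diamond scores (the exact weighting is not stated in this thread). The function name `intelligence_score` and the numbers are hypothetical placeholders, not measured results for R1T2.

```python
def intelligence_score(scores: dict[str, float]) -> float:
    """Return the unweighted mean of per-benchmark scores.

    Assumption: equal weight per benchmark; the thread does not
    confirm how the published composite is weighted.
    """
    if not scores:
        raise ValueError("need at least one benchmark score")
    return sum(scores.values()) / len(scores)


# Placeholder values for illustration only, NOT real R1T2 scores.
placeholder_scores = {
    "AIME24": 80.0,
    "AIME25": 70.0,
    "GPQA-Diamond": 60.0,
}

print(intelligence_score(placeholder_scores))  # 70.0
```

A single average like this hides how the model does on each benchmark, which is why the individual scores are worth listing separately.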

TNG Technology Consulting GmbH org

We updated the model card.

rbrt changed discussion status to closed
