STILL-3-TOOL-32B

We propose STILL-3-Tool-32B, a model that leverages Python code to assist its reasoning process.
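To illustrate what code-assisted reasoning can look like in general, here is a minimal sketch. It is not the STILL-3-Tool implementation: the fenced-code convention, the `Output:` feedback format, and the names `run_python`, `reason_with_tool`, and `toy_generate` are all assumptions made for this example, and `toy_generate` is a stand-in for a real model call.

```python
import contextlib
import io
import re

# Assumed convention: the model wraps tool calls in a fenced python block.
FENCE = "`" * 3
CODE_BLOCK = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def run_python(code: str) -> str:
    """Execute code and return its captured stdout, or the error text."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {})  # unsandboxed; acceptable only for a toy demo
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return buffer.getvalue().strip()


def reason_with_tool(generate, question: str, max_rounds: int = 4) -> str:
    """Alternate between generation and code execution until no code is emitted."""
    transcript = question
    for _ in range(max_rounds):
        step = generate(transcript)
        transcript += "\n" + step
        match = CODE_BLOCK.search(step)
        if match is None:
            # No tool call in this step: treat it as the final answer.
            return transcript
        # Feed the execution result back so the next step can use it.
        transcript += "\nOutput: " + run_python(match.group(1))
    return transcript


def toy_generate(transcript: str) -> str:
    """Hypothetical stand-in for a model: writes code once, then answers."""
    if "Output:" not in transcript:
        code = "print(sum(range(1, 101)))\n"
        return "Let me compute this.\n" + FENCE + "python\n" + code + FENCE
    return "The answer is " + transcript.rsplit("Output: ", 1)[1].strip() + "."


result = reason_with_tool(toy_generate, "What is 1 + 2 + ... + 100?")
print(result)
```

In a real setting, `exec` would be replaced by an isolated sandbox with time and resource limits, since model-written code should not run directly in the host process.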

In our evaluation, STILL-3-Tool-32B achieves 81.70% accuracy on AIME 2024, matching the performance of o3-mini and outperforming o1 and DeepSeek-R1.

We open-source our code, model, and data.

For more details, please refer to our Notion page.

Citation

Please kindly cite our report if it is helpful for your research.

@article{Slow_Thinking_with_LLMs_3_Tool,
  title={Tool Manipulation Significantly Enhances the Reasoning Ability of O1- and R1-like LLMs},
  author={RUCAIBox STILL Team},
  url={https://github.com/RUCAIBox/Slow_Thinking_with_LLMs},
  year={2025}
}
