STILL-3-TOOL-32B
We propose STILL-3-Tool-32B, a model that uses Python code as a tool to support its reasoning process.
In our evaluation, STILL-3-Tool-32B achieves 81.70% accuracy on AIME 2024, matching the performance of o3-mini and outperforming o1 and DeepSeek-R1.
We open-source our code, model, and data.
For more details, please refer to our Notion page.
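To illustrate what using Python code during reasoning can look like, here is a minimal sketch of the tool-execution step only. It is not the team's implementation: the function name `run_python_blocks`, the assumption that the model writes its code in fenced `python` blocks, and the use of an in-process `exec` are all hypothetical choices made for illustration. The released code is the reference for the actual format and sandboxing.

```python
import contextlib
import io
import re

# Build the fence marker programmatically so this example can itself sit inside a fence.
FENCE = "`" * 3
# Assumption: the model emits code in fenced blocks tagged "python".
CODE_BLOCK = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def run_python_blocks(model_output: str) -> list[str]:
    """Execute each fenced Python block in the text and return its captured stdout."""
    results = []
    for code in CODE_BLOCK.findall(model_output):
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                # Unsandboxed execution: acceptable only for trusted text in a demo.
                exec(code, {})
        except Exception as exc:
            # Report the error as text so it could be fed back to the model.
            buffer.write(f"{type(exc).__name__}: {exc}")
        results.append(buffer.getvalue().strip())
    return results


if __name__ == "__main__":
    sample = "Let me check.\n" + FENCE + "python\nprint(sum(range(5)))\n" + FENCE
    print(run_python_blocks(sample))
```

In a full tool-integrated loop, the captured output would be appended to the model's context before generation resumes. A real deployment would run the code in an isolated process with time and memory limits rather than calling `exec` directly.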
Citation
Please cite our report if it is helpful for your research.
@article{Slow_Thinking_with_LLMs_3_Tool,
title={Tool Manipulation Significantly Enhances the Reasoning Ability of O1- and R1-like LLMs},
author={RUCAIBox STILL Team},
url={https://github.com/RUCAIBox/Slow_Thinking_with_LLMs},
year={2025}
}