STILL-3-TOOL-32B

We propose STILL-3-Tool-32B, a model that leverages Python code to aid its reasoning process.
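
To make the idea of code-assisted reasoning concrete, here is a minimal sketch of how Python snippets embedded in a model's output could be extracted and executed so that their results can be fed back into the reasoning. This is an illustration only, not the STILL-3-Tool-32B pipeline: the helper name `run_python_blocks`, the assumption that the model wraps code in fenced `python` blocks, and the timeout value are all hypothetical.

```python
import re
import subprocess
import sys

# Build the three-backtick fence programmatically so this example
# can itself live inside a fenced block.
FENCE = "`" * 3

# Assumption: the model marks tool calls as fenced python blocks.
CODE_BLOCK = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def run_python_blocks(model_output: str, timeout: float = 10.0) -> list[str]:
    """Run each fenced python block in a fresh interpreter.

    Returns one string per block: stdout on success, stderr on failure,
    or a short message if the snippet exceeds the timeout.
    """
    results = []
    for code in CODE_BLOCK.findall(model_output):
        try:
            proc = subprocess.run(
                [sys.executable, "-c", code],
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            results.append("execution timed out")
            continue
        if proc.returncode == 0:
            results.append(proc.stdout.strip())
        else:
            results.append(proc.stderr.strip())
    return results


# Hypothetical model output containing one code step.
sample = (
    "Let me compute 2**10 with code.\n"
    + FENCE + "python\nprint(2**10)\n" + FENCE
)
outputs = run_python_blocks(sample)
print(outputs)  # ['1024']
```

In a real setup the returned strings would be appended to the conversation so the model can continue reasoning from the executed result. Running untrusted generated code in a plain subprocess is not a sandbox, so a production system would need stronger isolation.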

In evaluation, STILL-3-Tool-32B achieves 81.70% accuracy on AIME 2024, matching the performance of o3-mini and outperforming o1 and DeepSeek-R1.

We open-source our code, model, and data.

For more details, please refer to our Notion page.

Citation

Please cite our report if it is helpful for your research.

@article{Slow_Thinking_with_LLMs_3_Tool,
  title={Tool Manipulation Significantly Enhances the Reasoning Ability of O1- and R1-like LLMs},
  author={RUCAIBox STILL Team},
  url={https://github.com/RUCAIBox/Slow_Thinking_with_LLMs},
  year={2025}
}

Model size: 32.8B params (Safetensors, tensor type BF16)