dongguanting and nielsr (HF Staff) committed on
Commit 2350a1d · verified · 1 Parent(s): f597f82

Improve model card: Correct pipeline tag, add library name, license (#2)


- Improve model card: Correct pipeline tag, add library name, license (6d5977c2b9e5d19bfa8b70783bbc4c2a165183e9)


Co-authored-by: Niels Rogge <[email protected]>

Files changed (1)
  1. README.md +39 -3
README.md CHANGED
@@ -1,9 +1,15 @@
+---
+license: mit
+pipeline_tag: text-generation
+library_name: transformers
+---
+
 ---
 frameworks:
 - Pytorch
-license: apache-2.0
+license: mit
 tasks:
-- text-to-image-synthesis
+- text-generation
 language:
 - en
 metrics:
@@ -18,4 +24,34 @@ This is the official checkpoint we trained using the tool-star framework, based
 
 Huggingface Paper: https://huggingface.co/papers/2505.16410
 
-Details please refer to https://github.com/dongguanting/Tool-Star
+Details please refer to https://github.com/dongguanting/Tool-Star
+
+# Paper title and link
+
+The model was presented in the paper [Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement
+Learning](https://huggingface.co/papers/2505.16410).
+
+# Paper abstract
+
+The abstract of the paper is the following:
+
+Recently, large language models (LLMs) have shown remarkable reasoning
+capabilities via large-scale reinforcement learning (RL). However, leveraging
+the RL algorithm to empower effective multi-tool collaborative reasoning in
+LLMs remains an open challenge. In this paper, we introduce Tool-Star, an
+RL-based framework designed to empower LLMs to autonomously invoke multiple
+external tools during stepwise reasoning. Tool-Star integrates six types of
+tools and incorporates systematic designs in both data synthesis and training.
+To address the scarcity of tool-use data, we propose a general tool-integrated
+reasoning data synthesis pipeline, which combines tool-integrated prompting
+with hint-based sampling to automatically and scalably generate tool-use
+trajectories. A subsequent quality normalization and difficulty-aware
+classification process filters out low-quality samples and organizes the
+dataset from easy to hard. Furthermore, we propose a two-stage training
+framework to enhance multi-tool collaborative reasoning by: (1) cold-start
+fine-tuning, which guides LLMs to explore reasoning patterns via
+tool-invocation feedback; and (2) a multi-tool self-critic RL algorithm with
+hierarchical reward design, which reinforces reward understanding and promotes
+effective tool collaboration. Experimental analyses on over 10 challenging
+reasoning benchmarks highlight the effectiveness and efficiency of Tool-Star.
+The code is available at https://github.com/dongguanting/Tool-Star.
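
Since this commit sets `library_name: transformers` and `pipeline_tag: text-generation`, the checkpoint should be loadable through the standard Transformers causal-LM API. The snippet below is a minimal sketch of that usage, not part of the commit: the Hub repo ID is a placeholder (the diff does not name it), and the prompt and generation settings are illustrative only.

```python
# Minimal sketch: load the checkpoint via the Transformers text-generation API,
# as implied by the updated `library_name: transformers` / `pipeline_tag: text-generation` metadata.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<org>/<tool-star-checkpoint>"  # placeholder; replace with this repo's actual Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Question: What is the capital of France? Think step by step."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For the tool-integrated inference loop (parsing tool calls, executing the six tools, and feeding results back into generation), refer to the scripts in the Tool-Star repository linked above rather than this bare generation example.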