dongguanting and nielsr (HF Staff) committed on
Commit 2350a1d · verified · 1 Parent(s): f597f82

Improve model card: Correct pipeline tag, add library name, license (#2)


- Improve model card: Correct pipeline tag, add library name, license (6d5977c2b9e5d19bfa8b70783bbc4c2a165183e9)


Co-authored-by: Niels Rogge <[email protected]>

Files changed (1)
  1. README.md +39 -3
README.md CHANGED
@@ -1,9 +1,15 @@
+---
+license: mit
+pipeline_tag: text-generation
+library_name: transformers
+---
+
 ---
 frameworks:
 - Pytorch
-license: apache-2.0
+license: mit
 tasks:
-- text-to-image-synthesis
+- text-generation
 language:
 - en
 metrics:
@@ -18,4 +24,34 @@ This is the official checkpoint we trained using the tool-star framework, based
 
 Huggingface Paper: https://huggingface.co/papers/2505.16410
 
-Details please refer to https://github.com/dongguanting/Tool-Star
+Details please refer to https://github.com/dongguanting/Tool-Star
+
+# Paper title and link
+
+The model was presented in the paper [Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement
+Learning](https://huggingface.co/papers/2505.16410).
+
+# Paper abstract
+
+The abstract of the paper is the following:
+
+Recently, large language models (LLMs) have shown remarkable reasoning
+capabilities via large-scale reinforcement learning (RL). However, leveraging
+the RL algorithm to empower effective multi-tool collaborative reasoning in
+LLMs remains an open challenge. In this paper, we introduce Tool-Star, an
+RL-based framework designed to empower LLMs to autonomously invoke multiple
+external tools during stepwise reasoning. Tool-Star integrates six types of
+tools and incorporates systematic designs in both data synthesis and training.
+To address the scarcity of tool-use data, we propose a general tool-integrated
+reasoning data synthesis pipeline, which combines tool-integrated prompting
+with hint-based sampling to automatically and scalably generate tool-use
+trajectories. A subsequent quality normalization and difficulty-aware
+classification process filters out low-quality samples and organizes the
+dataset from easy to hard. Furthermore, we propose a two-stage training
+framework to enhance multi-tool collaborative reasoning by: (1) cold-start
+fine-tuning, which guides LLMs to explore reasoning patterns via
+tool-invocation feedback; and (2) a multi-tool self-critic RL algorithm with
+hierarchical reward design, which reinforces reward understanding and promotes
+effective tool collaboration. Experimental analyses on over 10 challenging
+reasoning benchmarks highlight the effectiveness and efficiency of Tool-Star.
+The code is available at https://github.com/dongguanting/Tool-Star.
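
Since this commit sets `library_name: transformers` and `pipeline_tag: text-generation`, the checkpoint should be loadable through the standard Transformers causal-LM API. The snippet below is a minimal sketch of that usage, not part of the commit: the Hub repo ID is a placeholder (the diff does not name it), and the prompt and generation settings are illustrative only.

```python
# Minimal sketch: load the checkpoint via the Transformers text-generation API,
# as implied by the updated `library_name: transformers` / `pipeline_tag: text-generation` metadata.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<org>/<tool-star-checkpoint>"  # placeholder; replace with this repo's actual Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Question: What is the capital of France? Think step by step."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For the tool-integrated inference loop (parsing tool calls, executing the six tools, and feeding results back into generation), refer to the scripts in the Tool-Star repository linked above rather than this bare generation example.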