DiaTool-DPO: Multi-Turn Direct Preference Optimization for Tool-Augmented Large Language Models
Abstract
Tool-Augmented Large Language Models (TA-LLMs) have shown promise in real-world applications, but face challenges in handling incomplete queries and out-of-scope requests. While existing approaches rely mainly on Supervised Fine-Tuning with expert trajectories, we propose DiaTool-DPO, a novel method that enhances TA-LLMs' dialogue capabilities through Direct Preference Optimization. We model TA-LLM interactions as a Markov Decision Process with 5 distinct dialogue states and categorize user queries into 3 types based on their state transition trajectories. We automatically construct paired trajectory datasets of correct and incorrect dialogue flows and introduce a specialized training objective for dialogue control. Our comprehensive evaluation demonstrates that DiaTool-DPO approaches GPT-4o's performance (94.8% in information gathering, 91% in tool call rejection), substantially outperforming the baseline (44% and 9.6%, respectively) while maintaining core functionality. Our approach opens new possibilities for developing TA-LLMs that can handle diverse real-world scenarios without requiring additional expert demonstrations or human labeling.
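As a rough illustration of the preference objective sketched in the abstract, below is a minimal trajectory-level DPO loss in PyTorch. It is a sketch, not the paper's method: it assumes per-turn token log-probabilities are simply summed over the assistant turns of each trajectory, and the paper's actual dialogue-control objective may differ. Function names here are hypothetical.

```python
import torch
import torch.nn.functional as F

def trajectory_logprob(turn_logps: list[torch.Tensor]) -> torch.Tensor:
    """Sum token log-probabilities over all assistant turns of one trajectory."""
    return torch.stack([lp.sum() for lp in turn_logps]).sum()

def multiturn_dpo_loss(chosen_policy, chosen_ref,
                       rejected_policy, rejected_ref,
                       beta: float = 0.1) -> torch.Tensor:
    """Trajectory-level DPO loss over a (correct, incorrect) dialogue-flow pair.

    Each argument is a list of per-turn token log-prob tensors, computed
    under the trained policy or the frozen reference model.
    """
    chosen_margin = trajectory_logprob(chosen_policy) - trajectory_logprob(chosen_ref)
    rejected_margin = trajectory_logprob(rejected_policy) - trajectory_logprob(rejected_ref)
    # Standard DPO: maximize the log-ratio margin of chosen over rejected flows.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin))
```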
Community
This paper suggests applying multi-turn DPO to tool-augmented LLMs.
Recent work on tool-augmented LLMs mostly focuses on generating training datasets or benchmarks; little work addresses the training objective itself.
This paper suggests
- automatically generating a DPO dataset for tool-augmented LLMs (a minimal sketch follows this list), and
- a loss function tailored to multi-turn DPO for tool-augmented LLMs
to enhance the model's capability to ask clarifying questions in the dialogue when the information from the user is insufficient to invoke a tool call.
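Not from the paper: the sketch below illustrates the paired-trajectory generation idea under the assumption of a JSON-schema-style tool spec. All helper and field names (`build_preference_pair`, `provided_args`, `tool_call`) are hypothetical, and the paper's actual pipeline may differ.

```python
import json

def build_preference_pair(query: str, tool: dict, provided_args: dict):
    """Build a (chosen, rejected) trajectory pair for a query with missing
    required arguments. `tool` is a JSON-schema-like spec, e.g.
    {"name": ..., "parameters": {"required": [...]}}.
    """
    missing = [p for p in tool["parameters"]["required"] if p not in provided_args]
    context = [{"role": "user", "content": query}]

    # Chosen flow: the assistant asks a clarifying question for the missing slots.
    chosen = context + [{
        "role": "assistant",
        "content": f"Could you provide {', '.join(missing)} so I can call {tool['name']}?",
    }]

    # Rejected flow: the assistant calls the tool prematurely with incomplete arguments.
    rejected = context + [{
        "role": "assistant",
        "content": json.dumps({"tool_call": {"name": tool["name"], "arguments": provided_args}}),
    }]
    return chosen, rejected
```

The chosen flow gathers the missing required slots before acting, while the rejected flow invokes the tool prematurely, which is the contrast in dialogue control that the DPO objective is meant to prefer.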
I hope this direction of work on tool-augmented LLMs flourishes, starting from this humble work.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:
- Can a Single Model Master Both Multi-turn Conversations and Tool Use? CoALM: A Unified Conversational Agentic Language Model (2025)
- ToolDial: Multi-turn Dialogue Generation Method for Tool-Augmented Language Models (2025)
- Magnet: Multi-turn Tool-use Data Synthesis and Distillation via Graph Translation (2025)
- MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation (2025)
- Self-Training Large Language Models for Tool-Use Without Demonstrations (2025)
- In Prospect and Retrospect: Reflective Memory Management for Long-term Personalized Dialogue Agents (2025)
- MMCR: Advancing Visual Language Model in Multimodal Multi-Turn Contextual Reasoning (2025)