Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs More Realistic and Less Risky
Abstract
DiaFORGE is a disambiguation framework that enhances large language models' ability to invoke enterprise APIs accurately through dialogue synthesis, supervised fine-tuning, and real-world evaluation.
Large language models (LLMs) are increasingly tasked with invoking enterprise APIs, yet they routinely falter when near-duplicate tools vie for the same user intent or when required arguments are left underspecified. We introduce DiaFORGE (Dialogue Framework for Organic Response Generation & Evaluation), a disambiguation-centric, three-stage pipeline that (i) synthesizes persona-driven, multi-turn dialogues in which the assistant must distinguish among highly similar tools, (ii) performs supervised fine-tuning of open-source models, spanning 3B to 70B parameters, with reasoning traces, and (iii) evaluates real-world readiness via a dynamic suite that redeploys each model in a live agentic loop and reports end-to-end goal completion alongside conventional static metrics. On our dynamic benchmark DiaBENCH, models trained with DiaFORGE raise tool-invocation success by 27 pp over GPT-4o and by 49 pp over Claude-3.5-Sonnet, both under optimized prompting. To spur further research, we release an open corpus of 5,000 production-grade enterprise API specifications paired with rigorously validated, disambiguation-focused dialogues, offering a practical blueprint for building reliable, enterprise-ready tool-calling agents.
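For concreteness, here is a minimal sketch of the disambiguation setting the pipeline targets: two near-duplicate enterprise tools compete for the same user intent, and the assistant must ask a clarifying question and elicit the missing argument before committing to a call. The tool names, parameter fields, and dialogue layout below are illustrative assumptions, not the released data schema.

```python
# Two near-duplicate tools that could both plausibly answer
# "What's the status of my purchase document?"
near_duplicate_tools = [
    {
        "name": "get_purchase_order_status",
        "description": "Return the approval status of a purchase order.",
        "parameters": {"po_id": {"type": "string", "required": True}},
    },
    {
        "name": "get_purchase_requisition_status",
        "description": "Return the approval status of a purchase requisition.",
        "parameters": {"pr_id": {"type": "string", "required": True}},
    },
]

# A disambiguation-centric, multi-turn dialogue: the assistant neither
# guesses between the tools nor hallucinates the required ID.
dialogue = [
    {"role": "user", "content": "What's the status of my purchase document?"},
    {
        "role": "assistant",
        "content": "Do you mean a purchase order or a purchase requisition, "
                   "and what is its ID?",
    },
    {"role": "user", "content": "Purchase order 4500012345."},
    {
        "role": "assistant",
        "tool_call": {
            "name": "get_purchase_order_status",
            "arguments": {"po_id": "4500012345"},
        },
    },
]
```

The point of training on such traces is that static, single-turn tool-calling benchmarks reward a confident guess, whereas the dynamic evaluation loop only credits the model if the end-to-end goal is actually completed.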
Community
We are thrilled to share our recent work on enterprise tool-calling LLMs, titled "Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs More Realistic and Less Risky".
Paper: https://arxiv.org/abs/2507.03336
Data: https://huggingface.co/datasets/sap-ai-research/diaforge-utc-r-0725
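To start exploring the released corpus, it can be pulled with the Hugging Face `datasets` library using the dataset ID from the link above. This assumes the default configuration loads directly; the split and field names are not specified here, so inspect the returned object to see the actual schema.

```python
from datasets import load_dataset

# Dataset ID taken from the link above.
ds = load_dataset("sap-ai-research/diaforge-utc-r-0725")

print(ds)  # shows the available splits and their columns

# Peek at one record from the first split (API spec + dialogue).
first_split = list(ds.keys())[0]
example = next(iter(ds[first_split]))
print(example)
```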