Abstract
Drag-and-Drop LLMs generate task-specific LoRA parameters directly from prompts via prompt-conditioned parameter generation, achieving significant efficiency gains and cross-domain generalization without per-task training.
Modern Parameter-Efficient Fine-Tuning (PEFT) methods such as low-rank adaptation (LoRA) reduce the cost of customizing large language models (LLMs), yet still require a separate optimization run for every downstream dataset. We introduce Drag-and-Drop LLMs (DnD), a prompt-conditioned parameter generator that eliminates per-task training by mapping a handful of unlabeled task prompts directly to LoRA weight updates. A lightweight text encoder distills each prompt batch into condition embeddings, which are then transformed by a cascaded hyper-convolutional decoder into the full set of LoRA matrices. Once trained on a diverse collection of prompt-checkpoint pairs, DnD produces task-specific parameters in seconds, yielding i) up to 12,000× lower overhead than full fine-tuning, ii) average performance gains of up to 30% over the strongest training LoRAs on unseen common-sense reasoning, math, coding, and multimodal benchmarks, and iii) robust cross-domain generalization despite never seeing the target data or labels. Our results demonstrate that prompt-conditioned parameter generation is a viable alternative to gradient-based adaptation for rapidly specializing LLMs. Our project is available at https://jerryliang24.github.io/DnD.
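To make the prompt-to-weights pipeline concrete, here is a minimal PyTorch sketch. All names, layer choices, and shapes (e.g., `HyperConvDecoder`, the embedding dimension, the single-layer LoRA target) are illustrative assumptions, not the authors' released implementation; the condition embeddings would come from a lightweight text encoder applied to a batch of task prompts.

```python
import torch
import torch.nn as nn

class HyperConvDecoder(nn.Module):
    """Illustrative stand-in for the cascaded hyper-convolutional decoder:
    consumes a batch of prompt condition embeddings and emits the
    flattened parameters of one LoRA update."""
    def __init__(self, embed_dim: int, num_lora_params: int):
        super().__init__()
        # Convolutional cascade over the prompt-batch axis.
        self.cascade = nn.Sequential(
            nn.Conv1d(embed_dim, 2 * embed_dim, kernel_size=3, padding=1),
            nn.GELU(),
            nn.Conv1d(2 * embed_dim, 2 * embed_dim, kernel_size=3, padding=1),
            nn.GELU(),
        )
        self.to_params = nn.Linear(2 * embed_dim, num_lora_params)

    def forward(self, cond: torch.Tensor) -> torch.Tensor:
        # cond: (num_prompts, embed_dim) -> (1, embed_dim, num_prompts)
        h = self.cascade(cond.T.unsqueeze(0))
        pooled = h.mean(dim=-1).squeeze(0)   # aggregate across prompts
        return self.to_params(pooled)        # flattened LoRA matrices

# Toy usage: generate rank-8 LoRA factors for a single 1024x1024 layer.
num_prompts, embed_dim = 8, 384
rank, d_in, d_out = 8, 1024, 1024
num_lora_params = rank * (d_in + d_out)      # A: (rank, d_in), B: (d_out, rank)

decoder = HyperConvDecoder(embed_dim, num_lora_params)
cond = torch.randn(num_prompts, embed_dim)   # stand-in for text-encoder output
flat = decoder(cond)
A = flat[: rank * d_in].view(rank, d_in)
B = flat[rank * d_in :].view(d_out, rank)    # weight update: delta_W = B @ A
```

In practice the generator would emit LoRA matrices for every adapted layer at once; the single-layer case above is only meant to show the shapes involved.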
Community
I don't understand why the performance is better than LoRA, since the DnD model is also trained from multiple LoRAs.
Thanks for viewing our paper! DnD does not actually learn to construct LoRA adapters that closely resemble those seen in training; rather, it learns the connection between input data and trained parameters. The prompt-checkpoint pairs in the training data give it comprehensive knowledge of this mapping. Consequently, DnD can generate parameters for zero-shot test sets given only their prompts as inspiration, outperforming its training LoRAs, which never encountered those test sets during their training.
Full audio breakdown here 👉 https://arxivexplained.com/papers/drag-and-drop-llms-zero-shot-prompt-to-weights
Thanks for promoting our paper!
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Text-to-LoRA: Instant Transformer Adaption (2025)
- You Only Fine-tune Once: Many-Shot In-Context Fine-Tuning for Large Language Model (2025)
- DenseLoRA: Dense Low-Rank Adaptation of Large Language Models (2025)
- HD-PiSSA: High-Rank Distributed Orthogonal Adaptation (2025)
- MAP: Revisiting Weight Decomposition for Low-Rank Adaptation (2025)
- Leveraging Self-Attention for Input-Dependent Soft Prompting in LLMs (2025)
- DiaBlo: Diagonal Blocks Are Sufficient For Finetuning (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any paper on Hugging Face, check out this Space.
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot recommend
In essence, you are training another model to predict a new LoRA given a few prompts. Can the training of the hyper-convolutional decoder be considered an SFT method?
Maybe this is indeed an SFT manner: given a prompt and "ground truth" weights, the generator learns the mapping. We plan to explore an RL-like manner in the future: telling the generator which weights are "good" and which are "bad", to unlock parameter generation's further potential, so stay tuned!
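For intuition, the SFT view might look like the following minimal sketch (hypothetical function name and loss choice, reusing the toy decoder setup above), where one task's trained LoRA checkpoint serves as the regression target for that task's prompt batch:

```python
import torch
import torch.nn.functional as F

def sft_step(decoder, optimizer, cond, target_lora_flat):
    """One hypothetical SFT-style update: treat the flattened weights of a
    trained LoRA checkpoint as the 'label' for its paired prompt batch."""
    pred = decoder(cond)                        # generated LoRA parameters
    loss = F.mse_loss(pred, target_lora_flat)   # regress onto the checkpoint
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```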
The day when models generate other models based on prompts is coming:
Prompt: "Produce a multilingual text-generating model of size N for quick inference on mobile devices, domain: smart mobile assistant"
Output: new model
:)
Thanks for your interesting work!
Thanks for your valuable insights! That's a very interesting hypothesis, one that will require extensive exploration and contributions. We are really interested in this vision and hope to achieve it in the future!
Funny idea and great work👍!! I have a question btw.
Why is the performance of the foundation model better than that of LoRA training?
Because the training LoRAs didn't see the test data, the testing process is completely zero-shot. Training on other datasets may cause the model to overfit to the training set, lowering its zero-shot performance below that of the foundation models. We have a detailed discussion in Sections 3.4 and 3.5; hope this helps illustrate it better.