arxiv:2506.16406

Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights

Published on Jun 19
· Submitted by VictorKai1996NUS on Jun 23
#1 Paper of the day

Abstract

Drag-and-Drop LLMs generate task-specific parameters through prompt-conditioned parameter generation, achieving significant efficiency gains and cross-domain generalization without per-task training.

AI-generated summary

Modern Parameter-Efficient Fine-Tuning (PEFT) methods such as low-rank adaptation (LoRA) reduce the cost of customizing large language models (LLMs), yet still require a separate optimization run for every downstream dataset. We introduce Drag-and-Drop LLMs (DnD), a prompt-conditioned parameter generator that eliminates per-task training by mapping a handful of unlabeled task prompts directly to LoRA weight updates. A lightweight text encoder distills each prompt batch into condition embeddings, which are then transformed by a cascaded hyper-convolutional decoder into the full set of LoRA matrices. Once trained on a diverse collection of prompt-checkpoint pairs, DnD produces task-specific parameters in seconds, yielding i) up to 12,000× lower overhead than full fine-tuning, ii) average gains of up to 30% in performance over the strongest training LoRAs on unseen common-sense reasoning, math, coding, and multimodal benchmarks, and iii) robust cross-domain generalization despite never seeing the target data or labels. Our results demonstrate that prompt-conditioned parameter generation is a viable alternative to gradient-based adaptation for rapidly specializing LLMs. Our project is available at https://jerryliang24.github.io/DnD.
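The pipeline described above (prompt batch → text encoder → condition embedding → decoder → LoRA matrices) can be sketched in a few lines of NumPy. This is a minimal illustration only: the dimensions, the hash-based "encoder", and the single linear "decoder" are placeholder assumptions, not the paper's actual lightweight text encoder or cascaded hyper-convolutional decoder.

```python
import hashlib
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions, not taken from the paper.
EMBED_DIM = 64    # width of the condition embedding
HIDDEN_DIM = 256  # base model's hidden size
RANK = 8          # LoRA rank

def encode_prompts(prompts, dim=EMBED_DIM):
    """Stand-in for the lightweight text encoder: hash each prompt
    into a fixed random vector, then mean-pool the batch into a
    single condition embedding."""
    vecs = []
    for p in prompts:
        seed = int(hashlib.md5(p.encode()).hexdigest(), 16) % 2**32
        vecs.append(np.random.default_rng(seed).standard_normal(dim))
    return np.mean(vecs, axis=0)

# Stand-in for the cascaded hyper-convolutional decoder: one linear
# map from the condition embedding to all LoRA parameters at once.
W_dec = rng.standard_normal((2 * HIDDEN_DIM * RANK, EMBED_DIM)) * 0.01

def generate_lora(cond):
    """Decode a condition embedding into a LoRA (A, B) pair."""
    flat = W_dec @ cond
    A = flat[: HIDDEN_DIM * RANK].reshape(RANK, HIDDEN_DIM)
    B = flat[HIDDEN_DIM * RANK :].reshape(HIDDEN_DIM, RANK)
    return A, B

cond = encode_prompts(["Solve: 2 + 2 = ?", "What is 7 * 6?"])
A, B = generate_lora(cond)
print(A.shape, B.shape)  # (8, 256) (256, 8)
```

The key point the sketch captures is that no gradient step on the target task is taken: a single forward pass of the generator yields a full adapter.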

Community


I don't understand why the performance is better than LoRA, since the DnD model is itself trained from multiple LoRAs.

·
Paper author

Thanks for viewing our paper! DnD does not actually learn to reconstruct LoRA adapters that closely resemble those seen in training; instead, it learns the connection between input data and trained parameters. The prompt-checkpoint pairs in the training data give it comprehensive knowledge of this mapping. Consequently, DnD can generate parameters for zero-shot test sets using their prompts as hints, outperforming its training LoRAs, which never encountered those test sets during their training.
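Once an (A, B) pair has been generated for a prompt batch, it would be applied on top of the frozen base weights just like a conventionally trained LoRA adapter. A minimal sketch, with illustrative dimensions and random placeholder matrices standing in for the generator's output; the (alpha / r) scaling is the standard LoRA convention, not necessarily DnD's exact choice:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 64, 8, 16  # illustrative hidden size, rank, scaling

W_base = rng.standard_normal((d, d))  # frozen base-model weight

# Pretend these came out of the DnD generator for some prompt batch.
A = rng.standard_normal((r, d)) * 0.01
B = rng.standard_normal((d, r)) * 0.01

# Standard LoRA merge: W' = W + (alpha / r) * B @ A
W_adapted = W_base + (alpha / r) * (B @ A)

# The base weights are untouched; only the low-rank delta differs.
print(np.abs(W_adapted - W_base).max())
```

Because the delta is rank-r, storing or swapping adapters for many tasks stays cheap regardless of whether they were trained by gradient descent or generated in one forward pass.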

Ok, this is interesting.

·
Paper author

Thanks for approving our work!

Where did you find so many users to upvote? Curious...

·
Paper author

Maybe many people like the idea of customizing LLMs in seconds?

·
Paper author

Thanks for promoting our paper!


In essence, you are training another model to predict a new LoRA given a few prompts. Can the training of the hyper-convolutional decoder be considered an SFT method?

·
Paper author

Maybe this is indeed an SFT manner: given a prompt and "ground truth" weights, the generator learns the mapping. We plan to explore an RL-like manner in the future: telling the generator which weights are "good" and which are "bad", to unlock parameter generation's further potential, so stay tuned!


The day when models generate other models based on prompts is coming:

Prompt: "Produce a multilingual text-generating model of size N for quick inference on mobile devices, domain: smart mobile assistant"
Output: new model
:)
Thanks for your interesting work!

·
Paper author

Thanks for your valuable insights! That's a very interesting hypothesis that will require extensive exploration. We are really interested in this direction and hope to achieve it in the future!

Fun idea and great work 👍!! I have a question, btw:
Why is the performance of the foundation model better than LoRA training?

·

Because the training LoRAs never saw the test data, the testing process is completely zero-shot. Training on other datasets may cause the model to overfit to the training set, lowering its zero-shot performance below the foundation model's. We have a detailed discussion in Sections 3.4 and 3.5; hope this helps clarify.
