Abstract
Drag-and-Drop LLMs generate task-specific LoRA parameters directly from prompts via prompt-conditioned parameter generation, achieving significant efficiency gains and cross-domain generalization without per-task training.
Modern Parameter-Efficient Fine-Tuning (PEFT) methods such as low-rank adaptation (LoRA) reduce the cost of customizing large language models (LLMs), yet still require a separate optimization run for every downstream dataset. We introduce Drag-and-Drop LLMs (DnD), a prompt-conditioned parameter generator that eliminates per-task training by mapping a handful of unlabeled task prompts directly to LoRA weight updates. A lightweight text encoder distills each prompt batch into condition embeddings, which are then transformed by a cascaded hyper-convolutional decoder into the full set of LoRA matrices. Once trained on a diverse collection of prompt-checkpoint pairs, DnD produces task-specific parameters in seconds, yielding i) up to 12,000× lower overhead than full fine-tuning, ii) average performance gains of up to 30% over the strongest training LoRAs on unseen common-sense reasoning, math, coding, and multimodal benchmarks, and iii) robust cross-domain generalization despite never seeing the target data or labels. Our results demonstrate that prompt-conditioned parameter generation is a viable alternative to gradient-based adaptation for rapidly specializing LLMs. Our project is available at https://jerryliang24.github.io/DnD.
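To make the prompt-to-weights pipeline concrete, here is a minimal PyTorch sketch. All names, layer choices, and shapes (e.g., `HyperConvDecoder`, the embedding dimension, the single-layer LoRA target) are illustrative assumptions, not the authors' released implementation; the condition embeddings would come from a lightweight text encoder applied to a batch of task prompts.

```python
import torch
import torch.nn as nn

class HyperConvDecoder(nn.Module):
    """Illustrative stand-in for the cascaded hyper-convolutional decoder:
    consumes a batch of prompt condition embeddings and emits the
    flattened parameters of one LoRA update."""
    def __init__(self, embed_dim: int, num_lora_params: int):
        super().__init__()
        # Convolutional cascade over the prompt-batch axis.
        self.cascade = nn.Sequential(
            nn.Conv1d(embed_dim, 2 * embed_dim, kernel_size=3, padding=1),
            nn.GELU(),
            nn.Conv1d(2 * embed_dim, 2 * embed_dim, kernel_size=3, padding=1),
            nn.GELU(),
        )
        self.to_params = nn.Linear(2 * embed_dim, num_lora_params)

    def forward(self, cond: torch.Tensor) -> torch.Tensor:
        # cond: (num_prompts, embed_dim) -> (1, embed_dim, num_prompts)
        h = self.cascade(cond.T.unsqueeze(0))
        pooled = h.mean(dim=-1).squeeze(0)   # aggregate across prompts
        return self.to_params(pooled)        # flattened LoRA matrices

# Toy usage: generate rank-8 LoRA factors for a single 1024x1024 layer.
num_prompts, embed_dim = 8, 384
rank, d_in, d_out = 8, 1024, 1024
num_lora_params = rank * (d_in + d_out)      # A: (rank, d_in), B: (d_out, rank)

decoder = HyperConvDecoder(embed_dim, num_lora_params)
cond = torch.randn(num_prompts, embed_dim)   # stand-in for text-encoder output
flat = decoder(cond)
A = flat[: rank * d_in].view(rank, d_in)
B = flat[rank * d_in :].view(d_out, rank)    # weight update: delta_W = B @ A
```

In practice the generator would emit LoRA matrices for every adapted layer at once; the single-layer case above is only meant to show the shapes involved.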
Community
I don't understand why the performance is better than LoRA, since the DnD model is also trained from multiple LoRAs.
Thanks for viewing our paper! DnD does not actually learn to construct LoRA adapters that closely resemble those seen in training; rather, it learns the connection between input data and trained parameters. The prompt-checkpoint pairs in the training data give it comprehensive knowledge of this mapping. Consequently, DnD can generate parameters for zero-shot test sets given only their prompts as inspiration, outperforming its training LoRAs, which never encountered those test sets during their training.
Full audio breakdown here 👉 https://arxivexplained.com/papers/drag-and-drop-llms-zero-shot-prompt-to-weights
Thanks for promoting our paper!
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Text-to-LoRA: Instant Transformer Adaption (2025)
- You Only Fine-tune Once: Many-Shot In-Context Fine-Tuning for Large Language Model (2025)
- DenseLoRA: Dense Low-Rank Adaptation of Large Language Models (2025)
- HD-PiSSA: High-Rank Distributed Orthogonal Adaptation (2025)
- MAP: Revisiting Weight Decomposition for Low-Rank Adaptation (2025)
- Leveraging Self-Attention for Input-Dependent Soft Prompting in LLMs (2025)
- DiaBlo: Diagonal Blocks Are Sufficient For Finetuning (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any paper on Hugging Face, check out this Space.
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot recommend
In essence, you are training another model to predict a new LoRA given a few prompts. Can the training of the hyper-convolutional decoder be considered an SFT method?
Maybe this is indeed an SFT manner: given a prompt and "ground truth" weights, the generator learns the mapping. We plan to explore an RL-like manner in the future: telling the generator which weights are "good" and which are "bad", to unlock parameter generation's further potential, so stay tuned!
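For intuition, the SFT view might look like the following minimal sketch (hypothetical function name and loss choice, reusing the toy decoder setup above), where one task's trained LoRA checkpoint serves as the regression target for that task's prompt batch:

```python
import torch
import torch.nn.functional as F

def sft_step(decoder, optimizer, cond, target_lora_flat):
    """One hypothetical SFT-style update: treat the flattened weights of a
    trained LoRA checkpoint as the 'label' for its paired prompt batch."""
    pred = decoder(cond)                        # generated LoRA parameters
    loss = F.mse_loss(pred, target_lora_flat)   # regress onto the checkpoint
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```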
The day when models generate other models based on prompts is coming:
Prompt: "Produce a multilingual text-generating model of size N for quick inference on mobile devices, domain: smart mobile assistant"
Output: new model
:)
Thanks for your interesting work!
Thanks for your valuable insights! That's a very interesting hypothesis, one that will require extensive exploration and contributions. We are really interested in this vision and hope to achieve it in the future!
Funny idea and great work👍!! I have a question btw.
Why is the performance of the foundation model better than that of LoRA training?
Because the training LoRAs didn't see the test data, the testing process is completely zero-shot. Training on other datasets may cause the model to overfit to the training set, lowering its zero-shot performance below that of the foundation models. We have a detailed discussion in Sections 3.4 and 3.5; hope this helps illustrate it better.