arxiv:2506.08060

Eliciting Fine-Tuned Transformer Capabilities via Inference-Time Techniques

Published on Jun 9, 2025

Abstract

AI-generated summary: Transformers can approximate supervised fine-tuning capabilities through in-context learning without altering model parameters, supported by theoretical bounds and practical techniques.

Large language models have transformed natural language processing, yet supervised fine-tuning (SFT) remains computationally intensive. This paper formally proves that capabilities acquired through SFT can be approximated by a base transformer model using inference-time techniques, specifically in-context learning (ICL), without altering model parameters, under idealized assumptions including unbounded computational resources and access to the fine-tuning dataset. We extend these results to practical scenarios with finite context lengths and partial dataset access. For text generation tasks with fixed output length $l$, datasets of size $O\!\left(\frac{mV}{\varepsilon^2}\log\frac{m}{\delta}\right)$ or, with bounded context, $O\!\left(\frac{l\log V}{\varepsilon^2}\log\frac{1}{\delta}\right)$ suffice to approximate fine-tuned behavior across $m$ contexts within error $\varepsilon$, where $V$ is the vocabulary size and $\delta$ is the failure probability. For linear classification, datasets of size $O\!\left(\frac{d}{\varepsilon}\right)$ or, with fixed context, $O\!\left(\frac{1}{\varepsilon^2}\log\frac{1}{\delta}\right)$ are sufficient, where $d$ is the input dimension. Grounded in the Turing completeness of transformers, these results provide a theoretical foundation for resource-efficient deployment of large language models, with practical techniques like retrieval-augmented generation bridging theory to real-world applications.
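
As a rough illustration of how these bounds behave, the sketch below plugs hypothetical values of $V$, $m$, $l$, $d$, $\varepsilon$, and $\delta$ into the four sample-complexity expressions from the abstract. The constants hidden by the big-O notation are not given in the abstract (they are taken as 1 here), so the printed numbers only indicate relative scaling, not actual dataset sizes.

```python
import math

# Hypothetical parameter values chosen only to illustrate scaling;
# they are not taken from the paper.
V = 50_000      # vocabulary size
m = 100         # number of contexts to approximate
l = 128         # fixed output length
d = 768         # input dimension (linear classification)
eps = 0.1       # approximation error (varepsilon)
delta = 0.01    # failure probability

# Big-O expressions from the abstract, with hidden constants set to 1.
text_gen_general = (m * V / eps**2) * math.log(m / delta)             # O(mV/eps^2 * log(m/delta))
text_gen_bounded = (l * math.log(V) / eps**2) * math.log(1 / delta)   # O(l*log V/eps^2 * log(1/delta))
linear_cls       = d / eps                                            # O(d/eps)
linear_cls_fixed = (1 / eps**2) * math.log(1 / delta)                 # O(1/eps^2 * log(1/delta))

print(f"Text generation (general):           ~{text_gen_general:,.0f}")
print(f"Text generation (bounded context):   ~{text_gen_bounded:,.0f}")
print(f"Linear classification:               ~{linear_cls:,.0f}")
print(f"Linear classification (fixed ctx):   ~{linear_cls_fixed:,.0f}")
```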
