arxiv:2506.12015

EMLoC: Emulator-based Memory-efficient Fine-tuning with LoRA Correction

Published on Jun 13
· Submitted by hsichelin on Jun 18
Abstract

EMLoC, a memory-efficient fine-tuning framework using activation-aware SVD and LoRA, allows model adaptation within inference memory constraints for diverse applications.

AI-generated summary

Open-source foundation models have seen rapid adoption and development, enabling powerful general-purpose capabilities across diverse domains. However, fine-tuning large foundation models for domain-specific or personalized tasks remains prohibitively expensive for most users due to the significant memory overhead beyond that of inference. We introduce EMLoC, an Emulator-based Memory-efficient fine-tuning framework with LoRA Correction, which enables model fine-tuning within the same memory budget required for inference. EMLoC constructs a task-specific lightweight emulator using activation-aware singular value decomposition (SVD) on a small downstream calibration set. Fine-tuning is then performed on this lightweight emulator via LoRA. To tackle the misalignment between the original model and the compressed emulator, we propose a novel compensation algorithm that corrects the fine-tuned LoRA module so it can be merged into the original model for inference. EMLoC supports flexible compression ratios and standard training pipelines, making it adaptable to a wide range of applications. Extensive experiments demonstrate that EMLoC outperforms other baselines across multiple datasets and modalities. Moreover, without quantization, EMLoC enables fine-tuning of a 38B model on a single 24GB consumer GPU, bringing efficient and practical model adaptation to individual users.
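The emulator-construction step can be made concrete with a short sketch. Below is a minimal PyTorch illustration of activation-aware SVD applied to a single weight matrix, assuming a channel-scaling formulation; the function name `activation_aware_svd`, the `calib_X` calibration tensor, and the mean-absolute-activation scaling statistic are illustrative assumptions, not the paper's reference implementation.

```python
import torch

def activation_aware_svd(W: torch.Tensor, calib_X: torch.Tensor, rank: int):
    """Compress W (out_features x in_features) into rank-`rank` factors,
    weighting the decomposition by calibration activation statistics.
    Hypothetical sketch: the exact scaling used by EMLoC may differ."""
    # Per-input-channel importance estimated from the calibration inputs
    # (calib_X has shape n_samples x in_features).
    s = calib_X.abs().mean(dim=0).clamp(min=1e-6)
    # SVD of the activation-scaled weight, truncated to the target rank.
    U, S, Vh = torch.linalg.svd(W * s, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]   # fold singular values into the left factor
    V_r = Vh[:rank, :] / s         # undo the channel scaling on the right factor
    return U_r, V_r                # emulator weight: W ~= U_r @ V_r
```

Replacing each large linear weight `W` with the pair `(U_r, V_r)` yields a much smaller model to back-propagate through, which is what keeps fine-tuning within the inference memory budget.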

Community

Paper submitter

EMLoC is a memory-efficient framework that enables model fine-tuning within the same memory budget as inference.

  • Unlocks fine-tuning for all users who can run inference.
  • Integrates seamlessly into standard fine-tuning workflows (see the sketch after this list).
  • Fine-tunes large models such as InternVL (38B) and FLUX.1-dev (12B) on a single 24GB GPU, without quantization.
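As a rough picture of the workflow described above, here is a hedged PyTorch sketch of a linear layer that runs the frozen low-rank emulator plus a trainable LoRA adapter. The class name, initialization, and hyperparameters are assumptions for illustration, and EMLoC's LoRA-correction step is only noted in prose because its exact form is specified in the paper.

```python
import torch
import torch.nn as nn

class EmulatorLoRALinear(nn.Module):
    """Hypothetical emulator layer: frozen factors (U_r, V_r) with W ~= U_r @ V_r,
    plus a trainable LoRA update scale * (B @ A)."""
    def __init__(self, U_r: torch.Tensor, V_r: torch.Tensor,
                 lora_rank: int = 8, alpha: float = 16.0):
        super().__init__()
        out_f, in_f = U_r.shape[0], V_r.shape[1]
        # Frozen emulator factors produced by the SVD step.
        self.U = nn.Parameter(U_r, requires_grad=False)
        self.V = nn.Parameter(V_r, requires_grad=False)
        # Standard LoRA initialization: A random, B zero, so training
        # starts from the unmodified emulator.
        self.A = nn.Parameter(torch.randn(lora_rank, in_f) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_f, lora_rank))
        self.scale = alpha / lora_rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        base = (x @ self.V.T) @ self.U.T                 # frozen emulator path
        lora = ((x @ self.A.T) @ self.B.T) * self.scale  # trainable LoRA path
        return base + lora
```

After training, naively merging `scale * (B @ A)` into the original full-rank weight would ignore the gap between the emulator and the original model; EMLoC's compensation algorithm corrects the fine-tuned LoRA module for that misalignment first, and only the corrected update is merged for inference.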
