Quantized Fine-Tuning
Has anyone tried to fine-tune the quantized model?
I am getting a RuntimeError:
/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py in requires_grad_(self, requires_grad)
2882 """
2883 for p in self.parameters():
-> 2884 p.requires_grad_(requires_grad)
2885 return self
2886
RuntimeError: only Tensors of floating point dtype can require gradients
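From the traceback, the failure presumably comes from requires_grad_(True) being reached on a module whose weights are now bitsandbytes 4-bit parameters, which are stored as integer tensors and therefore cannot require gradients. A minimal, hypothetical workaround sketch (assuming you can edit the place in the script that flips requires_grad_) is to only touch floating-point parameters:

def enable_float_grads(module, requires_grad=True):
    # Quantized (integer-dtype) parameters cannot require gradients,
    # so skip them and only flip the flag on floating-point tensors.
    for p in module.parameters():
        if p.dtype.is_floating_point:
            p.requires_grad_(requires_grad)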
I am updating the sample_finetune_vision script with a basic quantization config like this:
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, prepare_model_for_kbit_training

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    # device_map="cuda",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    # if you do not use Ampere or later GPUs, change attention to "eager"
    _attn_implementation='eager',
    quantization_config=quantization_config,
)
# Delete audio layers (code not shown in the post)
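# (Not in the original post.) A purely hypothetical sketch of dropping the audio
# submodules before training; matching on "audio" in the module name is an
# assumption and depends on the actual module names in the checkpoint:
audio_names = [n for n, _ in model.named_modules() if "audio" in n.lower()]
# keep only the top-most matches so we do not touch children of modules we replace
top_level = [n for n in audio_names
             if not any(n.startswith(p + ".") for p in audio_names if p != n)]
for name in top_level:
    parent_name, _, child_name = name.rpartition(".")
    parent = model.get_submodule(parent_name) if parent_name else model
    setattr(parent, child_name, torch.nn.Identity())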
model = prepare_model_for_kbit_training(
    model,
    use_gradient_checkpointing=False,
    gradient_checkpointing_kwargs={'use_reentrant': False},
)
config = LoraConfig(
    task_type="CAUSAL_LM",
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    inference_mode=False,
    target_modules=["out_proj"],  # Example: target the attention layers "q_proj", "k_proj", "v_proj"
)
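Not shown above: the LoRA config still has to be applied to the model. A minimal sketch of how I would expect the rest to look, assuming the adapters are attached with PEFT's generic get_peft_model (the sample script may do this differently):

from peft import get_peft_model

model = get_peft_model(model, config)
# Only the LoRA adapter weights (floating point) should be trainable now;
# the 4-bit base weights stay frozen, which avoids the requires_grad error.
model.print_trainable_parameters()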
This is urgent; if you find any solution, please let me know.