Code for quantization (generated by Grok, with manual editing):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch
import sys

# Define model ID
model_id = sys.argv[1]

# Configure quantization
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # Use 4-bit quantization (or load_in_8bit=True for 8-bit)
    bnb_4bit_quant_type="nf4",             # NormalFloat 4-bit (NF4) for better precision
    bnb_4bit_compute_dtype=torch.float16,  # Compute in float16 for efficiency
    bnb_4bit_use_double_quant=True         # Double quantization for further memory savings
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load quantized model
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="auto",  # Automatically map layers to GPU/CPU
    torch_dtype=torch.float16
)

# Save model and tokenizer
save_path = sys.argv[2]
model.save_pretrained(save_path)
tokenizer.save_pretrained(save_path)
```
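For reference, a minimal sketch of loading the saved checkpoint and running inference. It assumes a recent transformers/bitsandbytes combination that supports 4-bit serialization, so the quantization settings stored in the saved config are picked up automatically; the `save_path` value and the prompt below are placeholders.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Placeholder: the directory passed as sys.argv[2] in the script above
save_path = "quantized-model"

# The quantization config is serialized with the checkpoint, so no
# BitsAndBytesConfig needs to be passed again when loading
tokenizer = AutoTokenizer.from_pretrained(save_path)
model = AutoModelForCausalLM.from_pretrained(save_path, device_map="auto")

prompt = "Write a short greeting."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```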
Model tree for yamatazen/ForgottenMaid-12B-bnb

- Base model: yamatazen/ForgottenMaid-12B