Batch: inefficient memory #50
opened by SinanAkkoyun
A batch size of 10 eats 40GB of VRAM!
VRAM Allocated: 3147.43 MB
VRAM Reserved: 39532.00 MB
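(These readings match what PyTorch's CUDA memory counters report; a minimal sketch of how such numbers are typically collected, assuming the default CUDA device, would be:)

import torch

allocated = torch.cuda.memory_allocated() / 1024**2  # bytes held by live tensors, in MB
reserved = torch.cuda.memory_reserved() / 1024**2    # total pool claimed by the caching allocator, in MB
print(f"VRAM Allocated: {allocated:.2f} MB")
print(f"VRAM Reserved: {reserved:.2f} MB")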
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

model_name = "microsoft/Florence-2-large-ft"
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True).to('cuda')
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)

def generate_batch(prompts, images):
    # Process inputs in batches
    inputs = processor(text=prompts, images=images, return_tensors="pt").to('cuda')
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        do_sample=False,
        num_beams=3
    )
    generated_texts = processor.batch_decode(generated_ids, skip_special_tokens=False)
    parsed_answers = [
        processor.post_process_generation(text, task="<OD>", image_size=(img.width, img.height))
        for text, img in zip(generated_texts, images)
    ]
    return parsed_answers, generated_ids
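For reference, a call reproducing the batch-of-10 case might look like the following (the file names here are placeholders, not from the original post):

from PIL import Image

# Hypothetical image files; substitute your own batch of 10
images = [Image.open(f"image_{i}.jpg") for i in range(10)]
prompts = ["<OD>"] * len(images)
parsed_answers, generated_ids = generate_batch(prompts, images)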
Even when I reuse the Colab notebook exactly as-is, VRAM usage seems to grow linearly with batch size.
Any help is greatly appreciated.
Wrap the call to model.generate() in a torch.no_grad() context manager:
with torch.no_grad():
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        do_sample=False,
        num_beams=3,
    )
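Without the context manager, autograd keeps intermediate activations around for a potential backward pass, which is why reserved memory grows with batch size even though the model weights themselves are small. On recent PyTorch versions, torch.inference_mode() is a drop-in alternative that can be slightly cheaper still; a sketch (not part of the original reply):

with torch.inference_mode():
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        do_sample=False,
        num_beams=3,
    )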