Provided code snippet not working?
Result:
Loading checkpoint shards: 100%|████████████████████████| 5/5 [00:03<00:00, 1.30it/s]
['']
Hi
I've prepared a quick demo script to help address the issues you're experiencing. Please note that this is a rapid test, not a fully optimized solution. While it demonstrates the core functionality, I recommend reviewing it carefully and adapting it to your specific needs.
Also, remember to install the transformers library directly from GitHub, as the model requires the latest version:
pip install git+https://github.com/huggingface/transformers accelerate
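If you want to confirm that the install picked up a build with Qwen2.5-VL support, an optional quick check is to import the model class before running the script (if this import fails, the installed transformers version is too old):

import transformers
print(transformers.__version__)
from transformers import Qwen2_5_VLForConditionalGeneration  # fails on releases without Qwen2.5-VL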
Sample code:
import torch
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

# Load the pre-trained model for visual-language conditional generation.
# Configure it to use FP16 precision and Flash Attention v2 for efficient computation.
# Automatically map the model to available devices (e.g., GPU if available).
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct",
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
    device_map="auto"
)

# Ensure the default tensor type matches torch.float16 to avoid dtype mismatches.
torch.set_default_dtype(torch.float16)

# Load the corresponding processor to handle tokenization, image, and video inputs.
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")

# Prepare the input message structure: an image URL and a user request to describe it.
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# Process the textual component of the input message:
# apply the chat template, defer tokenization, and add a generation prompt
# to guide the model's output generation.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Extract and process the visual information (images and videos) from the message.
image_inputs, video_inputs = process_vision_info(messages)

# Create the input tensors required by the model: the processed text, images,
# and videos, with padding for batch processing. Move the tensors to the GPU
# for accelerated inference.
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt"
).to("cuda")

# Generate the model's response, limited to a maximum of 128 new tokens.
generated_ids = model.generate(**inputs, max_new_tokens=128)

# Decode the generated IDs into human-readable text, skipping special tokens
# and leaving tokenization spaces untouched for accuracy.
output_text = processor.batch_decode(
    generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
)

# Print the generated description of the image.
print("Generated description:", output_text[0])
Hope that helps.
M
Hi, I ran the above modified code, but I am stuck with the error below. Can you please suggest how to fix it? I am using a T4 GPU on Colab.
Loading checkpoint shards: 100% 5/5 [00:46<00:00, 6.60s/it]
WARNING:accelerate.big_modeling:Some parameters are on the meta device because they were offloaded to the cpu.
TypeError Traceback (most recent call last)
in <cell line: 0>()
55
56 # Inference: Generation of the output
---> 57 generated_ids = model.generate(**inputs, max_new_tokens=128)
58 # generated_ids_trimmed = [
59 # out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
19 frames
/usr/local/lib/python3.11/dist-packages/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py in apply_rotary_pos_emb_flashatt(tensor, freqs)
164 cos = freqs.cos()
165 sin = freqs.sin()
--> 166 output = apply_rotary_emb(tensor_, cos, sin).type_as(tensor)
167 return output
168
TypeError: 'NoneType' object is not callable
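A likely cause of this particular TypeError: apply_rotary_pos_emb_flashatt calls apply_rotary_emb from the flash-attn package, and that symbol ends up as None when flash-attn is not installed or cannot be used; FlashAttention 2 also does not support Turing GPUs such as the Colab T4. (The CPU-offload warning above is a separate issue: with device_map="auto", accelerate is offloading part of the model to CPU, which slows generation but is not what raises the error.) A minimal sketch of a workaround, assuming you stay on the T4, is to load the model without flash_attention_2 and let PyTorch's SDPA attention be used instead:

import torch
from transformers import Qwen2_5_VLForConditionalGeneration

# Sketch: same loading call as in the demo script, but with SDPA attention
# instead of FlashAttention 2, which is not available on T4 (Turing) GPUs.
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct",
    torch_dtype=torch.float16,
    attn_implementation="sdpa",  # or "eager"; avoids the flash-attn rotary-embedding path
    device_map="auto"
)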