Does the model support outputting embeddings for a single image?

#3
by YoloBird - opened

I want to cluster some images, so I need to extract embeddings for them. I have tried several times on my own but failed. Below are my code and the error message I get.

from src.model import MMEBModel
from src.arguments import ModelArguments
from src.model_utils import load_processor
from PIL import Image
import torch

1. Initialize the model arguments

model_args = ModelArguments(
    model_name='/root/autodl-tmp/Qwen/Qwen2-VL-2B-Instruct',
    checkpoint_path='/root/autodl-tmp/TIGER-Lab/VLM2Vec-Qwen2VL-2B',
    pooling='last',
    normalize=True,
    model_backbone='qwen2_vl',
    lora=True
)

2. Load the processor and the model

processor = load_processor(model_args)
model = MMEBModel.load(model_args)
model = model.to('cuda', dtype=torch.bfloat16)
model.eval()

3. Load a single image

image = Image.open('figures/example.jpg').convert('RGB')

4. Preprocess the image (key point: set text to an empty string or None)

inputs = processor(text="", images=image, return_tensors="pt")

5. Move the inputs to CUDA

inputs = {k: v.unsqueeze(0).to('cuda') for k, v in inputs.items()}

6. Run the forward pass and extract the embedding

with torch.no_grad():
    image_embedding = model(qry=inputs)["qry_reps"]  # this is the image embedding vector

print("Image Embedding Shape:", image_embedding.shape)
print("Image Embedding:", image_embedding)

RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.cuda.FloatTensor instead (while checking arguments for embedding)

The error is raised on the last line (the forward pass). I'm sorry that the information I've provided is rather disorganized.

Hi @YoloBird, thanks for your interest in our work! Yes, it definitely works for single-image embedding. However, I believe you need to include the image special token in the text in step 4:

inputs = processor(text="<|image_pad|>", images=image, return_tensors="pt")
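For completeness, here is a minimal sketch of the full single-image flow with that token included, followed by a small clustering step since that was the original goal. Only the ModelArguments, load_processor, MMEBModel.load, and model(qry=...)["qry_reps"] calls come from the code above; the embed_image helper, the example image paths, and the scikit-learn KMeans call (with an arbitrary n_clusters=2) are illustrative assumptions, not part of the repository.

import torch
from PIL import Image
from sklearn.cluster import KMeans

from src.model import MMEBModel
from src.arguments import ModelArguments
from src.model_utils import load_processor

# Same setup as in the question above
model_args = ModelArguments(
    model_name='/root/autodl-tmp/Qwen/Qwen2-VL-2B-Instruct',
    checkpoint_path='/root/autodl-tmp/TIGER-Lab/VLM2Vec-Qwen2VL-2B',
    pooling='last',
    normalize=True,
    model_backbone='qwen2_vl',
    lora=True
)
processor = load_processor(model_args)
model = MMEBModel.load(model_args).to('cuda', dtype=torch.bfloat16)
model.eval()

def embed_image(path):
    image = Image.open(path).convert('RGB')
    # include the image special token in the text, as suggested above
    inputs = processor(text="<|image_pad|>", images=image, return_tensors="pt")
    # move everything to the GPU without changing dtypes (input_ids must stay Long/Int)
    inputs = {k: v.to('cuda') for k, v in inputs.items()}
    with torch.no_grad():
        # query representation; with normalize=True it should already be unit-normalized
        reps = model(qry=inputs)["qry_reps"]
    return reps.float().cpu().numpy()[0]

# hypothetical image paths, replace with your own
paths = ['figures/example.jpg', 'figures/example_2.jpg', 'figures/example_3.jpg']
embeddings = [embed_image(p) for p in paths]

# cluster the embedding vectors (n_clusters chosen arbitrarily for illustration)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
print(labels)

If you have many images, passing several of them through the processor in one batched call will be faster, but the per-image loop keeps the sketch simple.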