Cross-modal similarity
When I input an image and a text to the model:
import torch
from transformers.image_utils import load_image

def get_output(url, text):
    # processor and model: assumed to be an already-loaded SigLIP processor and model (see below)
    image = load_image(url)
    inputs = processor(text=[text], images=image, padding="max_length", return_tensors="pt").to(model.device)
    with torch.no_grad():
        output = model(**inputs)
    return output

text = "a photo of 2 cats"
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
output = get_output(url, text)
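For context, processor and model are created roughly like this (the checkpoint name below is only a placeholder for whatever SigLIP checkpoint is actually loaded):

```python
from transformers import AutoModel, AutoProcessor

# Placeholder checkpoint; substitute the SigLIP checkpoint actually in use.
ckpt = "google/siglip-base-patch16-224"
model = AutoModel.from_pretrained(ckpt).to("cuda")
processor = AutoProcessor.from_pretrained(ckpt)
```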
The output is:
SiglipOutput(loss=None, logits_per_image=tensor([[-15.5217]], device='cuda:0'), logits_per_text=tensor([[-15.5217]], device='cuda:0'), ...)
After applying the sigmoid, the probability is essentially 0 (sigmoid(-15.52) ≈ 1.8e-7), even though the caption should match the image.
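For reference, this is the post-processing step I mean (a minimal sketch; output is the SiglipOutput returned above):

```python
import torch

# Turn the image-text logit into a matching probability.
prob = torch.sigmoid(output.logits_per_image)
print(prob)  # ~1.8e-07 for a logit of -15.52, i.e. essentially zero
```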
How can I fix it?