stop using VLMs blindly
compare different VLM outputs on a huge variety of inputs (from reasoning to OCR!) 🔥 visionLMsftw/comparevlms
> supports multiple VLMs: google/gemma-3-27b-it, Qwen/Qwen2.5-VL-7B-Instruct, Qwen/Qwen2.5-VL-32B-Instruct, meta-llama/Llama-4-Maverick-17B-128E-Instruct, HuggingFaceTB/SmolVLM2-2.2B-Instruct
> recommend new models or inputs to us and we'll add them 🫡
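
if you want to hack on the same compare-the-outputs idea locally, here's a minimal sketch with two of the models from the list above — the image URL and prompt are placeholders I made up, and it assumes a recent transformers release with the image-text-to-text pipeline:

```python
# Minimal sketch: run the same image + prompt through several VLMs
# and print the outputs side by side for comparison.
from transformers import pipeline

MODELS = [
    "HuggingFaceTB/SmolVLM2-2.2B-Instruct",
    "Qwen/Qwen2.5-VL-7B-Instruct",
]

# hypothetical input image and question, swap in your own
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/receipt.png"},
        {"type": "text", "text": "What is the total on this receipt?"},
    ],
}]

for model_id in MODELS:
    pipe = pipeline("image-text-to-text", model=model_id)
    out = pipe(text=messages, max_new_tokens=128)
    print(f"--- {model_id} ---")
    # the pipeline returns the full chat; the last turn is the model's answer
    print(out[0]["generated_text"][-1]["content"])
```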
so far I've figured out:
> for fact-checks, you need a relatively bigger model (7B is OK!)
> Gemma 3 gets a downgrade without pan and scan (especially for documents) — see the sketch below
> Qwen2.5-VL-32B is very talkative: great for reasoning, but not good for simple tasks 🗣️
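
on that pan-and-scan point: in transformers it should just be a processor flag. a rough sketch, assuming a recent transformers with Gemma 3 support (the image URL and prompt are placeholders, not from the Space):

```python
# Sketch: enabling pan-and-scan for Gemma 3, which crops large or
# non-square images into windows encoded alongside the full image,
# so fine detail (e.g. document text) isn't lost to downscaling.
import torch
from transformers import AutoProcessor, Gemma3ForConditionalGeneration

model_id = "google/gemma-3-27b-it"
processor = AutoProcessor.from_pretrained(model_id)
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/dense_document.png"},
        {"type": "text", "text": "Transcribe the text in this document."},
    ],
}]

# do_pan_and_scan=True toggles the cropping behavior at preprocessing time
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
    add_generation_prompt=True,
    do_pan_and_scan=True,
).to(model.device)

with torch.inference_mode():
    generated = model.generate(**inputs, max_new_tokens=256)
# decode only the newly generated tokens, not the prompt
print(processor.decode(
    generated[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
))
```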