-
38
Llama 3.2V 11B Cot
💬Generate descriptions and answers by combining text and images
-
Xkev/Llama-3.2V-11B-cot
Image-Text-to-Text • 11B • Updated • 5.23k • 153 -
Xkev/LLaVA-CoT-100k
Viewer • Updated • 98.6k • 2.06k • 94 -
LLaVA-o1: Let Vision Language Models Reason Step-by-Step
Paper • 2411.10440 • Published • 126
Guowei Xu PRO
Xkev
AI & ML interests
None yet
Recent Activity
upvoted
a
paper
1 day ago
A Survey on Vision-Language-Action Models: An Action Tokenization
Perspective
upvoted
a
paper
about 1 month ago
UniWorld: High-Resolution Semantic Encoders for Unified Visual
Understanding and Generation
upvoted
a
paper
2 months ago
PHYBench: Holistic Evaluation of Physical Perception and Reasoning in
Large Language Models
Organizations
None yet