JARVIS-VLA-v1 Collection • Vision-Language-Action Models in Minecraft • 4 items • Updated 4 days ago • 9
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey Paper • 2503.12605 • Published 10 days ago • 29
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization Paper • 2503.12937 • Published 9 days ago • 26
MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research Paper • 2503.13399 • Published 9 days ago • 20
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion Paper • 2503.11576 • Published 12 days ago • 76
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search Paper • 2503.10582 • Published 13 days ago • 20
Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models Paper • 2503.09669 • Published 14 days ago • 34
CoSTA*: Cost-Sensitive Toolpath Agent for Multi-turn Image Editing Paper • 2503.10613 • Published 13 days ago • 73
VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary Paper • 2503.09402 • Published 14 days ago • 6
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM Paper • 2503.04724 • Published 20 days ago • 66
A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality Article • Published 22 days ago • 69
C4AI Aya Vision Collection • Aya Vision is a state-of-the-art family of vision models that brings multimodal capabilities to 23 languages. • 5 items • Updated 22 days ago • 68
Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models Paper • 2502.16033 • Published Feb 22 • 17