Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning Paper • 2507.16746 • Published 12 days ago • 32 • 1
Textual Steering Vectors Can Improve Visual Understanding in Multimodal Large Language Models Paper • 2505.14071 • Published May 20 • 1 • 2
FoNE: Precise Single-Token Number Embeddings via Fourier Features Paper • 2502.09741 • Published Feb 13 • 15 • 3
VisualLens: Personalization through Visual History Paper • 2411.16034 • Published Nov 25, 2024 • 18 • 2
TLDR: Token-Level Detective Reward Model for Large Vision Language Models Paper • 2410.04734 • Published Oct 7, 2024 • 17 • 2