Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning Paper • 2506.13654 • Published Jun 16 • 44
OtterHD: A High-Resolution Multi-modality Model Paper • 2311.04219 • Published Nov 7, 2023 • 34
Octopus: Embodied Vision-Language Programmer from Environmental Feedback Paper • 2310.08588 • Published Oct 12, 2023 • 38