BIOCLIP: A Vision Foundation Model for the Tree of Life Paper • 2311.18803 • Published Nov 30, 2023
LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models Paper • 2212.04088 • Published Dec 8, 2022
RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics Paper • 2411.16537 • Published Nov 25, 2024
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents Paper • 2408.06327 • Published Aug 12, 2024