Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2503.13444

a collection of algorithmic agents for user interfaces/interactions, program synthesis, and robotics

about 7 hours ago

End-to-End Goal-Driven Web Navigation

Paper • 1602.02261 • Published Feb 6, 2016
Learning Language Games through Interaction

Paper • 1606.02447 • Published Jun 8, 2016
Naturalizing a Programming Language via Interactive Learning

Paper • 1704.06956 • Published Apr 23, 2017
Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration

Paper • 1802.08802 • Published Feb 24, 2018

Multimodal Reasoning

about 12 hours ago

InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning

Paper • 2502.11573 • Published Feb 17 • 8
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking

Paper • 2502.02339 • Published Feb 4 • 22
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model

Paper • 2502.11775 • Published Feb 17 • 8
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published Dec 24, 2024 • 39

VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning

about 19 hours ago

yeliudev/VideoMind-2B

Updated 3 days ago • 3
yeliudev/VideoMind-7B

Updated 2 days ago • 2
yeliudev/VideoMind-2B-FT-QVHighlights

Updated about 19 hours ago
yeliudev/VideoMind-Dataset

Preview • Updated about 19 hours ago • 111

MLLM-as-a-Judge for Image Safety without Human Labeling

Paper • 2501.00192 • Published Dec 31, 2024 • 28
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published Jan 1 • 101
Xmodel-2 Technical Report

Paper • 2412.19638 • Published Dec 27, 2024 • 26
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

Paper • 2412.18925 • Published Dec 25, 2024 • 98

WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens

Paper • 2401.09985 • Published Jan 18, 2024 • 17
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects

Paper • 2401.09962 • Published Jan 18, 2024 • 9
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution

Paper • 2401.10404 • Published Jan 18, 2024 • 10
ActAnywhere: Subject-Aware Video Background Generation

Paper • 2401.10822 • Published Jan 19, 2024 • 13

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs