Hugging Face
leondawn666's Collections

Multimodality

updated Aug 20

  • Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers

    Paper • 2506.23918 • Published Jun 30 • 88

  • LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale

    Paper • 2504.16030 • Published Apr 22 • 37

  • Time Blindness: Why Video-Language Models Can't See What Humans Can?

    Paper • 2505.24867 • Published May 30 • 80

  • GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

    Paper • 2507.01006 • Published Jul 1 • 236

  • Scaling RL to Long Videos

    Paper • 2507.07966 • Published Jul 10 • 157

  • Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory

    Paper • 2508.09736 • Published Aug 13 • 54

  • We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning

    Paper • 2508.10433 • Published Aug 14 • 143

  • Thyme: Think Beyond Images

    Paper • 2508.11630 • Published Aug 15 • 80

  • DINOv3

    Paper • 2508.10104 • Published Aug 13 • 264

  • Ovis2.5 Technical Report

    Paper • 2508.11737 • Published Aug 15 • 109