Hugging Face
Sladwell's Collections

Multimodal

Updated 3 days ago

  • Visual Representation Alignment for Multimodal Large Language Models

    Paper • 2509.07979 • Published 19 days ago • 81 upvotes

  • LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation

    Paper • 2509.05263 • Published 23 days ago • 10 upvotes

  • Symbolic Graphics Programming with Large Language Models

    Paper • 2509.05208 • Published 23 days ago • 45 upvotes

  • OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling

    Paper • 2509.12201 • Published 13 days ago • 102 upvotes

  • Multimodal Reasoning for Science: Technical Report and 1st Place Solution to the ICML 2025 SeePhys Challenge

    Paper • 2509.06079 • Published 21 days ago • 6 upvotes

  • Lost in Embeddings: Information Loss in Vision-Language Models

    Paper • 2509.11986 • Published 13 days ago • 25 upvotes

  • PersonaX: Multimodal Datasets with LLM-Inferred Behavior Traits

    Paper • 2509.11362 • Published 14 days ago • 4 upvotes

  • UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning

    Paper • 2509.11543 • Published 13 days ago • 46 upvotes

  • MARS2 2025 Challenge on Multimodal Reasoning: Datasets, Methods, Results, Discussion, and Outlook

    Paper • 2509.14142 • Published 11 days ago • 9 upvotes

  • Qwen3-Omni Technical Report

    Paper • 2509.17765 • Published 6 days ago • 115 upvotes