PulYong's Collections

Unified MLLM

updated Dec 20, 2024

Unified models that generate text, images, and video

  • TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation

    Paper • 2412.03069 • Published Dec 4, 2024 • 35

  • Are Emergent Abilities of Large Language Models a Mirage?

    Paper • 2304.15004 • Published Apr 28, 2023 • 6

  • Scaling Image Tokenizers with Grouped Spherical Quantization

    Paper • 2412.02632 • Published Dec 3, 2024 • 10

  • Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation

    Paper • 2410.13848 • Published Oct 17, 2024 • 35

  • VisionZip: Longer is Better but Not Necessary in Vision Language Models

    Paper • 2412.04467 • Published Dec 5, 2024 • 111

  • Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation

    Paper • 2412.04432 • Published Dec 5, 2024 • 16

  • Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

    Paper • 2412.14171 • Published Dec 18, 2024 • 24

  • Autoregressive Video Generation without Vector Quantization

    Paper • 2412.14169 • Published Dec 18, 2024 • 14