UniTok: A Unified Tokenizer for Visual Generation and Understanding • Paper • 2502.20321 • Published 13 days ago • 29
LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning • Paper • 2503.04812 • Published 9 days ago • 12