DiffusionRet: Generative Text-Video Retrieval with Diffusion Model Paper • 2303.09867 • Published Mar 17, 2023
Multi-granularity Interaction Simulation for Unsupervised Interactive Segmentation Paper • 2303.13399 • Published Mar 23, 2023
Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning Paper • 2303.14369 • Published Mar 25, 2023
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment Paper • 2310.01852 • Published Oct 3, 2023
HiFi-123: Towards High-fidelity One Image to 3D Content Generation Paper • 2310.06744 • Published Oct 10, 2023
Progressive3D: Progressively Local Editing for Text-to-3D Content Creation with Complex Semantic Prompts Paper • 2310.11784 • Published Oct 18, 2023
Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment Paper • 2305.12218 • Published May 20, 2023
Album Storytelling with Iterative Story-aware Captioning and Large Language Models Paper • 2305.12943 • Published May 22, 2023
ChatFace: Chat-Guided Real Face Editing via Diffusion Latent Space Manipulation Paper • 2305.14742 • Published May 24, 2023
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding Paper • 2311.08046 • Published Nov 14, 2023
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection Paper • 2311.10122 • Published Nov 16, 2023
Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models Paper • 2311.16103 • Published Nov 27, 2023
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet Paper • 2101.11986 • Published Jan 28, 2021
Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable 2D Repainting Paper • 2312.13271 • Published Dec 20, 2023
Machine Mindset: An MBTI Exploration of Large Language Models Paper • 2312.12999 • Published Dec 20, 2023
ChatLaw: Open-Source Legal Large Language Model with Integrated External Knowledge Bases Paper • 2306.16092 • Published Jun 28, 2023
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models Paper • 2401.15947 • Published Jan 29, 2024
PiCO: Peer Review in LLMs based on the Consistency Optimization Paper • 2402.01830 • Published Feb 2, 2024
ProLLaMA: A Protein Large Language Model for Multi-Task Protein Language Processing Paper • 2402.16445 • Published Feb 26, 2024