

yl2488



AI & ML interests

None yet


Organizations

None yet

yl2488's activity

authored 20 papers 3 days ago

DiffusionRet: Generative Text-Video Retrieval with Diffusion Model

Paper • 2303.09867 • Published Mar 17, 2023

Multi-granularity Interaction Simulation for Unsupervised Interactive Segmentation

Paper • 2303.13399 • Published Mar 23, 2023

Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning

Paper • 2303.14369 • Published Mar 25, 2023

LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

Paper • 2310.01852 • Published Oct 3, 2023 • 2

HiFi-123: Towards High-fidelity One Image to 3D Content Generation

Paper • 2310.06744 • Published Oct 10, 2023 • 2

Progressive3D: Progressively Local Editing for Text-to-3D Content Creation with Complex Semantic Prompts

Paper • 2310.11784 • Published Oct 18, 2023 • 11

Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment

Paper • 2305.12218 • Published May 20, 2023

Album Storytelling with Iterative Story-aware Captioning and Large Language Models

Paper • 2305.12943 • Published May 22, 2023

ChatFace: Chat-Guided Real Face Editing via Diffusion Latent Space Manipulation

Paper • 2305.14742 • Published May 24, 2023 • 1

Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

Paper • 2311.08046 • Published Nov 14, 2023 • 2

Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Paper • 2311.10122 • Published Nov 16, 2023 • 27

Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models

Paper • 2311.16103 • Published Nov 27, 2023 • 1

Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

Paper • 2101.11986 • Published Jan 28, 2021

Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable 2D Repainting

Paper • 2312.13271 • Published Dec 20, 2023 • 6

Machine Mindset: An MBTI Exploration of Large Language Models

Paper • 2312.12999 • Published Dec 20, 2023 • 4

VOLO: Vision Outlooker for Visual Recognition

Paper • 2106.13112 • Published Jun 24, 2021

ChatLaw: Open-Source Legal Large Language Model with Integrated External Knowledge Bases

Paper • 2306.16092 • Published Jun 28, 2023

MoE-LLaVA: Mixture of Experts for Large Vision-Language Models

Paper • 2401.15947 • Published Jan 29, 2024 • 51

PiCO: Peer Review in LLMs based on the Consistency Optimization

Paper • 2402.01830 • Published Feb 2, 2024

ProLLaMA: A Protein Large Language Model for Multi-Task Protein Language Processing

Paper • 2402.16445 • Published Feb 26, 2024