On Path to Multimodal Generalist: General-Level and General-Bench Paper • 2505.04620 • Published 9 days ago • 73 • 8
JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization Paper • 2503.23377 • Published Mar 30 • 55 • 4
Vitron: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing Paper • 2412.19806 • Published Oct 8, 2024 • 1 • 5
RetrieveGPT: Merging Prompts and Mathematical Models for Enhanced Code-Mixed Information Retrieval Paper • 2411.04752 • Published Nov 7, 2024 • 17 • 3