On Path to Multimodal Generalist: General-Level and General-Bench Paper • 2505.04620 • Published May 7 • 82
VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models Paper • 2504.13122 • Published Apr 17 • 21
Effi-Code: Unleashing Code Efficiency in Language Models Paper • 2410.10209 • Published Oct 14, 2024 • 2
Towards Multimodal Empathetic Response Generation: A Rich Text-Speech-Vision Avatar-based Benchmark Paper • 2502.04976 • Published Feb 7
NUS-Emo at SemEval-2024 Task 3: Instruction-Tuning LLM for Multimodal Emotion-Cause Analysis in Conversations Paper • 2501.17261 • Published Aug 22, 2024
PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis Paper • 2408.09481 • Published Aug 18, 2024 • 1