Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space • arXiv:2505.13308 • Published May 19, 2025
OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts • arXiv:2503.22952 • Published Mar 29, 2025
From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens • arXiv:2502.18890 • Published Feb 26, 2025
VSTAR: A Video-grounded Dialogue Dataset for Situated Semantic Understanding with Scene and Topic Transitions • arXiv:2305.18756 • Published May 30, 2023
Collaborative Reasoning on Multi-Modal Semantic Graphs for Video-Grounded Dialogue Generation • arXiv:2210.12460 • Published Oct 22, 2022
LSTP: Language-guided Spatial-Temporal Prompt Learning for Long-form Video-Text Understanding • arXiv:2402.16050 • Published Feb 25, 2024
Shuo Wen Jie Zi: Rethinking Dictionaries and Glyphs for Chinese Language Pre-training • arXiv:2305.18760 • Published May 30, 2023
LongViTU: Instruction Tuning for Long-Form Video Understanding • arXiv:2501.05037 • Published Jan 9, 2025
HawkEye: Training Video-Text LLMs for Grounding Text in Videos • arXiv:2403.10228 • Published Mar 15, 2024
VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges • arXiv:2409.01071 • Published Sep 2, 2024
ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning • arXiv:2408.02210 • Published Aug 5, 2024
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models • arXiv:2406.16338 • Published Jun 24, 2024
STAIR: Spatial-Temporal Reasoning with Auditable Intermediate Results for Video Question Answering • arXiv:2401.03901 • Published Jan 8, 2024