Learning Video LLM with Streaming Speech Transcription at Scale (CVPR 2025)
Joya Chen
chenjoya
AI & ML interests: Video LLM
Recent Activity
liked a model 1 day ago: google/gemma-3n-E4B-it
upvoted a paper 2 days ago: HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context
upvoted a paper 12 days ago: Show-o2: Improved Native Unified Multimodal Models