Learning Video LLM with Streaming Speech Transcription at Scale (CVPR 2025)
Joya Chen
chenjoya
AI & ML interests
Video LLM
Recent Activity
upvoted a paper 3 days ago
Beyond Language Modeling: An Exploration of Multimodal Pretraining
upvoted a paper 3 days ago
Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders
upvoted a paper 30 days ago
Olaf-World: Orienting Latent Actions for Video World Modeling