Shehan Munasinghe

shehan97

https://shehanmunasinghe.github.io/

AI & ML interests

Computer Vision, Multi-modal learning

commented 2 papers over 1 year ago

VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos

Paper • 2411.04923 • Published Nov 7, 2024 • 23

New activity in MBZUAI/swiftformer-xs over 2 years ago

Adding `safetensors` variant of this model

#1 opened almost 3 years ago by