Shusheng Yang PRO

ShushengYang

https://shushengyang.com

AI & ML interests

computer vision, vision language model

Recent Activity

updated a dataset 30 days ago

nyu-visionx/VSI-590K-MetaInfo

published a dataset 30 days ago

nyu-visionx/VSI-590K-MetaInfo

upvoted a paper about 2 months ago

Beyond Language Modeling: An Exploration of Multimodal Pretraining

authored a paper 6 months ago

Cambrian-S: Towards Spatial Supersensing in Video

Paper • 2511.04670 • Published Nov 6, 2025 • 39

authored 5 papers over 1 year ago

Qwen Technical Report

Paper • 2309.16609 • Published Sep 28, 2023 • 38

ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers

Paper • 2305.15272 • Published May 24, 2023

TouchStone: Evaluating Vision-Language Models by Language Models

Paper • 2308.16890 • Published Aug 31, 2023 • 1

Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection

Paper • 2204.02964 • Published Apr 6, 2022

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

Paper • 2412.14171 • Published Dec 18, 2024 • 25

authored 2 papers almost 2 years ago

Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities

Paper • 2308.12966 • Published Aug 24, 2023 • 12

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

Paper • 2406.16860 • Published Jun 24, 2024 • 63