Manan Shah

cs-mshah

https://cs-mshah.github.io/

AI & ML interests

Computer Vision

Recent Activity

upvoted a paper 6 days ago

Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents

upvoted a paper 6 days ago

StableAvatar: Infinite-Length Audio-Driven Avatar Video Generation

liked a model 17 days ago

Kwai-Keye/Keye-VL-1_5-8B

cs-mshah's activity

upvoted 2 papers 6 days ago

Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents

Paper • 2507.04009 • Published Jul 5 • 45

StableAvatar: Infinite-Length Audio-Driven Avatar Video Generation

Paper • 2508.08248 • Published Aug 11 • 27

upvoted an article 3 months ago

Article

Efficient MultiModal Data Pipeline

and 4 others • Jul 8 • 55

upvoted an article 4 months ago

Article

GRPO for GUI Grounding Done Right

Jun 11 • 32

upvoted a paper 4 months ago

LightLab: Controlling Light Sources in Images with Diffusion Models

Paper • 2505.09608 • Published May 14 • 35

upvoted a collection 4 months ago

ViRFT Datasets

Collection

ViRFT Datasets • 8 items • Updated Feb 24 • 9

upvoted 4 papers 5 months ago

VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning

Paper • 2504.06958 • Published Apr 9 • 11

upvoted a paper 6 months ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7 • 198

upvoted an article 6 months ago

Article

Open-Source Handwritten Signature Detection Model

Mar 14 • 119

upvoted a paper 6 months ago

Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models

Paper • 2503.06749 • Published Mar 9 • 31

upvoted an article 7 months ago

Article

SmolVLM2: Bringing Video Understanding to Every Device

and 6 others • Feb 20 • 302

upvoted 3 papers about 1 year ago

Reflecting Reality: Enabling Diffusion Models to Produce Faithful Mirror Reflections

Paper • 2409.14677 • Published Sep 23, 2024 • 16

Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22, 2024 • 133

BRAT: Bonus oRthogonAl Token for Architecture Agnostic Textual Inversion

Paper • 2408.04785 • Published Aug 8, 2024 • 9

upvoted a collection about 1 year ago

Perturbed Attention Guidance pipelines

Collection

Pipelines for Perturbed Attention Guidance with 🧨 library • 8 items • Updated Jun 26, 2024 • 6

upvoted 2 papers over 1 year ago

Scalable 3D Captioning with Pretrained Models

Paper • 2306.07279 • Published Jun 12, 2023 • 15

Reproducibility Study of CDUL: CLIP-Driven Unsupervised Learning for Multi-Label Image Classification

Paper • 2405.11574 • Published May 19, 2024 • 1
