Arkajyoti Mitra

aeros93

AI & ML interests

Deep Learning, Computer Vision, Vision Language Models, Diffusion, Gaussian Splatting

Recent Activity


upvoted an article 17 days ago

cocogold: training Marigold for text-grounded segmentation

Article • Jul 8 • 30

upvoted an article about 1 month ago

Efficient MultiModal Data Pipeline

Article • and 4 others • Jul 8 • 53

upvoted an article about 2 months ago

SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data

Article • and 8 others • Jun 3 • 231

upvoted a paper 7 months ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4 • 241

upvoted 2 papers 9 months ago

CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models

Paper • 2411.18613 • Published Nov 27, 2024 • 59

LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models

Paper • 2411.09595 • Published Nov 14, 2024 • 78

upvoted 2 articles about 1 year ago

Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models

Article • and 2 others • Jun 24, 2024 • 199

PaliGemma – Google's Cutting-Edge Open Vision Language Model

Article • and 2 others • May 14, 2024 • 265

upvoted an article over 1 year ago

Vision Language Models Explained

Article • and 1 other • Apr 11, 2024 • 437

upvoted a collection over 1 year ago

Vision Language Models Papers 🖼️💬📝

Collection • Papers about vision-language models, with the most important ones at the top of the list. • 27 items • Updated Apr 30, 2024 • 38

upvoted an article over 1 year ago

seemore: Implement a Vision Language Model from Scratch

Article • Jun 23, 2024 • 94

upvoted a paper over 1 year ago

LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes

Paper • 2311.13384 • Published Nov 22, 2023 • 53
