
Harpreet Sahota PRO

harpreetsahota


AI & ML interests

Deep learning, language models, prompt engineering, agents, multi-agent systems

Recent Activity

updated a dataset about 11 hours ago

harpreetsahota/flickr_faces_res512_5k

published a dataset about 11 hours ago

harpreetsahota/flickr_faces_res512_5k

updated a dataset about 13 hours ago

harpreetsahota/celebamask-hq



upvoted a collection 4 days ago

ModernVBERT

Resources for ModernVBERT • 5 items • Updated 5 days ago • 10

upvoted a collection 14 days ago

Qwen3-VL

9 items • Updated 6 days ago • 161

upvoted an article 16 days ago

Article

Vision Language Model Alignment in TRL ⚡️

By … and 4 others • Aug 7 • 93

upvoted a collection 21 days ago

Granite Docling

4 items • Updated 1 day ago • 49

upvoted an article 27 days ago

Article

PP-OCRv5 on Hugging Face: A Specialized Approach to OCR

By … and 5 others • 28 days ago • 103

upvoted a collection 28 days ago

PP-OCRv5

PP-OCRv5 is the latest text recognition solution, supporting Simplified Chinese, Chinese Pinyin, Traditional Chinese, English, and Japanese • 13 items • Updated 23 days ago • 46

upvoted a collection about 2 months ago

UI-Venus

7 items • Updated 8 days ago • 21

upvoted 2 collections 2 months ago

Releases July 25

28 items • Updated Jul 30 • 3

Releases July 18

34 items • Updated Jul 23 • 4

upvoted an article 3 months ago

Article

Welcome the NVIDIA Llama Nemotron Nano VLM to Hugging Face Hub

By … and 11 others • Jun 27 • 28

upvoted a collection 4 months ago

V-JEPA 2

A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of https://ai.meta.com/blog/v-jepa-yann • 8 items • Updated Jun 13 • 164

upvoted an article 4 months ago

Article

ScreenSuite - The most comprehensive evaluation suite for GUI Agents!

Jun 6 • 54

upvoted 3 collections 4 months ago

Holo1

Vision-Language Action Model for use in Surfer-H web navigation agent • 6 items • Updated Jun 10 • 48

AGUVIS: Unified Pure Vision GUI Agents

https://aguvis-project.github.io • 3 items • Updated Dec 20, 2024 • 7

MiMo-VL

6 items • Updated Aug 21 • 37

upvoted a collection 5 months ago

MiniCPM-o & MiniCPM-V

Multimodal models with leading performance. • 28 items • Updated Sep 1 • 54

upvoted an article 5 months ago

Article

Vision Language Models (Better, Faster, Stronger)

By … and 4 others • May 12 • 538

upvoted a collection 6 months ago

April 11 Releases

22 items • Updated Apr 16 • 7

upvoted a collection 7 months ago

video-effects datasets

Smol datasets to emulate cool video effects like "squish", "dissolve", etc. Inspired by Pika effects. • 4 items • Updated Jan 28 • 4

upvoted a collection 10 months ago

AIMv2

A collection of AIMv2 vision encoders that supports a number of resolutions, native resolution, and a distilled checkpoint. • 19 items • Updated Aug 25 • 82