Gaurang Bharti PRO

gbharti

https://gaurangbharti.netlify.app/

AI & ML interests

GPTs, Computer Vision, NLP

Recent Activity

liked a dataset 9 days ago

Vchitect/ShotBench

liked a model 9 days ago

Vchitect/ShotVL-7B

upvoted a paper 10 days ago

VideoPrism: A Foundational Visual Encoder for Video Understanding

upvoted 2 papers 10 days ago

VideoPrism: A Foundational Visual Encoder for Video Understanding

Paper • 2402.13217 • Published Feb 20, 2024 • 32

Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations

Paper • 2506.18898 • Published 17 days ago • 30

upvoted 4 papers 2 months ago

Towards Understanding Camera Motions in Any Video

Paper • 2504.15376 • Published Apr 21 • 157

NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks

Paper • 2504.19854 • Published Apr 28 • 7

TesserAct: Learning 4D Embodied World Models

Paper • 2504.20995 • Published Apr 29 • 20

The Leaderboard Illusion

Paper • 2504.20879 • Published Apr 29 • 70

upvoted 2 papers 11 months ago

LongVILA: Scaling Long-Context Visual Language Models for Long Videos

Paper • 2408.10188 • Published Aug 19, 2024 • 53

FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance

Paper • 2408.08189 • Published Aug 15, 2024 • 17

upvoted a paper 12 months ago

MusiConGen: Rhythm and Chord Control for Transformer-Based Text-to-Music Generation

Paper • 2407.15060 • Published Jul 21, 2024 • 9

upvoted a collection about 1 year ago

VILA: On Pre-training for Visual Language Models

10 items • Updated Apr 17 • 54

upvoted 7 papers over 1 year ago

YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information

Paper • 2402.13616 • Published Feb 21, 2024 • 48

LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens

Paper • 2402.13753 • Published Feb 21, 2024 • 117

MusicRL: Aligning Music Generation to Human Preferences

Paper • 2402.04229 • Published Feb 6, 2024 • 17

Mixtral of Experts

Paper • 2401.04088 • Published Jan 8, 2024 • 160

EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision

Paper • 2311.02077 • Published Nov 3, 2023 • 16

Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects

Paper • 2211.02247 • Published Nov 4, 2022 • 4

Prometheus: Inducing Fine-grained Evaluation Capability in Language Models

Paper • 2310.08491 • Published Oct 12, 2023 • 55

upvoted 3 papers almost 2 years ago

How FaR Are Large Language Models From Agents with Theory-of-Mind?

Paper • 2310.03051 • Published Oct 4, 2023 • 35

Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation

Paper • 2309.16429 • Published Sep 28, 2023 • 11

Low-rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition

Paper • 2309.15223 • Published Sep 26, 2023 • 21