
Bhimraj Yadav

bhimrazy

https://bhimraj.com.np

AI & ML interests

Computer Vision, Healthcare, Generative AI and NLP

Recent Activity

upvoted a paper 8 days ago

FastVLM: Efficient Vision Encoding for Vision Language Models

upvoted a paper 8 days ago

MobileCLIP2: Improving Multi-Modal Reinforced Training

upvoted a paper 8 days ago

VibeVoice Technical Report


Organizations

upvoted 4 papers 8 days ago

MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers

Paper • 2508.20453 • Published 13 days ago • 58

upvoted an article 4 months ago

Vision Language Models (Better, Faster, Stronger)

Article • and 4 others • May 12 • 523

upvoted a paper 5 months ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7 • 199

upvoted 2 papers 6 months ago

Qwen2.5-Omni Technical Report

Paper • 2503.20215 • Published Mar 26 • 165

Transformers without Normalization

Paper • 2503.10622 • Published Mar 13 • 170

upvoted 2 articles 6 months ago

A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality

Article • and 3 others • Mar 4 • 75

The Beginners Guide to Cleaning a Dataset

Article • Nov 18, 2024 • 24

upvoted 10 papers 7 months ago

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published Feb 19 • 203

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Paper • 2502.14786 • Published Feb 20 • 147

SearchRAG: Can Search Engines Be Helpful for LLM-based Medical Question Answering?

Paper • 2502.13233 • Published Feb 18 • 15

Craw4LLM: Efficient Web Crawling for LLM Pretraining

Paper • 2502.13347 • Published Feb 19 • 29

Magma: A Foundation Model for Multimodal AI Agents

Paper • 2502.13130 • Published Feb 18 • 58

Baichuan-M1: Pushing the Medical Capability of Large Language Models

Paper • 2502.12671 • Published Feb 18 • 1

Scaling Test-Time Compute Without Verification or RL is Suboptimal

Paper • 2502.12118 • Published Feb 17 • 1

Soundwave: Less is More for Speech-Text Alignment in LLMs

Paper • 2502.12900 • Published Feb 18 • 86

Is Noise Conditioning Necessary for Denoising Generative Models?

Paper • 2502.13129 • Published Feb 18 • 1

ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario

Paper • 2501.10132 • Published Jan 17 • 22
