muzammal's Collections: Papers to Read
- MLLM-as-a-Judge for Image Safety without Human Labeling (arXiv:2501.00192)
- 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining (arXiv:2501.00958)
- Xmodel-2 Technical Report (arXiv:2412.19638)
- HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs (arXiv:2412.18925)
- CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings (arXiv:2501.01257)
- MiniMax-01: Scaling Foundation Models with Lightning Attention (arXiv:2501.08313)
- Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models (arXiv:2501.09686)
- PaSa: An LLM Agent for Comprehensive Academic Paper Search (arXiv:2501.10120)
- GuardReasoner: Towards Reasoning-based LLM Safeguards (arXiv:2501.18492)
- WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training (arXiv:2501.18511)
- LIMO: Less is More for Reasoning (arXiv:2502.03387)
- Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling (arXiv:2502.06703)
- Expect the Unexpected: FailSafe Long Context QA for Finance (arXiv:2502.06329)
- TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation (arXiv:2502.07870)
- LLMs Can Easily Learn to Reason from Demonstrations: Structure, not content, is what matters! (arXiv:2502.07374)
- Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance (arXiv:2502.08127)
- BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models (arXiv:2502.07346)
- TransMLA: Multi-head Latent Attention Is All You Need (arXiv:2502.07864)
- Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model (arXiv:2502.10248)
- SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering? (arXiv:2502.12115)
- Magma: A Foundation Model for Multimodal AI Agents (arXiv:2502.13130)
- Qwen2.5-VL Technical Report (arXiv:2502.13923)
- MLGym: A New Framework and Benchmark for Advancing AI Research Agents (arXiv:2502.14499)
- SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features (arXiv:2502.14786)
- S*: Test Time Scaling for Code Generation (arXiv:2502.14382)
- SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines (arXiv:2502.14739)
- MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning (arXiv:2503.07365)
- Token-Efficient Long Video Understanding for Multimodal LLMs (arXiv:2503.04130)
- R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model (arXiv:2503.05132)
- Visual-RFT: Visual Reinforcement Fine-Tuning (arXiv:2503.01785)
- Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs (arXiv:2503.01743)
- LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL (arXiv:2503.07536)
- Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia (arXiv:2503.07920)
- Unified Reward Model for Multimodal Understanding and Generation (arXiv:2503.05236)
- Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers (arXiv:2503.11579)
- GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing (arXiv:2503.10639)
- R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization (arXiv:2503.10615)
- VisualPRM: An Effective Process Reward Model for Multimodal Reasoning (arXiv:2503.10291)
- MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research (arXiv:2503.13399)
- V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning (arXiv:2503.11495)
- VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning (arXiv:2503.13444)
- Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM (arXiv:2503.14478)
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding (arXiv:2503.12797)
- DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation (arXiv:2503.06053)
- TULIP: Towards Unified Language-Image Pretraining (arXiv:2503.15485)
- Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models (arXiv:2503.16419)