Discussions - a rivasmig Collection

updated 27 days ago

Scaling Laws for Native Multimodal Models

Paper • 2504.07951 • Published Apr 10 • 29
Have we unified image generation and understanding yet? An empirical study of GPT-4o's image generation ability

Paper • 2504.08003 • Published Apr 9 • 49
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models

Paper • 2504.11468 • Published Apr 10 • 29
Towards Learning to Complete Anything in Lidar

Paper • 2504.12264 • Published Apr 16 • 10
Antidistillation Sampling

Paper • 2504.13146 • Published Apr 17 • 61
A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis

Paper • 2504.12322 • Published Apr 11 • 28
Rethinking Reflection in Pre-Training

Paper • 2504.04022 • Published Apr 5 • 80
Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models

Paper • 2504.04823 • Published Apr 7 • 31
Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1)

Paper • 2504.03151 • Published Apr 4 • 14
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

Paper • 2504.01990 • Published Mar 31 • 299
What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models

Paper • 2503.24235 • Published Mar 31 • 55
Entropy-Based Adaptive Weighting for Self-Training

Paper • 2503.23913 • Published Mar 31 • 4
Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency

Paper • 2504.18589 • Published Apr 24 • 13
Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps

Paper • 2505.18675 • Published May 24 • 23
MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research

Paper • 2505.19955 • Published May 26 • 12
The Coverage Principle: A Framework for Understanding Compositional Generalization

Paper • 2505.20278 • Published May 26 • 7
Vibe Coding vs. Agentic Coding: Fundamentals and Practical Implications of Agentic AI

Paper • 2505.19443 • Published May 26 • 15
Scaling Test-time Compute for LLM Agents

Paper • 2506.12928 • Published Jun 15 • 61
OAgents: An Empirical Study of Building Effective Agents

Paper • 2506.15741 • Published Jun 17 • 36
PAROAttention: Pattern-Aware ReOrdering for Efficient Sparse and Quantized Attention in Visual Generation Models

Paper • 2506.16054 • Published Jun 19 • 60
VIKI-R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning

Paper • 2506.09049 • Published Jun 10 • 36
ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation

Paper • 2506.18095 • Published Jun 22 • 65
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published 28 days ago • 202
Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact

Paper • 2507.00951 • Published 28 days ago • 22