Models - a JuanRafap Collection

InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU

Paper • 2502.08910 • Published Feb 13 • 149

Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning

Paper • 2505.01441 • Published Apr 28 • 39

Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning

Paper • 2505.07263 • Published May 12 • 30

MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining

Paper • 2505.07608 • Published May 12 • 81

Unified Continuous Generative Models

Paper • 2505.07447 • Published May 12 • 44

Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models

Paper • 2505.02847 • Published May 1 • 28

DeepCritic: Deliberate Critique with Large Language Models

Paper • 2505.00662 • Published May 1 • 55

OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning

Paper • 2505.04601 • Published May 7 • 27

LLM-Independent Adaptive RAG: Let the Question Speak for Itself

Paper • 2505.04253 • Published May 7 • 13

ZeroSearch: Incentivize the Search Capability of LLMs without Searching

Paper • 2505.04588 • Published May 7 • 66

Absolute Zero: Reinforced Self-play Reasoning with Zero Data

Paper • 2505.03335 • Published May 6 • 178

Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL

Paper • 2505.02391 • Published May 5 • 25

FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models

Paper • 2505.02735 • Published May 5 • 32

Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning

Paper • 2505.03318 • Published May 6 • 94

Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts

Paper • 2504.21117 • Published Apr 29 • 26

NodeRAG: Structuring Graph-based RAG with Heterogeneous Nodes

Paper • 2504.11544 • Published Apr 15 • 42

Analyzing LLMs' Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations

Paper • 2504.13816 • Published Apr 18 • 17

System Prompt Optimization with Meta-Learning

Paper • 2505.09666 • Published May 14 • 72

Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models

Paper • 2505.10554 • Published May 15 • 120

TTRL: Test-Time Reinforcement Learning

Paper • 2504.16084 • Published Apr 22 • 118

PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models

Paper • 2504.16074 • Published Apr 22 • 36

OTC: Optimal Tool Calls via Reinforcement Learning

Paper • 2504.14870 • Published Apr 21 • 33

I-Con: A Unifying Framework for Representation Learning

Paper • 2504.16929 • Published Apr 23 • 30

WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents

Paper • 2504.15785 • Published Apr 22 • 19

OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning

Paper • 2505.08617 • Published May 13 • 42

Chain-of-Model Learning for Language Model

Paper • 2505.11820 • Published May 17 • 120

Visual Agentic Reinforcement Fine-Tuning

Paper • 2505.14246 • Published May 20 • 32

Improving Assembly Code Performance with Large Language Models via Reinforcement Learning

Paper • 2505.11480 • Published May 16 • 8

AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning

Paper • 2505.11896 • Published May 17 • 58

Simple Semi-supervised Knowledge Distillation from Vision-Language Models via Dual-Head Optimization

Paper • 2505.07675 • Published May 12 • 20

nvidia/parakeet-tdt-0.6b-v2

Automatic Speech Recognition • Updated 22 days ago • 1.32M • 1.23k

PrimeIntellect/INTELLECT-2

33B • Updated May 13 • 1.6k • 198

PrimeIntellect/INTELLECT-2-RL-Dataset

Viewer • Updated May 13 • 285k • 949 • 62

allenai/olmOCR-7B-0225-preview

Image-to-Text • 8B • Updated Feb 25 • 117k • 682

JetBrains/Mellum-4b-sft-python

Text Generation • 4B • Updated May 19 • 7.44k • 43

JetBrains/Mellum-4b-base

Text Generation • 4B • Updated May 7 • 20.9k • 401

ibm-granite/granite-4.0-tiny-base-preview

Text Generation • 7B • Updated May 6 • 1.88k • 19

ibm-granite/granite-4.0-tiny-preview

Text Generation • 7B • Updated May 6 • 44k • 140

reasonir/ReasonIR-8B

Feature Extraction • 8B • Updated May 13 • 3.35k • 51

ServiceNow-AI/Apriel-Nemotron-15b-Thinker

Text Generation • 15B • Updated May 15 • 6.31k • 89

Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning

Paper • 2505.13866 • Published May 20 • 16

facebook/KernelLLM

Text Generation • 8B • Updated 16 days ago • 1.53k • 166

ZeroGUI: Automating Online GUI Learning at Zero Human Cost

Paper • 2505.23762 • Published May 29 • 46

reducto/RolmOCR

Image-to-Text • 8B • Updated Apr 2 • 158k • 454

Large Language Models are Locally Linear Mappings

Paper • 2505.24293 • Published May 30 • 15

nvidia/Nemotron-Research-Reasoning-Qwen-1.5B

Text Generation • 2B • Updated Jun 5 • 12.4k • 176

rStar-Coder: Scaling Competitive Code Reasoning with a Large-Scale Verified Dataset

Paper • 2505.21297 • Published May 27 • 30

Qwen/Qwen3-Reranker-8B

Text Ranking • 8B • Updated Jun 9 • 19.8k • 123

lerobot/smolvla_base

Robotics • Updated 25 days ago • 13.2k • 207

DiffDecompose: Layer-Wise Decomposition of Alpha-Composited Images via Diffusion Transformers

Paper • 2505.21541 • Published May 24 • 7

UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning

Paper • 2505.23380 • Published May 29 • 23

Interleaved Reasoning for Large Language Models via Reinforcement Learning

Paper • 2505.19640 • Published May 26 • 13

nvidia/Llama-3.1-Nemotron-Nano-4B-v1.1

Text Generation • 5B • Updated Jun 4 • 5.72k • 94

Reasoning Model is Stubborn: Diagnosing Instruction Overriding in Reasoning Models

Paper • 2505.17225 • Published May 22 • 65

nvidia/AceReason-Nemotron-14B

Text Generation • 15B • Updated Jun 17 • 18.2k • 84

MMaDA: Multimodal Large Diffusion Language Models

Paper • 2505.15809 • Published May 21 • 92

Reinforcement Learning for Reasoning in Large Language Models with One Training Example

Paper • 2504.20571 • Published Apr 29 • 97

open-thoughts/OpenThinker3-7B

Text Generation • 8B • Updated Jun 9 • 24.3k • 116

VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos

Paper • 2506.05349 • Published Jun 5 • 24

🏃 Leaderboard: Physical Reasoning from Video

Submit and score model predictions for video and text tasks • 15

Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models

Paper • 2506.06395 • Published Jun 5 • 128

STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis

Paper • 2506.06276 • Published Jun 6 • 22

nvidia/OpenCodeReasoning-Nemotron-32B

Text Generation • 33B • Updated May 7 • 1.07k • 71

nvidia/OpenCodeReasoning-Nemotron-32B-IOI

Text Generation • 33B • Updated May 7 • 1.01k • 24

ComfyUI-R1: Exploring Reasoning Models for Workflow Generation

Paper • 2506.09790 • Published Jun 11 • 52

Reasoning with Exploration: An Entropy Perspective

Paper • 2506.14758 • Published Jun 17 • 28

Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs

Paper • 2506.14731 • Published Jun 17 • 9

Marrying Autoregressive Transformer and Diffusion with Multi-Reference Autoregression

Paper • 2506.09482 • Published Jun 11 • 46

stepfun-ai/Step-Audio-AQAA

137B • Updated Jun 12 • 33 • 35

ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs

Paper • 2506.10128 • Published Jun 11 • 23

MiniMaxAI/MiniMax-M1-80k

Text Generation • 456B • Updated 12 days ago • 26.5k • 662

Embodied Web Agents: Bridging Physical-Digital Realms for Integrated Agent Intelligence

Paper • 2506.15677 • Published about 1 month ago • 24

nvidia/Cosmos-Predict2-2B-Sample-Action-Conditioned

Updated Jun 17 • 356 • 2

LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs

Paper • 2506.14429 • Published Jun 17 • 44

Show-o2: Improved Native Unified Multimodal Models

Paper • 2506.15564 • Published about 1 month ago • 29

mistralai/Mistral-Small-3.2-24B-Instruct-2506

Image-Text-to-Text • 24B • Updated 11 days ago • 126k • 367

MoTE: Mixture of Ternary Experts for Memory-efficient Large Multimodal Models

Paper • 2506.14435 • Published Jun 17 • 8

THU-KEG/LongWriter-Zero-32B

Text Generation • 33B • Updated 16 days ago • 1.95k • 102

Menlo/Jan-nano-128k

Text Generation • 4B • Updated 18 days ago • 6.32k • 200

tencent/Hunyuan-A13B-Instruct-FP8

Text Generation • 80B • Updated 11 days ago • 2.3k • 30

SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation

Paper • 2506.18349 • Published 26 days ago • 12

Unified Vision-Language-Action Model

Paper • 2506.19850 • Published 24 days ago • 26

Arch-Router: Aligning LLM Routing with Human Preferences

Paper • 2506.16655 • Published 29 days ago • 10

Gryphe/Codex-24B-Small-3.2

24B • Updated 24 days ago • 1.02k • 40

SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning

Paper • 2506.19767 • Published 24 days ago • 13

LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning

Paper • 2506.18841 • Published 25 days ago • 56

Reward-Reasoning/RRM-32B

33B • Updated May 21 • 161k • 8

ReCode: Updating Code API Knowledge with Reinforcement Learning

Paper • 2506.20495 • Published 23 days ago • 8

Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models

Paper • 2506.19697 • Published 24 days ago • 44

tencent/Hunyuan-A13B-Instruct

Text Generation • 80B • Updated 11 days ago • 34k • 766

Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models

Paper • 2506.18945 • Published 26 days ago • 39

tngtech/DeepSeek-TNG-R1T2-Chimera

Text Generation • 685B • Updated 9 days ago • 5.27k • 213

openbmb/MiniCPM4-8B

Text Generation • 8B • Updated Jun 17 • 9.01k • 271

Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning

Paper • 2507.00432 • Published 18 days ago • 68

GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published 17 days ago • 188

HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context

Paper • 2506.21277 • Published 22 days ago • 15

UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence with Spatial Reasoning and Understanding

Paper • 2506.23219 • Published 20 days ago • 7

Teaching a Language Model to Speak the Language of Tools

Paper • 2506.23394 • Published 19 days ago • 4

HuggingFaceTB/SmolLM3-3B-Base

Text Generation • 3B • Updated 8 days ago • 4.88k • 108

HuggingFaceTB/SmolLM3-3B

Text Generation • 3B • Updated 1 day ago • 176k • 530

Lost in Latent Space: An Empirical Study of Latent Diffusion Models for Physics Emulation

Paper • 2507.02608 • Published 16 days ago • 21

DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation

Paper • 2506.20639 • Published 23 days ago • 27

Rethinking Verification for LLM Code Generation: From Generation to Testing

Paper • 2507.06920 • Published 9 days ago • 28

GTA1: GUI Test-time Scaling Agent

Paper • 2507.05791 • Published 11 days ago • 25

VLM2Vec-V2: Advancing Multimodal Embedding for Videos, Images, and Visual Documents

Paper • 2507.04590 • Published 12 days ago • 16

dphn/Dolphin-Mistral-24B-Venice-Edition

24B • Updated 30 days ago • 2.54k • 76

One Token to Fool LLM-as-a-Judge

Paper • 2507.08794 • Published 7 days ago • 29

Robust Multimodal Large Language Models Against Modality Conflict

Paper • 2507.07151 • Published 10 days ago • 5

Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation

Paper • 2507.10524 • Published 4 days ago • 51

BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity

Paper • 2507.08771 • Published 7 days ago • 7

Scaling Laws for Optimal Data Mixtures

Paper • 2507.09404 • Published 6 days ago • 30

A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning

Paper • 2507.08267 • Published 8 days ago • 8

KV Cache Steering for Inducing Reasoning in Small Language Models

Paper • 2507.08799 • Published 7 days ago • 37

EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and Reasoning Modes

Paper • 2507.11407 • Published 3 days ago • 40

Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs

Paper • 2507.09477 • Published 6 days ago • 59