1 214 734

Motoki Wu PRO

tokestermw

https://motoki.co

AI & ML interests

None yet

Recent Activity

upvoted a paper 1 day ago

The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

upvoted a paper 7 days ago

R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning

upvoted a paper 13 days ago

Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning

View all activity

Organizations

upvoted a paper 1 day ago

The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

Paper • 2509.02547 • Published 6 days ago • 162

upvoted a paper 7 days ago

R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning

Paper • 2508.21113 • Published 11 days ago • 105

upvoted 2 papers 13 days ago

Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning

Paper • 2508.16949 • Published 16 days ago • 22

AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs

Paper • 2508.16153 • Published 17 days ago • 133

upvoted a collection 19 days ago

NVIDIA Nemotron

Collection

Open, Production-ready Enterprise Models. Nvidia Open Model license. • 4 items • Updated 5 days ago • 56

upvoted a paper 20 days ago

SSRL: Self-Search Reinforcement Learning

Paper • 2508.10874 • Published 25 days ago • 91

upvoted a paper 24 days ago

Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models

Paper • 2508.10751 • Published 25 days ago • 27

upvoted 3 papers 26 days ago

On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification

Paper • 2508.05629 • Published Aug 7 • 170

Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments

Paper • 2508.08791 • Published 27 days ago • 16

Efficient Agents: Building Effective Agents While Reducing Cost

Paper • 2508.02694 • Published Jul 24 • 85

upvoted a paper 30 days ago

Agent Lightning: Train ANY AI Agents with Reinforcement Learning

Paper • 2508.03680 • Published Aug 5 • 67

upvoted 4 papers about 1 month ago

Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training

Paper • 2508.00414 • Published Aug 1 • 89

Beyond the Trade-off: Self-Supervised Reinforcement Learning for Reasoning Models' Instruction Following

Paper • 2508.02150 • Published Aug 4 • 36

CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning

Paper • 2507.14111 • Published Jul 18 • 23

Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26 • 148

upvoted 4 papers about 2 months ago

RefCritic: Training Long Chain-of-Thought Critic Models with Refinement Feedback

Paper • 2507.15024 • Published Jul 20 • 13

Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities

Paper • 2507.13158 • Published Jul 17 • 24

SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories?

Paper • 2507.12415 • Published Jul 16 • 41

Replacing thinking with tool usage enables reasoning in small language models

Paper • 2507.05065 • Published Jul 7 • 15

upvoted an article 2 months ago

Article

Training and Finetuning Sparse Embedding Models with Sentence Transformers v5

and 1 other •

Jul 1

• 116

Motoki Wu PRO

AI & ML interests

Recent Activity

Organizations

tokestermw's activity

Training and Finetuning Sparse Embedding Models with Sentence Transformers v5