Dongfu Jiang

DongfuJiang

https://jdf-prog.github.io/

AI & ML interests

Large Language Model, Modality Reasoning and their evaluation

Recent Activity

liked a dataset about 7 hours ago

agentica-org/DeepScaleR-Preview-Dataset

upvoted a paper 3 days ago

Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions?

commented on a paper 3 days ago

VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use

upvoted a paper 3 days ago

Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions?

Paper • 2509.04292 • Published 4 days ago • 48

upvoted 4 papers 5 days ago

UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning

Paper • 2509.02544 • Published 5 days ago • 106

SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning

Paper • 2509.02479 • Published 6 days ago • 77

OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning

Paper • 2509.01644 • Published 6 days ago • 27

VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use

Paper • 2509.01055 • Published 7 days ago • 61

upvoted a paper 13 days ago

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Paper • 2508.18265 • Published 13 days ago • 180

upvoted 3 papers 18 days ago

MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents

Paper • 2508.13186 • Published 25 days ago • 17

Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization

Paper • 2508.07629 • Published 28 days ago • 41

Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models

Paper • 2508.10751 • Published 25 days ago • 27

upvoted 2 papers about 1 month ago

R-Zero: Self-Evolving Reasoning LLM from Zero Data

Paper • 2508.05004 • Published Aug 7 • 123

Group Sequence Policy Optimization

Paper • 2507.18071 • Published Jul 24 • 293

upvoted 2 papers about 2 months ago

NeuralOS: Towards Simulating Operating Systems via Neural Generative Models

Paper • 2507.08800 • Published Jul 11 • 79

A Systematic Analysis of Hybrid Linear Attention

Paper • 2507.06457 • Published Jul 8 • 23

upvoted 7 papers 3 months ago

VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation

Paper • 2506.03930 • Published Jun 4 • 26

Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback

Paper • 2506.03106 • Published Jun 3 • 6

Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning

Paper • 2506.04207 • Published Jun 4 • 46

MiMo-VL Technical Report

Paper • 2506.03569 • Published Jun 4 • 79

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Paper • 2505.24864 • Published May 30 • 136

StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs

Paper • 2505.20139 • Published May 26 • 18

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

Paper • 2505.09568 • Published May 14 • 97