Kanzhi Cheng

cckevinn

AI & ML interests

None yet

Recent Activity

upvoted a paper 2 days ago

OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions

Paper • 2602.05843 • Published 6 days ago • 54

upvoted a paper 6 days ago

TIDE: Trajectory-based Diagnostic Evaluation of Test-Time Improvement in LLM Agents

Paper • 2602.02196 • Published 9 days ago • 32

upvoted a paper 29 days ago

OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent

Paper • 2601.07779 • Published 30 days ago • 28

upvoted a paper about 1 month ago

SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning

Paper • 2512.24330 • Published Dec 30, 2025 • 35

upvoted a paper 2 months ago

PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling

Paper • 2512.04784 • Published Dec 2, 2025 • 25

upvoted 2 papers 3 months ago

OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows

Paper • 2510.24411 • Published Oct 28, 2025 • 72

JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence

Paper • 2510.23538 • Published Oct 27, 2025 • 97

upvoted 2 papers 4 months ago

VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications

Paper • 2509.26490 • Published Sep 30, 2025 • 20

The Era of Real-World Human Interaction: RL from User Conversations

Paper • 2509.25137 • Published Sep 29, 2025 • 19

upvoted a paper 5 months ago

ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data

Paper • 2509.15221 • Published Sep 18, 2025 • 111

upvoted a collection 5 months ago

ScaleCUA

Collection

7 items • Updated Nov 12, 2025 • 17

upvoted 2 papers 6 months ago

DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization

Paper • 2508.14460 • Published Aug 20, 2025 • 85

CodeEvo: Interaction-Driven Synthesis of Code-centric Data through Hybrid and Iterative Feedback

Paper • 2507.22080 • Published Jul 25, 2025 • 9

upvoted a paper 7 months ago

MUR: Momentum Uncertainty guided Reasoning for Large Language Models

Paper • 2507.14958 • Published Jul 20, 2025 • 47

upvoted 3 papers 8 months ago

From Ideal to Real: Unified and Data-Efficient Dense Prediction for Real-World Scenarios

Paper • 2506.20279 • Published Jun 25, 2025 • 20

A Controllable Examination for Long-Context Language Models

Paper • 2506.02921 • Published Jun 3, 2025 • 33

GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents

Paper • 2506.03143 • Published Jun 3, 2025 • 53

upvoted a paper 9 months ago

ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows

Paper • 2505.19897 • Published May 26, 2025 • 104

upvoted 2 papers 10 months ago

Could Thinking Multilingually Empower LLM Reasoning?

Paper • 2504.11833 • Published Apr 16, 2025 • 29

Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning

Paper • 2504.08672 • Published Apr 11, 2025 • 55
