lioushz

Shz

·

AI & ML interests

None yet

Organizations

upvoted a paper 3 months ago

Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale

Paper • 2603.25040 • Published Mar 26 • 134

upvoted 4 papers 7 months ago

MoCapAnything: Unified 3D Motion Capture for Arbitrary Skeletons from Monocular Videos

Paper • 2512.10881 • Published Dec 11, 2025 • 32

OPV: Outcome-based Process Verifier for Efficient Long Chain-of-Thought Verification

Paper • 2512.10756 • Published Dec 11, 2025 • 35

Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving

Paper • 2512.10739 • Published Dec 11, 2025 • 47

ATLAS: A High-Difficulty, Multidisciplinary Benchmark for Frontier Scientific Reasoning

Paper • 2511.14366 • Published Nov 18, 2025 • 17

upvoted a paper 11 months ago

CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward

Paper • 2508.03686 • Published Aug 5, 2025 • 39

upvoted 2 papers 12 months ago

CompassJudger-2: Towards Generalist Judge Model via Verifiable Rewards

Paper • 2507.09104 • Published Jul 12, 2025 • 18

Rethinking Verification for LLM Code Generation: From Generation to Testing

Paper • 2507.06920 • Published Jul 9, 2025 • 29

upvoted a collection 12 months ago

CompassVerifier

CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward • 5 items • Updated Aug 31, 2025 • 7

upvoted a paper 12 months ago

Coding Triangle: How Does Large Language Model Understand Code?

Paper • 2507.06138 • Published Jul 8, 2025 • 23

upvoted a paper about 1 year ago

Deciphering Trajectory-Aided LLM Reasoning: An Optimization Perspective

Paper • 2505.19815 • Published May 26, 2025 • 36

upvoted 4 papers over 1 year ago

Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM

Paper • 2503.14478 • Published Mar 18, 2025 • 48

OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference

Paper • 2502.18411 • Published Feb 25, 2025 • 74

Are Your LLMs Capable of Stable Reasoning?

Paper • 2412.13147 • Published Dec 17, 2024 • 93

CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution

Paper • 2410.16256 • Published Oct 21, 2024 • 61

upvoted 2 papers almost 2 years ago

HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models

Paper • 2409.16191 • Published Sep 24, 2024 • 41

NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?

Paper • 2407.11963 • Published Jul 16, 2024 • 44

upvoted 2 papers about 2 years ago

MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding

Paper • 2406.14515 • Published Jun 20, 2024 • 33

Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs

Paper • 2406.14544 • Published Jun 20, 2024 • 35