Kunal Suri

suryakiran786

suri-kunal

AI & ML interests

None yet

suryakiran786's activity

upvoted a collection 6 days ago

Reward Bench 2

Collection

Datasets, spaces, and models for Reward Bench 2 benchmark and paper! • 11 items • Updated 12 days ago • 11

upvoted a paper 18 days ago

BizFinBench: A Business-Driven Real-World Financial Benchmark for Evaluating LLMs

Paper • 2505.19457 • Published 20 days ago • 61

upvoted an article 29 days ago

TinyAgents: A Minimal Experiment with Code Agents and MCP Tools

Article • 30 days ago • 29

upvoted 4 papers 3 months ago

PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning Trajectories for Complex Problem Solving

Paper • 2502.16111 • Published Feb 22 • 9

upvoted a paper 4 months ago

Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning

Paper • 2502.14768 • Published Feb 20 • 48

upvoted 2 articles 4 months ago

NPHardEval Leaderboard: Unveiling the Reasoning Abilities of Large Language Models through Complexity Classes and Dynamic Updates

Article • and 3 others • Feb 2, 2024 • 4

Agent Leaderboard: Evaluating AI Agents in Multi-Domain Scenarios

Article • and 1 other • Feb 12 • 22

upvoted 4 papers 4 months ago

Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling

Paper • 2502.06703 • Published Feb 10 • 153

Efficient Tool Use with Chain-of-Abstraction Reasoning

Paper • 2401.17464 • Published Jan 30, 2024 • 21

Training Language Model Agents without Modifying Language Models

Paper • 2402.11359 • Published Feb 17, 2024 • 2

Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments

Paper • 2402.14672 • Published Feb 22, 2024 • 1

upvoted an article 4 months ago

π0 and π0-FAST: Vision-Language-Action Models for General Robot Control

Article • and 3 others • Feb 4 • 162

upvoted 4 papers 5 months ago

The Lessons of Developing Process Reward Models in Mathematical Reasoning

Paper • 2501.07301 • Published Jan 13 • 99

LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs

Paper • 2501.06186 • Published Jan 10 • 66

DynaSaur: Large Language Agents Beyond Predefined Actions

Paper • 2411.01747 • Published Nov 4, 2024 • 36

Executable Code Actions Elicit Better LLM Agents

Paper • 2402.01030 • Published Feb 1, 2024 • 150