suryakiran786 (Kunal Suri)

upvoted a collection 6 days ago

Reward Bench 2

Collection

Datasets, spaces, and models for Reward Bench 2 benchmark and paper! • 11 items • Updated 12 days ago • 11

upvoted a paper 18 days ago

BizFinBench: A Business-Driven Real-World Financial Benchmark for Evaluating LLMs

Paper • 2505.19457 • Published 20 days ago • 61

upvoted an article 29 days ago

TinyAgents: A Minimal Experiment with Code Agents and MCP Tools

Article • 30 days ago • 29

reacted to Kseniase's post with 👀 3 months ago

Post • 2065

9 Multimodal Chain-of-Thought methods

How can Chain-of-Thought (CoT) prompting unlock a model's full potential across images, video, audio and more? The answer lies in specialized multimodal CoT techniques.

Here are 9 Multimodal Chain-of-Thought (MCoT) methods, most of them open-source:

1. KAM-CoT -> KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning (2401.12863)
This lightweight framework combines CoT prompting with knowledge graphs (KGs) and achieves 93.87% accuracy

2. Multimodal Visualization-of-Thought (MVoT) -> Imagine while Reasoning in Space: Multimodal Visualization-of-Thought (2501.07542)
Lets models generate visual reasoning traces, using a token discrepancy loss to improve visual quality

3. Compositional CoT (CCoT) -> Compositional Chain-of-Thought Prompting for Large Multimodal Models (2311.17076)
Uses scene graph (SG) representations generated by the LMM itself to improve performance on compositional and general multimodal benchmarks

4. URSA -> URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics (2501.04686)
Brings System 2-style thinking to multimodal math reasoning, using a 3-module CoT data synthesis process with CoT distillation, trajectory-format rewriting and format unification

5. MM-Verify -> MM-Verify: Enhancing Multimodal Reasoning with Chain-of-Thought Verification (2502.13383)
Introduces MM-Verifier and MM-Reasoner, which use synthesized high-quality CoT data to strengthen verification and multimodal reasoning

6. Duty-Distinct CoT (DDCoT) -> DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models (2310.16436)
Divides the reasoning responsibilities between LMs and visual models, integrating the visual recognition capabilities into the joint reasoning process

7. Multimodal-CoT from Amazon Web Services -> Multimodal Chain-of-Thought Reasoning in Language Models (2302.00923)
A two-stage framework separates rationale generation from answer prediction, allowing the model to reason more effectively using multimodal inputs

8. Graph-of-Thought (GoT) -> Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Large Language Models (2305.16582)
This two-stage framework models reasoning as a graph of interconnected ideas, improving performance on text-only and multimodal tasks

More in the comments👇

upvoted 4 papers 3 months ago

PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning Trajectories for Complex Problem Solving

Paper • 2502.16111 • Published Feb 22 • 9

liked a Space 4 months ago

MMLU-Pro Leaderboard 🥇

More advanced and challenging multi-task evaluation

Space • 208

upvoted a paper 4 months ago

Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning

Paper • 2502.14768 • Published Feb 20 • 48

liked a Space 4 months ago

Scaling test-time compute 📈

Enhance math problem solving by scaling test-time compute

Space • 568

upvoted an article 4 months ago

NPHardEval Leaderboard: Unveiling the Reasoning Abilities of Large Language Models through Complexity Classes and Dynamic Updates

Article • By … and 3 others • Feb 2, 2024 • 4

liked a dataset 4 months ago

galileo-ai/agent-leaderboard

Dataset • Updated Feb 11 • 1.28k • 212 • 27

upvoted an article 4 months ago

Agent Leaderboard: Evaluating AI Agents in Multi-Domain Scenarios

Article • By … and 1 other • Feb 12 • 22

upvoted 4 papers 4 months ago

Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling

Paper • 2502.06703 • Published Feb 10 • 153

Efficient Tool Use with Chain-of-Abstraction Reasoning

Paper • 2401.17464 • Published Jan 30, 2024 • 21

Training Language Model Agents without Modifying Language Models

Paper • 2402.11359 • Published Feb 17, 2024 • 2

Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments

Paper • 2402.14672 • Published Feb 22, 2024 • 1

reacted to Kseniase's post with 🔥 4 months ago

Post • 7929

8 New Types of RAG

RAG techniques keep evolving to improve the accuracy of LLM responses by retrieving relevant external data during generation. To keep up with current AI trends, new RAG types incorporate deep step-by-step reasoning, tree search, citations, multimodality and other effective techniques.

Here are 8 of the latest RAG advancements:

1. DeepRAG -> DeepRAG: Thinking to Retrieval Step by Step for Large Language Models (2502.01142)
Models retrieval-augmented reasoning as a Markov Decision Process, enabling strategic retrieval. It dynamically decides when to retrieve external knowledge and when to rely on parametric reasoning.

2. RealRAG -> RealRAG: Retrieval-augmented Realistic Image Generation via Self-reflective Contrastive Learning (2502.00848)
Enhances novel object generation by retrieving real-world images and using self-reflective contrastive learning to fill knowledge gaps, improve realism and reduce distortions.

3. Chain-of-Retrieval Augmented Generation (CoRAG) -> Chain-of-Retrieval Augmented Generation (2501.14342)
Retrieves and refines information step by step, reformulating queries when needed and deciding how much compute to spend at test time.

4. VideoRAG -> VideoRAG: Retrieval-Augmented Generation over Video Corpus (2501.05874)
Enables unlimited-length video processing using a dual-channel architecture that integrates graph-based textual grounding and multi-modal context encoding.

5. CFT-RAG -> CFT-RAG: An Entity Tree Based Retrieval Augmented Generation Algorithm With Cuckoo Filter (2501.15098)
A Tree-RAG acceleration method that uses an improved Cuckoo Filter to optimize entity localization, enabling faster retrieval.

6. Contextualized Graph RAG (CG-RAG) -> CG-RAG: Research Question Answering by Citation Graph Retrieval-Augmented LLMs (2501.15067)
Uses Lexical-Semantic Graph Retrieval (LeSeGR) to integrate sparse and dense signals within a graph structure and capture citation relationships

7. GFM-RAG -> GFM-RAG: Graph Foundation Model for Retrieval Augmented Generation (2502.01113)
A graph foundation model that uses a graph neural network to refine query-knowledge connections

8. URAG -> URAG: Implementing a Unified Hybrid RAG for Precise Answers in University Admission Chatbots -- A Case Study at HCMUT (2501.16276)
A hybrid system combining rule-based and RAG methods to improve lightweight LLMs for educational chatbots
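The CFT-RAG entry above builds on a Cuckoo Filter for fast entity lookup. For readers unfamiliar with that data structure, here is a minimal sketch of a standard cuckoo filter as described by Fan et al. (2014). It is not CFT-RAG's improved variant, and the parameters (1024 buckets, 4 slots per bucket, 16-bit fingerprints, SHA-256 hashing) and entity names are illustrative assumptions.

```python
import hashlib
import random


class CuckooFilter:
    """Textbook cuckoo filter (not CFT-RAG's variant): stores short fingerprints in one of two buckets."""

    def __init__(self, num_buckets=1024, bucket_size=4, fp_bits=16, max_kicks=500, seed=0):
        # A power-of-two bucket count keeps the XOR-based alternate index reversible.
        if num_buckets <= 0 or num_buckets & (num_buckets - 1):
            raise ValueError("num_buckets must be a power of two")
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.fp_mask = (1 << fp_bits) - 1
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.rng = random.Random(seed)

    @staticmethod
    def _hash(data: bytes) -> int:
        # 64-bit integer taken from SHA-256 (assumed hash; deterministic across runs).
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _fingerprint(self, item: str) -> int:
        return self._hash(b"fp:" + item.encode()) & self.fp_mask

    def _alt_index(self, index: int, fp: int) -> int:
        # i2 = i1 XOR hash(fp); applying it twice returns the original bucket.
        return (index ^ self._hash(fp.to_bytes(8, "big"))) % self.num_buckets

    def _candidates(self, item: str):
        fp = self._fingerprint(item)
        i1 = self._hash(item.encode()) % self.num_buckets
        return fp, i1, self._alt_index(i1, fp)

    def insert(self, item: str) -> bool:
        fp, i1, i2 = self._candidates(item)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Both buckets full: evict a random fingerprint and move it to its alternate bucket.
        i = self.rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            j = self.rng.randrange(len(self.buckets[i]))
            fp, self.buckets[i][j] = self.buckets[i][j], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Filter considered full; the last evicted fingerprint is dropped here
        # (a production version would resize or keep a stash instead).
        return False

    def contains(self, item: str) -> bool:
        # No false negatives for stored items; false positives are possible.
        fp, i1, i2 = self._candidates(item)
        return fp in self.buckets[i1] or fp in self.buckets[i2]

    def delete(self, item: str) -> bool:
        fp, i1, i2 = self._candidates(item)
        for i in (i1, i2):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False


# Usage with hypothetical entity names.
cf = CuckooFilter()
for entity in ["Paris", "Eiffel Tower", "France"]:
    cf.insert(entity)
print(cf.contains("Paris"))  # True
print(cf.delete("Paris"))    # True
```

Each lookup touches at most two buckets, so membership checks take constant time; the trade-off is a small false-positive rate that shrinks as fingerprints get longer. Unlike a Bloom filter, a cuckoo filter also supports deletion.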

liked a dataset 4 months ago

m-ric/agents_small_benchmark

Dataset • Updated Jan 19, 2024 • 100 • 147 • 11

suryakiran786's activity

SPIN-Bench: How Well Do LLMs Plan Strategically and Reason Socially?

API Agents vs. GUI Agents: Divergence and Convergence

TxAgent: An AI Agent for Therapeutic Reasoning Across a Universe of Tools
