BenchHub: A Unified Benchmark Suite for Holistic and Customizable LLM Evaluation Paper • 2506.00482 • Published May 31 • 8
Survey of Large Multimodal Model Datasets, Application Categories and Taxonomy Paper • 2412.17759 • Published Dec 23, 2024
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia Paper • 2503.07920 • Published Mar 10 • 100
SweEval: Do LLMs Really Swear? A Safety Benchmark for Testing Limits for Enterprise Use Paper • 2505.17332 • Published May 22 • 31
When AI Co-Scientists Fail: SPOT-a Benchmark for Automated Verification of Scientific Research Paper • 2505.11855 • Published May 17 • 10
Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning Paper • 2502.17407 • Published Feb 24 • 26
MVTamperBench: Evaluating Robustness of Vision-Language Models Paper • 2412.19794 • Published Dec 27, 2024 • 3
LLM-as-a-Judge & Reward Model: What They Can and Cannot Do Paper • 2409.11239 • Published Sep 17, 2024 • 2
Understand, Solve and Translate: Bridging the Multilingual Mathematical Reasoning Gap Paper • 2501.02448 • Published Jan 5
KMMLU: Measuring Massive Multitask Language Understanding in Korean Paper • 2402.11548 • Published Feb 18, 2024
HAE-RAE Bench: Evaluation of Korean Knowledge in Language Models Paper • 2309.02706 • Published Sep 6, 2023 • 2
Removing Non-Stationary Knowledge From Pre-Trained Language Models for Entity-Level Sentiment Classification in Finance Paper • 2301.03136 • Published Jan 9, 2023