 
**English** | [한국어](https://huggingface.co/KRX-Data/WON-Reasoning/blob/main/KOREAN_README.md)

# ₩ON: Open LLM for Korean Finance

## Introduction

The core intent behind ₩ON is to promote research openness, benchmark rigorous financial reasoning capabilities, and foster best practices in training Korean-specific financial language models.
The model notably incorporates a two-step structured reasoning approach, providing self-correcting reasoning followed by a conclusive summary, aiming to elevate clarity and accuracy in financial decision-making processes.

## KRX Financial LLM Competition

**Competition Overview**

The competition was the first open leaderboard dedicated to evaluating large language models specifically for Korean financial tasks.
It was conducted over two months, including preliminary and final rounds, attracting 233 registered teams who collectively submitted over 1,100 models.
The preliminary round included evaluations across five categories (Financial Markets, Finance and Accounting, Domestic Company Analysis, Financial Agent Tasks, and Stock Price Prediction), while the final round concentrated on Finance and Accounting, Financial Markets, and Open-Ended Finance QA.

**Benchmark Description**

The benchmark used during the competition consisted of approximately 5,500 carefully curated MCQA and Instruction-Response questions across various financial domains:

- **Finance and Accounting**: Evaluated via university-level multiple-choice questions on accounting and financial principles.
- **Financial Markets**: Based on examinations assessing understanding of financial regulations and Korean market systems.
- **Stock Price Prediction**: Involved binary prediction tasks based on recent stock price data and computed indicators.
- **Domestic Company Analysis**: Utilized KRX-Bench data generated from Korean company filings.
- **Financial Agents**: Tasked models with executing financial data manipulations and coding tasks.
- **Open-Ended FinQA**: Comprised complex graduate-level econometric and legal reasoning tasks.
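
To illustrate how a Stock Price Prediction item can pair recent prices with a computed indicator, here is a minimal sketch; the function names, ticker, prompt wording, and labeling heuristic are hypothetical and not the competition's actual schema:

```python
def moving_average(prices, window):
    """Simple moving average over the trailing `window` closing prices."""
    return sum(prices[-window:]) / window

def build_binary_prediction_item(ticker, closes, window=5):
    """Frame a binary up/down question from recent closes plus one indicator."""
    ma = moving_average(closes, window)
    prompt = (
        f"{ticker} closed at {closes[-1]:.2f}; its {window}-day moving average "
        f"is {ma:.2f}. Will the next close be higher? (A) Up (B) Down"
    )
    # Naive momentum heuristic used only for this illustration:
    # price above its moving average -> predict "Up".
    label = "A" if closes[-1] > ma else "B"
    return {"prompt": prompt, "label": label}

item = build_binary_prediction_item("005930", [70.0, 71.5, 71.0, 72.3, 73.1])
print(item["label"])  # closes[-1] = 73.1 > MA 71.58 -> "A"
```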

**Benchmark Competition Statistics**

The competition saw broad participation, with corporate teams from sectors such as Tech and Finance accounting for 52.5% of entrants, alongside significant academic involvement, reflecting diverse stakeholder interest in Korean financial NLP.

<figure style="text-align: center;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/63e087b6a98d931aa90c1b9c/XTxJM0nXAs18RiJEdwksU.png" width="700" height="900" alt="Distribution of participants" style="display: block; margin: auto;">
<figcaption style="font-style: italic; color: gray; margin-top: 8px;">
Distribution of participants. The shades of blue bars indicate corporate participants.
</figcaption>
</figure>

**Competition Results Analysis**

During the preliminary rounds, top-performing models primarily utilized supervised fine-tuning (SFT), yielding notable gains particularly in the Domestic Company Analysis category.
Despite substantial improvements in this area, advancements in Finance and Accounting and Financial Markets were comparatively modest.
Most models adopted straightforward SFT approaches; however, some teams experimented with additional training methods, such as continual pre-training (CPT), although its impact at smaller scales remained inconclusive.

<figure style="text-align: center;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/63e087b6a98d931aa90c1b9c/ru3aA2ISwtqS3sJuSPVLN.png" width="600" height="750" alt="Preliminary round performance trends" style="display: block; margin: auto;">
<figcaption style="font-style: italic; color: gray; margin-top: 8px;">
Preliminary round performance trends
</figcaption>
</figure>

In the final rounds, advanced multi-step training methodologies became prevalent.
Notably, teams implemented curriculum-based SFT strategies, beginning with simpler prompts and progressing towards more challenging instances generated using methods such as Evol-Instruct.
The best-performing models further refined their capabilities through preference optimization techniques such as Direct Preference Optimization (DPO) and KTO, using responses evaluated with LLM-as-a-Judge methodologies.
Team Hi-Q specifically demonstrated the effectiveness of continual pre-training combined with SFT and DPO, achieving substantial performance improvements and highlighting the value of structured, multi-stage training processes.
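
The pairwise DPO objective mentioned above can be sketched numerically; this is the generic loss on per-sequence log-probabilities, written in plain Python as an illustration, not any team's actual training code:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Pairwise DPO loss for one (chosen, rejected) preference pair.

    Inputs are total log-probabilities of each response under the policy
    being trained and under the frozen reference model.
    """
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(margin)): shrinks as the policy prefers the chosen
    # response more strongly than the reference model does.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When policy and reference agree exactly, the margin is 0 and the loss is ln 2.
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))  # 0.6931
```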

<figure style="text-align: center;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/63e087b6a98d931aa90c1b9c/VV6tYXtJkV5yTzNvMdzP9.png" width="700" height="900" alt="Evaluation results of continual pre-trained models" style="display: block; margin: auto;">
<figcaption style="font-style: italic; color: gray; margin-top: 8px;">
Evaluation results of continual pre-trained models
</figcaption>
</figure>

## Model Training
### Dataset Collection
We compiled a comprehensive training dataset of roughly 400,000 high-quality instructional samples through meticulous processes:

- **Solution Step**: After reasoning, the model succinctly summarizes its conclusions within `<solution>` and `</solution>` tags, providing clear and concise answers.
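
Downstream consumers can recover the final answer by extracting the `<solution>` span; a minimal sketch (the example output text is illustrative):

```python
import re

def extract_solution(model_output: str) -> str:
    """Return the text inside the last <solution>...</solution> span,
    falling back to the whole output when the tags are absent."""
    spans = re.findall(r"<solution>(.*?)</solution>", model_output, re.DOTALL)
    return spans[-1].strip() if spans else model_output.strip()

raw = ("On second thought, the deferred tax effect reverses next period... "
       "<solution>Net income is 120 million KRW.</solution>")
print(extract_solution(raw))  # Net income is 120 million KRW.
```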

## Benchmark Results

We have evaluated ₩ON on the comprehensive benchmark employed in the competition.
This benchmark consisted of rigorously designed multiple-choice questions (MCQA) and open-ended questions to thoroughly assess the practical and theoretical capabilities of financial language models.
The benchmark is categorized into Finance & Accounting (F&A), Financial Market Analysis, and an Open-Ended Financial Question-Answering (FinQA) task:

- **Finance & Accounting**: Evaluates the model's knowledge and analytical skills in financial concepts, accounting principles, and econometric reasoning.
- **Financial Market Analysis**: Assesses the model's understanding of financial markets, systems, regulations, and domain-specific factual knowledge.
- **Open-Ended FinQA**: Comprises complex and detailed reasoning questions that simulate realistic financial problem-solving scenarios.
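
For the two MCQA subsets, scoring reduces to exact-match accuracy over predicted choice letters; a minimal sketch of such a scorer (not the competition's official harness):

```python
def mcqa_accuracy(predictions, answers):
    """Exact-match accuracy over predicted choice letters (case-insensitive)."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have equal length")
    correct = sum(p.strip().upper() == a.strip().upper()
                  for p, a in zip(predictions, answers))
    return correct / len(answers)

print(mcqa_accuracy(["A", "c", "B", "D"], ["A", "C", "D", "D"]))  # 0.75
```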

An example of this evaluation dataset is the following:

<figure style="text-align: center;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/63e087b6a98d931aa90c1b9c/7vLKeR6wTbr88UdOeikaE.png" width="700" height="900" alt="Benchmark examples by category" style="display: block; margin: auto;">
<figcaption style="font-style: italic; color: gray; margin-top: 8px;">
Overview of the benchmark used for evaluation. Each example demonstrates a specific question type for each category.
</figcaption>
</figure>
 
110
  **Results**
111
 
112
+ ₩ON emerged as the highest-performing model on average compared to the models awarded in the competition.
113
+ The performance shows that the superior ability of ₩ON, particularly in the Finance & Accounting and Open-Ended FinQA subsets, reflecting its strong reasoning capabilities.
114
+ Despite less emphasis on purely domain-specific knowledge (Market), ₩ON's reasoning strength notably outperformed competing models.
115
+
116
+ <!-- | Models | F&A | Market | Open-Ended | Average |
117
  |-------------------------------------------|------|--------|------------|---------|
118
  | overfit-brothers/hello_world06 | 0.65 | **0.83** | 0.01 | 0.50 |
119
  | AnonymousLLMer/krx-qwen2.5-v1206-1 | 0.63 | 0.65 | 0.04 | 0.44 |
120
  | shibainu24/qwen2.5-7B-inst-release-1516wk | 0.56 | 0.67 | 0.04 | 0.43 |
121
  | Q-PING/krx_1205_test_model_3 | 0.58 | 0.64 | 0.02 | 0.42 |
122
  | Hi-Q/krx_1206_test_model_2 | 0.60 | 0.61 | 0.02 | 0.41 |
123
+ | **₩ON (Ours)** | **0.78** | 0.66 | **0.18** | **0.54** | -->
124
+
125
+ <figure style="text-align: center;">
126
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/63e087b6a98d931aa90c1b9c/--5Kkwfuf8c6hbNUvpJq6.png" width="700" height="900" alt="샘플 이미지" style="display: block; margin: auto;">
127
+ <!-- <figcaption style="font-style: italic; color: gray; margin-top: 8px;">
128
+ Overview of the benchmark used for evaluation. Each example demonstrates a specific question type for each category.
129
+ </figcaption> -->
130
+ </figure>
131
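
Each model's reported Average is the unweighted mean of its three subset scores; a quick arithmetic check using the reported figures:

```python
def average_score(fa, market, open_ended):
    """Unweighted mean of the three benchmark subset scores, rounded to 2 dp."""
    return round((fa + market + open_ended) / 3, 2)

print(average_score(0.78, 0.66, 0.18))  # ₩ON (Ours): 0.54
print(average_score(0.65, 0.83, 0.01))  # overfit-brothers/hello_world06: 0.5
```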
 
## Quick Start