---
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
base_model:
- Qwen/Qwen3-4B-Base
tags:
- konanllm
language:
- ko
- en
---
# Konan-LLM-OND

## Overview
Konan-LLM-OND, a large language model from Konan Technology Inc., is based on Qwen3-4B-Base. It has been specifically optimized for the Korean language through vocabulary expansion, continual pre-training, and instruction tuning to enhance performance and efficiency.
- Languages: Primarily Korean, with support for English.
- Key Features:
  - **Expanded Korean Vocabulary:** The model's vocabulary was expanded with additional Korean tokens to improve tokenization efficiency. As a result, Konan-LLM-OND is approximately 30% more token-efficient on Korean input than Qwen3, which improves cost-effectiveness and processing speed (see the tokenizer comparison sketch after this list).
  - **Continual Pre-training:** The model underwent continual pre-training on a large-scale Korean corpus using the expanded vocabulary, strengthening its fundamental understanding and text-generation capabilities in Korean.
  - **Supervised Fine-Tuning (SFT):** The model was fine-tuned on a high-quality Korean instruction dataset to improve its ability to understand and execute a wide variety of real-world tasks.
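The tokenization-efficiency claim above can be checked empirically. The snippet below is a minimal sketch, assuming both tokenizers can be downloaded from the Hugging Face Hub; the sample sentence is illustrative only, and the measured ratio will vary with the text.

```python
from transformers import AutoTokenizer

# Illustrative comparison of Korean tokenization efficiency.
# The sample sentence is an assumption; actual savings depend on the input text.
text = "대한민국의 수도는 서울입니다."  # "The capital of South Korea is Seoul."

konan_tok = AutoTokenizer.from_pretrained("konantech/Konan-LLM-OND")
qwen_tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-Base")

konan_len = len(konan_tok(text)["input_ids"])
qwen_len = len(qwen_tok(text)["input_ids"])

print(f"Konan-LLM-OND tokens: {konan_len}")
print(f"Qwen3-4B-Base tokens: {qwen_len}")
print(f"Relative reduction:   {1 - konan_len / qwen_len:.1%}")
```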
## Benchmark Results

KMMLU, HRM8K, and Ko-IFEval are Korean benchmarks; MMLU, GSM8K, and IFEval are English benchmarks.

### Model Performance (< 5B)

| Model | Model size | KMMLU | HRM8K | Ko-IFEval | MMLU | GSM8K | IFEval |
|---|---|---|---|---|---|---|---|
| Konan-LLM-OND | 4.0B | **54.33%** | 53.70% | 68.42% | 70.76% | **86.66%** | 73.38% |
| EXAONE-3.5-2.4B-Instruct | 2.4B | 45.22% | 38.55% | 60.53% | 61.76% | 78.54% | 77.73% |
| kanana-1.5-2.1b-instruct-2505 | 2.1B | 38.14% | 34.14% | 55.99% | 55.25% | 74.83% | 64.60% |
| Midm-2.0-Mini-Instruct | 2.3B | 43.24% | 37.30% | 66.81% | 55.62% | 72.55% | 68.30% |
| Qwen3-4B (w/o reasoning) | 4.0B | 52.55% | **54.16%** | 68.42% | **71.81%** | 76.57% | **80.04%** |
| gemma-3-4b-it | 4.3B | 40.10% | 43.88% | **69.15%** | 61.25% | 83.24% | 78.28% |
### Model Performance (≥ 7B)

| Model | Model size | KMMLU | HRM8K | Ko-IFEval | MMLU | GSM8K | IFEval |
|---|---|---|---|---|---|---|---|
| Konan-LLM-OND | 4.0B | 54.33% | 53.70% | 68.42% | 70.55% | 86.66% | 73.38% |
| A.X-4.0-Light | 7.2B | **62.48%** | 51.08% | 71.49% | 73.15% | 86.58% | 81.33% |
| EXAONE-3.5-7.8B-Instruct | 7.8B | 53.03% | 48.02% | 66.81% | 71.43% | **89.46%** | 79.85% |
| kanana-1.5-8b-instruct-2505 | 8.0B | 47.80% | 39.65% | 71.05% | 65.90% | 76.57% | 76.80% |
| Midm-2.0-Base-Instruct | 11.5B | 58.43% | 51.18% | **75.00%** | 71.84% | 79.83% | 79.67% |
| Qwen3-8B (w/o reasoning) | 8.1B | 57.43% | **57.88%** | 70.91% | **76.45%** | 77.79% | **82.81%** |
Note:
- The highest scores are shown in bold.
### Benchmark Setup

All benchmarks were executed in the following standardized environment.

- **Evaluation Framework:** `lm-evaluation-harness` v0.4.9
- **Runtime & Hardware:** All models were served with `vLLM` v0.9.2 on NVIDIA GPUs.
- **Inference Mode:** For every benchmark, we invoked the `chat_completions` API, and scores were computed solely from the generated responses.
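As a rough illustration of this serving setup (not the exact evaluation configuration), the sketch below assumes a local vLLM server started with `vllm serve konantech/Konan-LLM-OND`, exposing the OpenAI-compatible chat completions endpoint on port 8000; the prompt, port, and sampling parameters are assumptions.

```python
from openai import OpenAI

# Assumes a local server, e.g. `vllm serve konantech/Konan-LLM-OND`,
# exposing the OpenAI-compatible API at this address (an assumption).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="konantech/Konan-LLM-OND",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Answer with a single number: 12 * 7 = ?"},
    ],
    temperature=0.0,  # assumption: deterministic decoding for reproducible scoring
    max_tokens=256,
)

# Scoring uses only the generated response text.
print(response.choices[0].message.content)
```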
### Metric Adjustments

- KMMLU was evaluated in a zero-shot setting using a CoT-style prompt modified from the `kmmlu_direct` task in lm-evaluation-harness, with enhanced preprocessing filters applied during evaluation.
- MMLU was evaluated in a zero-shot setting using a CoT-style prompt modified from the `mmlu_generative` task in lm-evaluation-harness, with enhanced preprocessing filters applied during evaluation.
- GSM8K was evaluated in a zero-shot setting using the original prompt format from lm-evaluation-harness, with enhanced preprocessing filters applied.
- HRM8K was evaluated in a zero-shot setting using the original prompt and data format from lm-evaluation-harness, without any modifications.
- Ko-IFEval was evaluated in a zero-shot setting using the original IFEval protocol, with the dataset sourced from allganize/IFEval-Ko.
### Evaluation Protocol

| Benchmark | Scoring Method | Few-shot |
|---|---|---|
| KMMLU | `exact_match` | 0-shot CoT |
| HRM8K | mean of `hrm8k_gsm8k`, `hrm8k_ksm`, `hrm8k_math`, `hrm8k_mmmlu`, `hrm8k_omni_math` | 0-shot |
| Ko-IFEval | mean of `prompt_level_strict_acc`, `inst_level_strict_acc`, `prompt_level_loose_acc`, `inst_level_loose_acc` | 0-shot |
| MMLU | `exact_match` | 0-shot CoT |
| GSM8K | `flexible-extract` | 0-shot |
| IFEval | mean of `prompt_level_strict_acc`, `inst_level_strict_acc`, `prompt_level_loose_acc`, `inst_level_loose_acc` | 0-shot |
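For concreteness, the sketch below shows how the "mean of sub-metrics" rows in this table aggregate into a single score; the numbers are placeholders for illustration, not actual results.

```python
from statistics import mean

# Placeholder sub-metric values for illustration only (not real scores).
ifeval_submetrics = {
    "prompt_level_strict_acc": 0.71,
    "inst_level_strict_acc": 0.78,
    "prompt_level_loose_acc": 0.74,
    "inst_level_loose_acc": 0.81,
}

hrm8k_subtasks = {
    "hrm8k_gsm8k": 0.80,
    "hrm8k_ksm": 0.35,
    "hrm8k_math": 0.55,
    "hrm8k_mmmlu": 0.52,
    "hrm8k_omni_math": 0.30,
}

# Final benchmark score = unweighted mean of the listed sub-metrics/sub-tasks.
print(f"IFEval: {mean(ifeval_submetrics.values()):.2%}")
print(f"HRM8K:  {mean(hrm8k_subtasks.values()):.2%}")
```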
## Quickstart

Konan-LLM-OND is supported in `transformers` v4.52.0 and later.

```bash
pip install "transformers>=4.52.0"
```

The code example below shows how to generate a response from the model for a given input.
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "konantech/Konan-LLM-OND"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model.eval()
tokenizer = AutoTokenizer.from_pretrained(model_name)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "대한민국 수도는?"}  # "What is the capital of South Korea?"
]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(
        input_ids,
        max_new_tokens=64,
        do_sample=False,
    )

# Decode only the newly generated tokens, skipping the prompt.
len_input_prompt = len(input_ids[0])
response = tokenizer.decode(output[0][len_input_prompt:], skip_special_tokens=True)
print(response)
# 대한민국 수도는 서울입니다.
# ("The capital of South Korea is Seoul.")
```
## Citation

```bibtex
@misc{Konan-LLM-OND-2025,
      author = {Konan Technology Inc.},
      title = {Konan-LLM-OND},
      year = {2025},
      url = {https://huggingface.co/konantech/Konan-LLM-OND}
}
```