---
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
base_model:
- Qwen/Qwen3-4B-Base
tags:
- konanllm
language:
- ko
- en
---
# Konan-LLM-OND

## Overview
Konan-LLM-OND, a large language model from Konan Technology Inc., is based on Qwen3-4B-Base. It has been specifically optimized for the Korean language through vocabulary expansion, continual pre-training, and instruction tuning to enhance performance and efficiency.
- Languages: Primarily Korean, with support for English.
- Key Features:
  - **Expanded Korean Vocabulary:** The model's vocabulary was expanded with additional Korean tokens to improve tokenization efficiency. As a result, Konan-LLM-OND is approximately 30% more token-efficient on Korean input than Qwen3, which improves cost-effectiveness and processing speed (see the tokenizer comparison sketch after this list).
  - **Continual Pre-training:** The model underwent continual pre-training on a large-scale Korean corpus using the expanded vocabulary, strengthening its fundamental understanding and text-generation capabilities in Korean.
  - **Supervised Fine-Tuning (SFT):** The model was fine-tuned on a high-quality Korean instruction dataset to improve its ability to understand and execute a wide variety of real-world tasks.
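The tokenization-efficiency claim above can be checked empirically. The snippet below is a minimal sketch, assuming both tokenizers can be downloaded from the Hugging Face Hub; the sample sentence is illustrative only, and the measured ratio will vary with the text.

```python
from transformers import AutoTokenizer

# Illustrative comparison of Korean tokenization efficiency.
# The sample sentence is an assumption; actual savings depend on the input text.
text = "대한민국의 수도는 서울입니다."  # "The capital of South Korea is Seoul."

konan_tok = AutoTokenizer.from_pretrained("konantech/Konan-LLM-OND")
qwen_tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-Base")

konan_len = len(konan_tok(text)["input_ids"])
qwen_len = len(qwen_tok(text)["input_ids"])

print(f"Konan-LLM-OND tokens: {konan_len}")
print(f"Qwen3-4B-Base tokens: {qwen_len}")
print(f"Relative reduction:   {1 - konan_len / qwen_len:.1%}")
```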
## Benchmark Results

KMMLU, HRM8K, and Ko-IFEval are Korean benchmarks; MMLU, GSM8K, and IFEval are English benchmarks.

### Model Performance (< 5B)

| Model | Model size | KMMLU | HRM8K | Ko-IFEval | MMLU | GSM8K | IFEval |
|---|---|---|---|---|---|---|---|
| Konan-LLM-OND | 4.0B | **54.33%** | 53.70% | 68.42% | 70.76% | **86.66%** | 73.38% |
| EXAONE-3.5-2.4B-Instruct | 2.4B | 45.22% | 38.55% | 60.53% | 61.76% | 78.54% | 77.73% |
| kanana-1.5-2.1b-instruct-2505 | 2.1B | 38.14% | 34.14% | 55.99% | 55.25% | 74.83% | 64.60% |
| Midm-2.0-Mini-Instruct | 2.3B | 43.24% | 37.30% | 66.81% | 55.62% | 72.55% | 68.30% |
| Qwen3-4B (w/o reasoning) | 4.0B | 52.55% | **54.16%** | 68.42% | **71.81%** | 76.57% | **80.04%** |
| gemma-3-4b-it | 4.3B | 40.10% | 43.88% | **69.15%** | 61.25% | 83.24% | 78.28% |
### Model Performance (≥ 7B)

| Model | Model size | KMMLU | HRM8K | Ko-IFEval | MMLU | GSM8K | IFEval |
|---|---|---|---|---|---|---|---|
| Konan-LLM-OND | 4.0B | 54.33% | 53.70% | 68.42% | 70.55% | 86.66% | 73.38% |
| A.X-4.0-Light | 7.2B | **62.48%** | 51.08% | 71.49% | 73.15% | 86.58% | 81.33% |
| EXAONE-3.5-7.8B-Instruct | 7.8B | 53.03% | 48.02% | 66.81% | 71.43% | **89.46%** | 79.85% |
| kanana-1.5-8b-instruct-2505 | 8.0B | 47.80% | 39.65% | 71.05% | 65.90% | 76.57% | 76.80% |
| Midm-2.0-Base-Instruct | 11.5B | 58.43% | 51.18% | **75.00%** | 71.84% | 79.83% | 79.67% |
| Qwen3-8B (w/o reasoning) | 8.1B | 57.43% | **57.88%** | 70.91% | **76.45%** | 77.79% | **82.81%** |
Note:
- The highest scores are shown in bold.
### Benchmark Setup

All benchmarks were executed in the following standardized environment.

- **Evaluation Framework:** `lm-evaluation-harness` v0.4.9
- **Runtime & Hardware:** All models were served with `vLLM` v0.9.2 on NVIDIA GPUs.
- **Inference Mode:** For every benchmark, we invoked the `chat_completions` API, and scores were computed solely from the generated responses.
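As a rough illustration of this serving setup (not the exact evaluation configuration), the sketch below assumes a local vLLM server started with `vllm serve konantech/Konan-LLM-OND`, exposing the OpenAI-compatible chat completions endpoint on port 8000; the prompt, port, and sampling parameters are assumptions.

```python
from openai import OpenAI

# Assumes a local server, e.g. `vllm serve konantech/Konan-LLM-OND`,
# exposing the OpenAI-compatible API at this address (an assumption).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="konantech/Konan-LLM-OND",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Answer with a single number: 12 * 7 = ?"},
    ],
    temperature=0.0,  # assumption: deterministic decoding for reproducible scoring
    max_tokens=256,
)

# Scoring uses only the generated response text.
print(response.choices[0].message.content)
```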
### Metric Adjustments

- KMMLU was evaluated in a zero-shot setting using a CoT-style prompt modified from the `kmmlu_direct` task in lm-evaluation-harness, with enhanced preprocessing filters applied during evaluation.
- MMLU was evaluated in a zero-shot setting using a CoT-style prompt modified from the `mmlu_generative` task in lm-evaluation-harness, with enhanced preprocessing filters applied during evaluation.
- GSM8K was evaluated in a zero-shot setting using the original prompt format from lm-evaluation-harness, with enhanced preprocessing filters applied.
- HRM8K was evaluated in a zero-shot setting using the original prompt and data format from lm-evaluation-harness, without any modifications.
- Ko-IFEval was evaluated in a zero-shot setting using the original IFEval protocol, with the dataset sourced from allganize/IFEval-Ko.
### Evaluation Protocol

| Benchmark | Scoring Method | Few-shot |
|---|---|---|
| KMMLU | `exact_match` | 0-shot CoT |
| HRM8K | mean of `hrm8k_gsm8k`, `hrm8k_ksm`, `hrm8k_math`, `hrm8k_mmmlu`, `hrm8k_omni_math` | 0-shot |
| Ko-IFEval | mean of `prompt_level_strict_acc`, `inst_level_strict_acc`, `prompt_level_loose_acc`, `inst_level_loose_acc` | 0-shot |
| MMLU | `exact_match` | 0-shot CoT |
| GSM8K | `flexible-extract` | 0-shot |
| IFEval | mean of `prompt_level_strict_acc`, `inst_level_strict_acc`, `prompt_level_loose_acc`, `inst_level_loose_acc` | 0-shot |
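For concreteness, the sketch below shows how the "mean of sub-metrics" rows in this table aggregate into a single score; the numbers are placeholders for illustration, not actual results.

```python
from statistics import mean

# Placeholder sub-metric values for illustration only (not real scores).
ifeval_submetrics = {
    "prompt_level_strict_acc": 0.71,
    "inst_level_strict_acc": 0.78,
    "prompt_level_loose_acc": 0.74,
    "inst_level_loose_acc": 0.81,
}

hrm8k_subtasks = {
    "hrm8k_gsm8k": 0.80,
    "hrm8k_ksm": 0.35,
    "hrm8k_math": 0.55,
    "hrm8k_mmmlu": 0.52,
    "hrm8k_omni_math": 0.30,
}

# Final benchmark score = unweighted mean of the listed sub-metrics/sub-tasks.
print(f"IFEval: {mean(ifeval_submetrics.values()):.2%}")
print(f"HRM8K:  {mean(hrm8k_subtasks.values()):.2%}")
```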
## Quickstart

Konan-LLM-OND is supported in `transformers` v4.52.0 and later.

```bash
pip install "transformers>=4.52.0"
```

The code example below shows how to generate a response from the model for a given input.
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "konantech/Konan-LLM-OND"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model.eval()
tokenizer = AutoTokenizer.from_pretrained(model_name)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "대한민국 수도는?"}  # "What is the capital of South Korea?"
]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(
        input_ids,
        max_new_tokens=64,
        do_sample=False,
    )

# Decode only the newly generated tokens, skipping the prompt.
len_input_prompt = len(input_ids[0])
response = tokenizer.decode(output[0][len_input_prompt:], skip_special_tokens=True)
print(response)
# 대한민국 수도는 서울입니다.
# ("The capital of South Korea is Seoul.")
```
## Citation

```bibtex
@misc{Konan-LLM-OND-2025,
      author = {Konan Technology Inc.},
      title = {Konan-LLM-OND},
      year = {2025},
      url = {https://huggingface.co/konantech/Konan-LLM-OND}
}
```