Hunminai-1.0-27b

Hunminai-1.0 is a Korean-aligned language model based on Google's Gemma-3 architecture. To improve performance on Korean natural language tasks, the model was fine-tuned on a corpus of 100k instruction examples using Supervised Fine-Tuning (SFT) followed by Direct Preference Optimization (DPO). This approach aligns the model more closely with user intent in Korean and improves its performance on downstream tasks such as dialogue generation, question answering, and long-form text generation.
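
The actual training recipe is not published, but the following is a minimal sketch of what such a two-stage SFT-then-DPO pipeline can look like with the TRL library. The dataset paths, column layouts, and hyperparameters are illustrative assumptions, not the real configuration.

# Minimal sketch of a two-stage SFT -> DPO pipeline using TRL.
# Dataset paths, column layouts, and hyperparameters are illustrative
# placeholders; the actual Hunminai training configuration is not disclosed.
from datasets import load_dataset
from transformers import AutoTokenizer
from trl import DPOConfig, DPOTrainer, SFTConfig, SFTTrainer

base_model = "google/gemma-3-27b-it"
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Stage 1: Supervised Fine-Tuning on instruction data
# (SFTTrainer accepts a dataset with a "messages" chat column).
sft_dataset = load_dataset("path/to/korean-instructions", split="train")  # hypothetical
sft_trainer = SFTTrainer(
    model=base_model,
    args=SFTConfig(output_dir="hunminai-sft"),
    train_dataset=sft_dataset,
    processing_class=tokenizer,
)
sft_trainer.train()

# Stage 2: Direct Preference Optimization on preference pairs
# (DPOTrainer expects "prompt", "chosen", and "rejected" columns).
dpo_dataset = load_dataset("path/to/korean-preferences", split="train")  # hypothetical
dpo_trainer = DPOTrainer(
    model=sft_trainer.model,
    args=DPOConfig(output_dir="hunminai-dpo", beta=0.1),
    train_dataset=dpo_dataset,
    processing_class=tokenizer,
)
dpo_trainer.train()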

Model Details

  • Base Model: google/gemma-3-27b-it
  • Base Model Release Date: March 12, 2025
  • Parameters: 27.4B (BF16)
  • Context Length: 128k tokens
  • License: Gemma
  • Model Type: Text Generation
  • Fine-Tuning Techniques: Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO)

Usage

Gemma 3 is supported starting from version 4.50.0 of the Transformers library.

To update to the latest version, run the following command:

$ pip install -U transformers
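
To confirm that the installed version meets this requirement, a quick check (assuming a standard installation):

import transformers

# Gemma 3 support requires transformers >= 4.50.0.
print(transformers.__version__)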

Install the required package and run the example code below to load the Hunminai-1.0-27b model and perform a simple Korean-language chat completion.

# pip install accelerate

from transformers import AutoProcessor, Gemma3ForConditionalGeneration
import torch

model_id = "davidkim205/Hunminai-1.0-27b"

# Load the model across the available devices; weights are stored in BF16.
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id, device_map="auto"
).eval()

processor = AutoProcessor.from_pretrained(model_id)

messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "당신은 μœ μš©ν•œ AI λΉ„μ„œμž…λ‹ˆλ‹€."}]  # "You are a helpful AI assistant."
    },
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "λŒ€ν•œλ―Όκ΅­μ˜ μˆ˜λ„λŠ” μ–΄λ””μΈκ°€μš”?"}  # "What is the capital of South Korea?"
        ]
    }
]

# Apply the Gemma 3 chat template and move the inputs to the model's device;
# the dtype argument casts floating-point inputs to BF16 (token ids stay integer).
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt"
).to(model.device, dtype=torch.bfloat16)

input_len = inputs["input_ids"].shape[-1]

# Generate greedily and keep only the newly generated tokens.
with torch.inference_mode():
    generation = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    generation = generation[0][input_len:]

decoded = processor.decode(generation, skip_special_tokens=True)
print(decoded)
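
Loading all 27.4B parameters in BF16 requires roughly 55 GB of accelerator memory. If that exceeds your hardware, 4-bit quantization via bitsandbytes is one option; the following is a minimal sketch, not an officially tested configuration.

# pip install bitsandbytes

import torch
from transformers import AutoProcessor, BitsAndBytesConfig, Gemma3ForConditionalGeneration

model_id = "davidkim205/Hunminai-1.0-27b"

# 4-bit NF4 quantization with BF16 compute; cuts memory use to roughly a
# quarter of the BF16 footprint, at some cost in output quality.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=quantization_config,
).eval()
processor = AutoProcessor.from_pretrained(model_id)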

Training Dataset

The model was trained on approximately 100k high-quality Korean instruction examples. The dataset was curated to cover a wide range of Korean language contexts and tasks, with a focus on aligning model outputs with user intent and natural language generation. The dataset is not publicly available.
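
Although the dataset is undisclosed, the two training stages typically consume differently shaped records. The illustration below is hypothetical; the field names and contents are assumptions, not the actual schema.

# Hypothetical SFT record: an instruction paired with a reference response.
sft_example = {
    "messages": [
        {"role": "user", "content": "λŒ€ν•œλ―Όκ΅­μ˜ μˆ˜λ„λŠ” μ–΄λ””μΈκ°€μš”?"},        # "What is the capital of South Korea?"
        {"role": "assistant", "content": "λŒ€ν•œλ―Όκ΅­μ˜ μˆ˜λ„λŠ” μ„œμšΈμž…λ‹ˆλ‹€."},  # "The capital of South Korea is Seoul."
    ]
}

# Hypothetical DPO record: a prompt with a preferred and a rejected response.
dpo_example = {
    "prompt": "λŒ€ν•œλ―Όκ΅­μ˜ μˆ˜λ„λŠ” μ–΄λ””μΈκ°€μš”?",
    "chosen": "λŒ€ν•œλ―Όκ΅­μ˜ μˆ˜λ„λŠ” μ„œμšΈμž…λ‹ˆλ‹€.",
    "rejected": "λŒ€ν•œλ―Όκ΅­μ˜ μˆ˜λ„λŠ” λΆ€μ‚°μž…λ‹ˆλ‹€.",  # incorrect answer ("Busan")
}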

Evaluation

Benchmark Datasets

The table below describes the Korean LLM evaluation benchmark datasets used for the model evaluation. More information on the benchmarks is available on the blog.

Benchmark             Description                                                            Abbreviation
ko-bench              Korean-translated dataset of MT-Bench questions                        bench
ko-ged                Korean GED (elementary, middle, high school) open-ended question       ged
                      dataset covering Korean, English, Mathematics, Science, and
                      Social Studies
ko-ged-mc-elementary  Korean elementary school GED multiple-choice question dataset          ged:E
ko-ged-mc-middle      Korean middle school GED multiple-choice question dataset              ged:M
ko-ged-mc-high        Korean high school GED multiple-choice question dataset                ged:H
ko-gpqa               Korean version of GPQA containing challenging physics questions        gpqa
                      designed to test deep understanding and logical reasoning
ko-math-500           Korean-translated subset of 500 high school-level math problems        math500
                      from the MATH dataset, including detailed solutions with LaTeX
                      notation
ko-ifeval             Instruction-following evaluation dataset translated from IFEval,       ifeval
                      adapted for Korean language and culture

Benchmark Results

           davidkim205/       google/          unsloth/         google/
           Hunminai-1.0-27b   gemma-3-27b-it   gemma-3-27b-it   gemma-2-27b-it
Avg.       8.53               8.31             8.03             7.49
bench      8.26               8.06             8.27             7.59
ged        9.19               9.02             9.03             8.38
ged:E      9.86               9.86             9.93             9.51
ged:M      9.67               9.63             9.76             9.10
ged:H      9.60               9.52             9.52             9.32
gpqa       4.55               3.69             3.38             3.54
math500    8.56               8.38             6.26             5.00
ifeval     8.10