
ํ•™์Šต ๋ฐ์ดํ„ฐ์…‹

https://huggingface.co/combe4259/difficulty_klue/blob/main/training_data_difficulty_klue.json
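The file can be pulled straight from the model repository with huggingface_hub. A minimal sketch is shown below; the internal schema of the JSON is not documented here, so the snippet only loads the file and inspects the first record.

import json
from huggingface_hub import hf_hub_download

# Download the training JSON from the model repository (filename taken from the link above)
json_path = hf_hub_download(
    repo_id="combe4259/difficulty_klue",
    filename="training_data_difficulty_klue.json",
)

with open(json_path, encoding="utf-8") as f:
    data = json.load(f)

# The schema is not documented here, so just inspect the size and the first record
print(len(data))
print(data[0] if isinstance(data, list) else next(iter(data.items())))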

๊ธˆ์œต ๋ฌธ์„œ ๋‚œ์ด๋„ ๋ถ„๋ฅ˜ ๋ชจ๋ธ (Text Difficulty Classification)

์ด ๋ชจ๋ธ์€ klue/bert-base๋ฅผ ํŒŒ์ธํŠœ๋‹ํ•˜์—ฌ, ํ•œ๊ตญ์–ด ๊ธˆ์œต ๋ฌธ์žฅ์˜ ๋‚œ์ด๋„๋ฅผ 10๋‹จ๊ณ„(1~10)๋กœ ๋ถ„๋ฅ˜ํ•˜๋Š” Text Classification ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค.

'์–ด๋ ค์šด ๋ฌธ์žฅ'์ด ๋“ฑ์žฅํ–ˆ๋Š”์ง€ ์‹ค์‹œ๊ฐ„์œผ๋กœ ๊ฐ์ง€ํ•˜์—ฌ '์‰ฌ์šด ๋ฌธ์žฅ ๋ณ€ํ™˜ AI'์˜ ํŠธ๋ฆฌ๊ฑฐ ์—ญํ• ์„ ํ•˜๋„๋ก ์„ค๊ณ„๋˜์—ˆ์Šต๋‹ˆ๋‹ค.


์‚ฌ์šฉ ๋ฐฉ๋ฒ• (How to Use)

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the model from the Hugging Face Hub or a saved local path
MODEL_PATH = "combe4259/difficulty_klue"
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_PATH)
model.eval()

# ์ถ”๋ก ํ•  ํ…์ŠคํŠธ
text = "์‹ ์šฉํŒŒ์ƒ๊ฒฐํ•ฉ์ฆ๊ถŒ์˜ CDS ์Šคํ”„๋ ˆ๋“œ ๋ณ€๋™์— ๋”ฐ๋ฅธ ์ˆ˜์ต๊ตฌ์กฐ"

inputs = tokenizer(
    text,
    return_tensors="pt",
    truncation=True,
    max_length=512,
    padding=True
)

# ์˜ˆ์ธก
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    
    # ๋ชจ๋ธ์€ 0-9๋กœ ์˜ˆ์ธกํ•˜๋ฏ€๋กœ, +1 ํ•˜์—ฌ 1-10 ์Šค์ผ€์ผ๋กœ ๋ณ€ํ™˜
    prediction = torch.argmax(logits, dim=-1).item()
    difficulty = prediction + 1

print(f"ํ…์ŠคํŠธ: {text}")
print(f"์˜ˆ์ธก ๋‚œ์ด๋„: {difficulty}")
# ์ถœ๋ ฅ: ์˜ˆ์ธก ๋‚œ์ด๋„: 7

ํ•™์Šต ๋ฐ์ดํ„ฐ (Training Data)

  • ์ž์ฒด ๊ตฌ์ถ•ํ•œ 2,880๊ฐœ์˜ ๊ธˆ์œต ๋ฌธ์žฅ/๋‹จ๋ฝ์œผ๋กœ ๊ตฌ์„ฑ๋œ JSON ๋ฐ์ดํ„ฐ ์‚ฌ์šฉ
  • ๋ฐ์ดํ„ฐ ๋ถ„ํ• : Train (2,016) / Validation (432) / Test (432)
  • ๋ฐ์ดํ„ฐ ๋ถˆ๊ท ํ˜•: ๋‚œ์ด๋„ 7(28.2%)๊ณผ 8(18.6%) ์ง‘์ค‘, ๋‚œ์ด๋„ 10(0.0%)์€ 1๊ฐœ ์กด์žฌ
  • ์ „์ฒ˜๋ฆฌ: klue/bert-base ํ† ํฌ๋‚˜์ด์ € ์‚ฌ์šฉ, max_length=512๋กœ ํŒจ๋”ฉ ๋ฐ ์ ˆ๋‹จ

ํ•™์Šต ์ ˆ์ฐจ (Training Procedure)

  • Base Model: klue/bert-base (num_labels=10)
  • Optimizer: AdamW
  • Loss Function: Weighted CrossEntropyLoss (with class weights)
    • e.g., difficulty 10 (1 sample) → weight 10.0
    • difficulty 7 (568 samples) → weight 0.35
  • Epochs: 10
  • Batch Size: 16
  • Learning Rate: 2e-5 (with 500 warmup steps)
  • Best Model: metric_for_best_model='f1' (the checkpoint with the highest F1 score is kept)
  • Early Stopping: patience=3 (training stops early if the F1 score fails to improve for 3 consecutive evaluations; a training-setup sketch follows below)
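A sketch of the training setup under the hyperparameters above, using a subclassed Trainer to apply the class weights. The exact weight formula is not documented; inverse-frequency weights capped at 10.0 are shown as an assumption (they roughly reproduce the 0.35 and 10.0 examples). train_dataset, val_dataset, and compute_metrics are assumed to be prepared as in the surrounding sketches.

import torch
import torch.nn as nn
from transformers import (AutoModelForSequenceClassification, EarlyStoppingCallback,
                          Trainer, TrainingArguments)

# Inverse-frequency weights capped at 10.0 (assumption; gives 0.35 for the 568-sample class)
counts = torch.bincount(torch.tensor(train_labels), minlength=10).float()
class_weights = (counts.sum() / (len(counts) * counts)).clamp(max=10.0)

class WeightedTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        loss_fn = nn.CrossEntropyLoss(weight=class_weights.to(outputs.logits.device))
        loss = loss_fn(outputs.logits, labels)
        return (loss, outputs) if return_outputs else loss

model = AutoModelForSequenceClassification.from_pretrained("klue/bert-base", num_labels=10)

args = TrainingArguments(
    output_dir="difficulty_klue",
    num_train_epochs=10,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    warmup_steps=500,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="f1",
)

trainer = WeightedTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,      # tokenized train split (assumed prepared elsewhere)
    eval_dataset=val_dataset,         # tokenized validation split
    compute_metrics=compute_metrics,  # must return a dict containing "f1" (see the sketch below)
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()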

ํ‰๊ฐ€ ๊ฒฐ๊ณผ (Evaluation Results)

Final performance on the test set (432 samples).
The model showed stable performance on both the F1 score and the ordinal metrics (MAE, Within-1 Accuracy).

  • F1 Score (Weighted): 0.607 (key metric; the model's overall precision/recall)
  • Accuracy: 0.604 (probability of predicting the exact level out of 10)
  • MAE (Mean Absolute Error): 0.560 (predictions deviate from the true level by 0.56 levels on average)
  • Within-1 Accuracy: 0.926 (accuracy within a ±1 level margin, i.e. 92.6%)
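MAE and Within-1 Accuracy are not standard Trainer metrics; a compute_metrics sketch that would produce all four numbers (an assumption about how they were computed) is shown below.

import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    errors = np.abs(preds - labels)
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1_score(labels, preds, average="weighted"),
        "mae": errors.mean(),                  # mean absolute error in difficulty levels
        "within_1_acc": (errors <= 1).mean(),  # fraction of predictions within ±1 level
    }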

์ƒ˜ํ”Œ ์˜ˆ์ธก

์ž…๋ ฅ ํ…์ŠคํŠธ ์˜ˆ์ธก ๋‚œ์ด๋„ (1-10)
"์€ํ–‰์— ๋ˆ์„ ๋งก๊ฒจ์š”" 1
"์˜ˆ๊ธˆ์ž๋ณดํ˜ธ๋ฒ•์— ๋”ฐ๋ผ 5์ฒœ๋งŒ์›๊นŒ์ง€ ๋ณดํ˜ธ๋ฉ๋‹ˆ๋‹ค" 2
"์‹ ์šฉํŒŒ์ƒ๊ฒฐํ•ฉ์ฆ๊ถŒ์˜ CDS ์Šคํ”„๋ ˆ๋“œ ๋ณ€๋™์— ๋”ฐ๋ฅธ ์ˆ˜์ต๊ตฌ์กฐ" 7