Qwen3-8B-Korean-Highschool-English-Exam

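📚 Overview

A question-generation model for Korean high school English school exams (naesin), built on Qwen3-8B. Given an English passage, it automatically generates a range of question types at the level of the CSAT (the Korean college entrance exam) and school exams.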

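🎯 Key Features

Specialized for generating school-exam English questions at the second-year high school level
Supports 6 core question types
Fine-tuned on a high-quality dataset of about 400 examples
Trained with lightweight LoRA fine-tuning
Reflects the question patterns of actual CSAT and school exams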

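📝 Supported Question Types

Question type	Description
Title inference (제목 추론)	Choose the most appropriate title for the passage
Topic inference (주제 추론)	Identify the central topic of the passage
Content mismatch (내용 불일치)	Find the option that does not match the passage
Blank inference (빈칸 추론)	Choose the expression that best fits the blank in context
Grammar error (어법 오류)	Find the grammatically incorrect part
Main idea inference (요지 추론)	Identify the main point of the passage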
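
The question type is passed to the model as a Korean label, so it can help to keep those labels in one place and reject typos early. The following is a minimal sketch, not part of the model's API: the English keys and the `to_korean_label` helper are hypothetical names chosen for illustration, and the Korean values are the six labels used in this README's batch example.

```python
# Hypothetical lookup table: the English keys are illustrative only; the Korean
# values are the question-type labels used in this README's examples.
QUESTION_TYPES = {
    "title": "제목 추론",
    "topic": "주제 추론",
    "mismatch": "내용 불일치",
    "blank": "빈칸 추론",
    "grammar": "어법 오류",
    "main_idea": "요지 추론",
}


def to_korean_label(key: str) -> str:
    """Return the Korean label for an English key, or raise on unknown keys."""
    try:
        return QUESTION_TYPES[key]
    except KeyError:
        valid = ", ".join(sorted(QUESTION_TYPES))
        raise ValueError(
            f"unknown question type {key!r}; expected one of: {valid}"
        ) from None
```

A call such as `to_korean_label("blank")` yields the label to pass as `question_type` in the generation function.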

🚀 Quick Start

Installation

pip install transformers peft torch

Basic Usage

from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer
import torch

# Load the model
model = AutoPeftModelForCausalLM.from_pretrained(
    "huggingface-KREW/Qwen3-8B-Korean-Highschool-English-Exam",
    device_map="auto",
    torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")

# Example English passage
passage = """
If the brain has already stored someone's face and name, why do we still end up 
remembering one and not the other? This is because the brain has a two-tier memory 
system at work when it comes to retrieving memories, giving rise to a common yet 
infuriating sensation: recognising someone but not being able to remember how or why, 
or what their name is.
"""

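# Question generation function
def generate_question(passage, question_type):
    # Korean prompt: "Please turn the following English passage into a
    # {question_type} question." followed by the passage
    messages = [
        {
            "role": "user",
            "content": f"다음 영어 지문을 {question_type} 문제로 만들어주세요.\n\n지문:\n{passage}\n\n"
        }
    ]
    
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    # Move the inputs to the model's device instead of hard-coding "cuda"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,  # passes both input_ids and attention_mask
        max_new_tokens=1024,
        temperature=0.7,
        do_sample=True
    )
    
    # Decode only the newly generated tokens, not the echoed prompt
    generated = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(generated, skip_special_tokens=True)

# Generate a title-inference question
result = generate_question(passage, "제목 추론")
print(result)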

Batch Question Generation

# Generate a question for every question type
question_types = ["제목 추론", "주제 추론", "내용 불일치", "빈칸 추론", "어법 오류", "요지 추론"]

for q_type in question_types:
    print(f"\n{'='*50}")
    print(f"{q_type} question")
    print('='*50)
    result = generate_question(passage, q_type)
    print(result)
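
When the results need to be kept rather than printed, the same loop can collect them into a dictionary and write them to disk. This is a rough sketch under stated assumptions: `generate_all` and `save_results` are hypothetical helpers, and the generator is passed in as a parameter so the code does not depend on a loaded model.

```python
import json
from typing import Callable, Dict, List


def generate_all(
    passage: str,
    question_types: List[str],
    generate: Callable[[str, str], str],
) -> Dict[str, str]:
    """Call generate(passage, question_type) once per type and collect the results."""
    return {q_type: generate(passage, q_type) for q_type in question_types}


def save_results(results: Dict[str, str], path: str) -> None:
    """Write the results as UTF-8 JSON, keeping Korean text human-readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(results, f, ensure_ascii=False, indent=2)
```

With the model loaded, this would be called as `generate_all(passage, question_types, generate_question)`.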

📊 Dataset Structure

Raw Data Format

{
  "passage": "The Great Fire of London occurred in September 1666...",
  "passage_length": 320,
  "questions": [
    {
      "year": 2024,
      "type": "main_idea",
      "grade_level": "HighSchool 2nd Grade",
      "difficulty": "easy",
      "question": "What is the main idea of the passage?",
      "options": [
        "London has always been...",
        "The Great Fire caused...",
        "St. Paul's Cathedral was...",
        "Wooden buildings were..."
      ],
      "answer": "The Great Fire caused...",
      "cognitive_skill": "comprehension"
    }
  ]
}
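
A record in this shape can be sanity-checked before it is used for training. The sketch below is only an illustration and assumes the field names shown in the record above; `validate_record` is a hypothetical helper, not a script shipped with the dataset.

```python
from typing import Any, Dict, List


def validate_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the record looks consistent."""
    problems = []
    if not record.get("passage"):
        problems.append("missing passage")
    for i, question in enumerate(record.get("questions", [])):
        options = question.get("options", [])
        if len(set(options)) != len(options):
            problems.append(f"question {i}: duplicate options")
        # The answer is stored as option text, so it must match one option exactly
        if question.get("answer") not in options:
            problems.append(f"question {i}: answer is not one of the options")
    return problems
```

Because the answer is stored as text rather than as an index, even a small difference in punctuation between `answer` and the matching option is reported.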

Fine-tuning Data Format

{
  "instruction": "Generate a multiple-choice question for grade: HighSchool 2nd Grade, level: Medium, question type: Main Idea.",
  "input": "Passage: 'The Great Fire of London occurred in September 1666...",
  "output": "Question: 'What is the main idea of the passage?'\nA) The Great Fire ...\nB) The fire started ...\nC) The fire spread ...\nD) The Great Fire of London had ...\nAnswer: C"
}
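
One plausible way to get from a raw record to instruction/input/output examples is sketched below. The authors' actual conversion script is not shown in this README, so this is an assumption-laden illustration: `to_finetune_examples` is a hypothetical helper, and the string templates are inferred from the single example above.

```python
from typing import Any, Dict, List


def to_finetune_examples(record: Dict[str, Any]) -> List[Dict[str, str]]:
    """Turn one raw record into a list of instruction/input/output examples."""
    examples = []
    for q in record["questions"]:
        letters = "ABCDEFGH"[: len(q["options"])]
        option_lines = [
            f"{letter}) {text}" for letter, text in zip(letters, q["options"])
        ]
        # The raw format stores the answer as text; the target format uses a letter
        answer_letter = letters[q["options"].index(q["answer"])]
        instruction = (
            "Generate a multiple-choice question for "
            f"grade: {q['grade_level']}, "
            f"level: {q['difficulty'].capitalize()}, "
            f"question type: {q['type'].replace('_', ' ').title()}."
        )
        output = "\n".join(
            [f"Question: '{q['question']}'", *option_lines, f"Answer: {answer_letter}"]
        )
        examples.append(
            {
                "instruction": instruction,
                "input": f"Passage: '{record['passage']}'",
                "output": output,
            }
        )
    return examples
```

The lookup with `index` raises `ValueError` when the answer text is not among the options, which surfaces inconsistent records instead of silently mislabeling them.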

🔧 Model Details

Base Model

Base Model: Qwen3-8B
Fine-tuning Method: LoRA (Low-Rank Adaptation)
Training Data: 500+ high school English school-exam questions
Language: Korean (Question) + English (Passage)

Training Configuration

Learning Rate: 1e-4
Batch Size: 1
LoRA Rank: 16
LoRA Alpha: 64
Training Epochs: 3
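
In the standard LoRA formulation, the low-rank update is scaled by alpha divided by the rank, so the settings above correspond to a scaling factor of 64 / 16 = 4. The sketch below simply records the listed values and derives that factor; `LoraSettings` is a hypothetical container for illustration, not the training script used for this model, and which layers were adapted is not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSettings:
    """Hypothetical holder for the hyperparameters listed in this section."""

    learning_rate: float = 1e-4
    batch_size: int = 1
    rank: int = 16
    alpha: int = 64
    epochs: int = 3

    @property
    def scaling(self) -> float:
        """Standard LoRA scaling applied to the low-rank update (alpha / rank)."""
        return self.alpha / self.rank

    def adapter_params(self, d_in: int, d_out: int) -> int:
        """Trainable parameters LoRA adds to one d_in x d_out weight matrix."""
        return self.rank * (d_in + d_out)
```

For example, `LoraSettings().adapter_params(1024, 1024)` counts the two rank-16 factors for an illustrative 1024 x 1024 matrix; the dimensions are placeholders, not the model's actual layer sizes.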

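Evaluation

Automatic evaluation: automated checks of text quality, such as grammar and vocabulary difficulty
Expert evaluation: education experts assess question quality and appropriateness

📊 Performance metrics are not yet available; evaluation is in progress.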

💡 Usage Examples

1. Generating a Title-Inference Question

passage = "Climate change is one of the most pressing issues..."
question = generate_question(passage, "제목 추론")

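Example output (the question stem is in Korean and reads "Which of the following is the most appropriate title for the passage?"; 정답 means "answer"):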

다음 글의 제목으로 가장 적절한 것은?

① The History of Climate Research
② Climate Change: An Urgent Global Challenge  
③ Weather Patterns Around the World
④ Scientific Methods in Environmental Studies
⑤ The Future of Renewable Energy

정답: ②
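
Output in this shape can be turned into structured data for review or storage. The parser below is a minimal sketch that assumes the layout of the example above (a stem, options marked ① to ⑤, and a `정답:` line); `parse_question` is a hypothetical helper, and real generations may deviate from this layout, so treat parse failures as a signal for manual review.

```python
import re
from typing import Any, Dict, List, Optional

CIRCLED = "①②③④⑤"


def parse_question(text: str) -> Dict[str, Any]:
    """Split generated text into stem, options, and a 1-based answer index."""
    stem_lines: List[str] = []
    options: List[str] = []
    answer: Optional[int] = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = re.match(r"정답\s*:\s*([①②③④⑤])", line)
        if match:
            answer = CIRCLED.index(match.group(1)) + 1
        elif line[0] in CIRCLED:
            options.append(line[1:].strip())
        else:
            # Anything that is neither an option nor the answer is kept as stem text
            stem_lines.append(line)
    return {"stem": "\n".join(stem_lines), "options": options, "answer": answer}
```

A result with fewer than five options or with `answer` set to `None` indicates output that did not follow the expected layout.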

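2. Generating a Blank-Inference Question

question = generate_question(passage, "빈칸 추론")

Example output (the question stem is in Korean and reads "Which of the following best fits the blank in the passage?"):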

다음 글의 빈 칸에 들어갈 말로 가장 적절한 것은?

Climate change is _________________ that requires immediate action.

① a minor environmental concern
② an inevitable natural process
③ one of humanity's greatest challenges
④ primarily an economic issue
⑤ a problem for future generations

정답: ③

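Model Card

Intended Use and Limitations

Suitable uses:

Generating questions for high school English education
Automating educational content

Limitations:

Not suited to specialized content beyond the high school level
Accuracy may drop on passages with strong cultural context
Generated questions should be reviewed by an expert

Bias and Risks

May reflect biases in the training data
Generated content should be fact-checked
Use caution when applying the model outside educational purposes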

📖 Citation

@misc{qwen3-korean-english-exam,
  title={Qwen3-8B-Korean-Highschool-English-Exam},
  author={Hugging Face KREW},
  year={2024},
  publisher={suil0109},
  url={https://huggingface.co/huggingface-KREW/Qwen3-8B-Korean-Highschool-English-Exam}
}

📄 License

This model is released under the Apache 2.0 license.

🤝 Contributing and Support

Contact: [[email protected]]
