File size: 3,711 Bytes

---
license: apache-2.0
base_model: Qwen/Qwen3-1.7B
tags:
- text-generation
- korean
- news
- event-extraction
- fine-tuned
- qwen
language:
- ko
library_name: transformers
pipeline_tag: text-generation
---

# Qwen3-1.7B News Event Extraction Model

이 모델은 **Qwen/Qwen3-1.7B**를 기반으로 한국어 뉴스 텍스트 분석 및 이벤트 추출을 위해 **LoRA** 방식으로 파인튜닝된 모델입니다.

## ✨ 모델 기능

- **뉴스 카테고리 분류**: 부동산, 산업, 오피니언, 증권 카테고리로 분류
- **핵심 이벤트 추출**: 뉴스 텍스트에서 주요 이벤트들을 추출
- **구조화된 출력**: Python dictionary 형식으로 결과 제공

## 사용법

### 기본 설정

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
import json

# 모델 및 토크나이저 로드
repo_id = "sogm1/qwen3-1.7b-news-event-merged"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype="auto",
    device_map="auto"
).eval()
```

### 시스템 프롬프트 
```python
system_prompt = """당신은 뉴스 텍스트를 분석하여 카테고리를 분류하고 주요 핵심 이벤트들을 추출하는 전문 분석 시스템입니다.
주어진 텍스트를 분석하여 반드시 파이썬의 dictionary 형식으로 결과를 작성하십시오.

분석 결과는 다음 형식으로 작성하십시오:

답변:
{"category": "['부동산', '산업', '오피니언', '증권'] 중 하나",
"event_count": "핵심 이벤트 개수(정수)",
"events": ["이벤트1", "이벤트2", ...]}"""
```

### ineference 
```python
def generate_analysis(model, tokenizer, text, system_prompt):
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"다음 뉴스 텍스트를 분석해주세요:\n\n{text}"}
    ]
    
    prompt_text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=False
    )
    
    inputs = tokenizer([prompt_text], return_tensors="pt").to(model.device)
    
    with torch.no_grad():
        generated_ids = model.generate(
            **inputs,
            max_length=8192,
            temperature=0.5,
            top_p=1.0,
            do_sample=False
        )
    
    output_ids = generated_ids[0][len(inputs.input_ids[0]):]
    decoded_output = tokenizer.decode(output_ids, skip_special_tokens=True).strip()
    
    return decoded_output
```

## 사용예시 
```python
# 샘플 뉴스 텍스트
news_text = """
삼성전자가 3분기 실적 발표에서 메모리 반도체 부문의 회복세를 보고했다. 
회사는 D램 가격이 전분기 대비 15% 상승했으며, 낸드플래시 출하량도 20% 증가했다고 밝혔다.
특히 AI 서버용 고대역폭메모리(HBM) 매출이 전년 동기 대비 300% 급증하면서 실적 개선을 견인했다.
"""

# 분석 실행
result = generate_analysis(model, tokenizer, news_text, system_prompt)

# 결과 파싱 및 출력
if "답변:" in result:
    json_part = result.split("답변:")[-1].strip()
    parsed_result = json.loads(json_part)
    
    print(f" 카테고리: {parsed_result['category']}")
    print(f" 이벤트 개수: {parsed_result['event_count']}")
    print("이벤트 목록:")
    for i, event in enumerate(parsed_result['events'], 1):
        print(f"   {i}. {event}")
```
## 출력예시 
```python
 카테고리: 산업
 이벤트 개수: 3
 이벤트 목록:
   1. D램 가격이 전분기 대비 15% 상승
   2. 낸드플래시 출하량이 20% 증가
   3. HBM 매출이 전년 동기 대비 300% 급증
```