🐾 meow-clovax-v1

meow-clovax-v1 is a Korean LLM that naturally rewrites sentences according to an emotion (emotion) and an animal type (post_type).

nick_name : haebo/Meow-HyperCLOVAX-1.5B_FullFT_fp32_0615i
This model was trained with supervised fine-tuning (SFT) on top of naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-1.5B.

🧠 Model Details

Item	Description
Base Model	HyperCLOVAX-SEED-Text-Instruct-1.5B
Fine-tuning Method	Supervised Finetuning (SFT)
Model Type	Decoder-only
Language	Korean (primary)
Parameters	1.5B
Precision	fp16 / fp32
Version	v1
Framework	Transformers
License	hyperclovax-seed

📦 Training Details

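Dataset: A style transfer dataset collected and synthesized per emotion and animal speech style (private)
- A jsonl dataset in which each sample has the fields content, emotion, post_type, and transformed_content
Task: Instruct-style fine-tuning (prompt → transformed response)
Prompt structure:
- instruction:"다음 문장을 [동물]의 [감정]한 말투로 바꿔줘.\nInput: ...\nOutput:" (the Korean instruction means "Rewrite the following sentence in the [emotion] speech style of a [animal]"; [동물] is the animal slot and [감정] is the emotion slot)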
Epochs: 3
Training Infrastructure: Google Colab Pro+ (A100)
Inference Infrastructure: Google Colab Pro+ (T4) / GCP T4
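
The prompt structure listed above can be sketched as a small helper. This is an illustrative sketch, not the author's training code: the function name `build_instruction` is hypothetical, and it assumes the [동물] and [감정] slots are filled with the raw `post_type` and `emotion` label strings, as the How to Use snippet does.

```python
def build_instruction(content: str, emotion: str, post_type: str) -> str:
    """Fill the instruction template described in Training Details.

    Assumption: the animal and emotion slots take the raw label strings.
    """
    return (
        f"다음 문장을 {post_type}의 {emotion}한 말투로 바꿔줘.\n"
        f"Input: {content}\n"
        "Output:"
    )


prompt = build_instruction("오늘 점심 뭐 먹지.", "normal", "dog")
print(prompt)
```

The string ends at `Output:` so that the model's continuation is the transformed sentence.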

💡 Intended Use

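Style transfer into emotion-specific and animal-specific speech
Character chatbots, emotionally expressive chatbots, and similar applications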

⚠️ Limitations & Bias

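Transformations may sound unnatural for some emotions and animal types
Dataset limitations may bias the model toward particular emotions or animal types
Inappropriate inputs may produce unexpected outputs
Raw outputs frequently contain inappropriate elements, so results are post-processed before use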
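
This card does not document the post-processing rules. The following is only a minimal sketch of what such a step could look like, assuming (hypothetically) that cleanup means cutting the text at any further `###` section marker, normalizing whitespace, and discarding outputs that contain blocked terms. `postprocess` and `BLOCKLIST` are illustrative names, and the blocklist entries are placeholders.

```python
from typing import Optional

# Placeholder terms; a real deployment would maintain its own list.
BLOCKLIST = {"badword1", "badword2"}


def postprocess(generated: str, max_chars: int = 300) -> Optional[str]:
    """Return cleaned text, or None if the output should be discarded."""
    # Keep only the text before any further "###" section marker.
    text = generated.split("###", 1)[0]
    # Collapse runs of whitespace and trim both ends.
    text = " ".join(text.split())
    # Reject outputs that contain a blocked term.
    if any(term in text for term in BLOCKLIST):
        return None
    # Cap the length so runaway generations are truncated.
    return text[:max_chars] or None


print(postprocess("배고프다냥!  \n### Input:\nextra"))
```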

🚀 How to Use

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "haebo/meow-clovax-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

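# Example input (Korean; roughly "That must have been annoying. I get annoyed every morning too.")
content = "짜증났겠네 나도 아침마다 짜증남"
emotion = "angry"
post_type = "cat"
# The instruction stays in Korean, matching the fine-tuning prompts.
instruction = f"다음 문장을 {post_type}의 {emotion}한 말투로 바꿔줘."

# Leave the Output section empty so the model generates the transformed sentence.
prompt = (
  f"### Instruction:\n{instruction}\n"
  f"### Input:\n{content}\n"
  "### Output:\n"
)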

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=400)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
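
For a decoder-only model, `generate` returns the prompt tokens followed by the new tokens, so the decoded string above includes the prompt. As a rough sketch, the transformed sentence can be separated by splitting on the `### Output:` marker used in the prompt. The helper name `extract_output` is hypothetical, and it assumes the marker appears first in the prompt and not inside the input sentence.

```python
def extract_output(decoded: str, marker: str = "### Output:") -> str:
    """Return the text that follows the first output marker."""
    head, found, tail = decoded.partition(marker)
    # If the marker is missing, fall back to the whole decoded string.
    return tail.strip() if found else head.strip()


decoded = "### Instruction:\n...\n### Input:\n안녕\n### Output:\n안녕이다냥!"
print(extract_output(decoded))
```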

🗂️ Dataset

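The v1 model was trained on the datasets listed below. The data was used as-is, without separate preprocessing (cleansing/filtering).
For fine-tuning, the samples were reformatted to match the prompt structure.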

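Data structure
Each sample consists of the following fields.
- content: original sentence (everyday Korean)
- emotion: emotion label (e.g., happy, sad, angry)
- post_type: animal type (e.g., cat, dog)
- transformed_content: the sentence rewritten in the given emotion and animal speech style
Example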

{
  "content": "오늘 점심 뭐 먹지.",
  "emotion": "normal",
  "post_type": "dog",
  "transformed_content": "오늘 점심 뭐 먹지멍? 🐾 맛있는 냄새가 나는 것 같다멍! 주인님, 저 밥 어딨냐왈! 빨리 밥그릇 채워달라멍! 🦴 ᓚ₍´ ꒳ `₎ა"
}
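
Because each sample is one JSON object per jsonl line, a quick structural check can catch malformed records before training. The sketch below is illustrative: `parse_sample` and `REQUIRED_FIELDS` are hypothetical names, and the sample line is a shortened version of the example above.

```python
import json

# The four fields documented in the data structure list.
REQUIRED_FIELDS = ("content", "emotion", "post_type", "transformed_content")


def parse_sample(line: str) -> dict:
    """Parse one jsonl line and check that the documented fields exist."""
    sample = json.loads(line)  # raises json.JSONDecodeError on invalid JSON
    missing = [f for f in REQUIRED_FIELDS if f not in sample]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return sample


line = (
    '{"content": "오늘 점심 뭐 먹지.", "emotion": "normal", '
    '"post_type": "dog", "transformed_content": "오늘 점심 뭐 먹지멍?"}'
)
sample = parse_sample(line)
print(sample["post_type"])
```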

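Datasets (4,827 samples in total)
- dataset_0515_made (342 samples): early user data
- dataset_0527_made (818 samples): per-emotion and per-animal data based on user posts
- dataset_0530_made (2,986 samples): post-based data augmented per emotion
- dataset_0613_made (681 samples): rule-based transformations of user comment inputs (cat)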
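
As a quick arithmetic check, the four subset sizes listed above sum to the stated total. The snippet also prints each subset's share, which makes the weight of dataset_0530_made visible. The dictionary name `sizes` is just for illustration.

```python
# Subset sizes as listed in this card.
sizes = {
    "dataset_0515_made": 342,
    "dataset_0527_made": 818,
    "dataset_0530_made": 2986,
    "dataset_0613_made": 681,
}

total = sum(sizes.values())
print(total)  # 4827

# Share of each subset in the full training set.
for name, n in sizes.items():
    print(f"{name}: {n / total:.1%}")
```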
