Dohoon_Qwen2-VL-7B-Instruct_ForAju (Ajou University Multimodal Deep Learning Challenge)

These are adapter weights obtained by fine-tuning Qwen/Qwen2-VL-7B-Instruct with LoRA (QLoRA) so that a single multimodal model handles image captioning, VQA, math reasoning, contextual QA, and summarization through a single prompt-routing logic, without separate task-branching.

This repository contains only the adapter weights; the original base model is not included.

  • Developer: dohoon0508
  • Finetuned from: Qwen/Qwen2-VL-7B-Instruct
  • Environment: Google Colab / PyTorch / Transformers / PEFT / bitsandbytes
  • Key features:
    • Single System Prompt: every task is handled with one system prompt, giving a branch-free inference pipeline
    • Rule-based Task Routing: one of five tasks (Captioning, VQA, Math, Text QA, Summarization) is chosen dynamically from the input (image/text) and whether a question is present (a minimal routing sketch follows this list)
    • Task-specific Decoding: the maximum number of generated tokens and a sentence-count-based dynamic stopping rule are adjusted per task
    • Vision Tower Frozen: the vision tower weights are kept frozen during training for efficiency
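
The exact routing rule is not published in this card; below is a minimal sketch of what such rule-based routing could look like, assuming a hypothetical route_task helper that only inspects whether an image and a question are present (task names and budgets are illustrative):

# Hypothetical sketch of rule-based task routing; not the exact competition logic.
def route_task(has_image: bool, question: str = "") -> str:
    """Pick one of the five tasks from the shape of the input alone."""
    if has_image:
        return "vqa" if question else "captioning"
    if question:
        # Crude keyword check standing in for the real math-detection rule.
        if any(tok in question.lower() for tok in ("solve", "calculate", "=", "+")):
            return "math"
        return "text_qa"
    return "summarization"

# Illustrative per-task decoding budgets (not the tuned values).
MAX_NEW_TOKENS = {"captioning": 64, "vqa": 32, "math": 256, "text_qa": 64, "summarization": 128}
MAX_SENTENCES = {"captioning": 2, "vqa": 1, "math": 8, "text_qa": 2, "summarization": 4}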

πŸ”§ Usage (loading the adapter)

λ‹€μŒμ€ transformers와 peft 라이브러리λ₯Ό μ‚¬μš©ν•˜μ—¬ 베이슀 λͺ¨λΈμ— λ³Έ μ–΄λŒ‘ν„°λ₯Ό λ‘œλ“œν•˜λŠ” λ°©λ²•μž…λ‹ˆλ‹€.

from transformers import AutoProcessor, BitsAndBytesConfig, Qwen2VLForConditionalGeneration
from peft import PeftModel
import torch

base_id = "Qwen/Qwen2-VL-7B-Instruct"
adapter_id = "dohoon0508/Dohoon_Qwen2-VL-7B-Instruct_ForAju"

# Load the processor and the 4-bit quantized base model
processor = AutoProcessor.from_pretrained(base_id, trust_remote_code=True)
base_model = Qwen2VLForConditionalGeneration.from_pretrained(
    base_id,
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,  # or torch.float16
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)

# Apply the adapter (LoRA) weights
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

# Inference example (VQA)
from PIL import Image
import requests

image_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/bee.JPG"
image = Image.open(requests.get(image_url, stream=True).raw).convert("RGB")
question = "Question: What is the main subject in this image?"

messages = [
    {"role": "system", "content": [{"type": "text", "text": "You are a multimodal assistant..."}]},  # substitute the actual system prompt used in training
    {"role": "user", "content": [{"type": "image"}, {"type": "text", "text": question}]},
]

prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
enc = processor(text=prompt, images=[image], return_tensors="pt")

out = model.generate(**{k: v.to(model.device) for k, v in enc.items()}, max_new_tokens=128)
generated_text = processor.batch_decode(out, skip_special_tokens=True)[0]
print(generated_text)
πŸ“ 파일 ꡬ성
adapter_model.safetensors: LoRA μ–΄λŒ‘ν„° κ°€μ€‘μΉ˜ 파일

adapter_config.json: μ–΄λŒ‘ν„° μ„€μ • 파일

README.md: λͺ¨λΈ μΉ΄λ“œ

tokenizer.json, tokenizer.model, tokenizer_config.json, processor_config.json λ“± 기타 μ„€μ • 파일

πŸ”¬ Training overview
Tuning method: QLoRA (4-bit NormalFloat) + LoRA

LoRA target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj

LoRA ν•˜μ΄νΌνŒŒλΌλ―Έν„°:

r = 32

lora_alpha = 16

lora_dropout = 0.05

Vision tower: fully frozen
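
A minimal configuration sketch consistent with the settings above, using standard bitsandbytes/peft APIs; the exact training script lives in the project repository and may differ:

import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# QLoRA: base weights quantized to 4-bit NormalFloat (NF4)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # fall back to torch.float16 without bf16 support
)

# LoRA adapters on the attention and MLP projections of the language model
lora_config = LoraConfig(
    r=32,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
# The vision tower gets no LoRA modules and stays fully frozen.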

ν•™μŠ΅ ν•˜μ΄νΌνŒŒλΌλ―Έν„°:

per_device_train_batch_size = 1

gradient_accumulation_steps = 16

learning_rate = 1e-4 (cosine scheduler)

warmup_ratio = 0.03

Precision: bf16 (when available) / fp16
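
For reference, these settings map onto transformers' TrainingArguments roughly as follows (a sketch; output_dir and num_train_epochs are placeholders not stated in this card):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2vl-lora-out",      # placeholder path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,     # effective batch size of 16
    learning_rate=1e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    bf16=True,                          # set fp16=True instead when bf16 is unavailable
    num_train_epochs=1,                 # placeholder; actual value not documented here
)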

Data: multi-task data provided by the competition (.parquet)

Prompt: a fixed single system prompt + (image/text + question), with no task branching

Labeling: prompt tokens are masked with -100 when computing the loss, so only answer tokens contribute to the loss (a minimal sketch follows)
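
A minimal sketch of that masking step, assuming the prompt length in tokens is already known:

import torch

def mask_prompt_tokens(input_ids: torch.Tensor, prompt_len: int) -> torch.Tensor:
    """Build labels from input_ids so the loss only sees the answer tokens."""
    labels = input_ids.clone()
    labels[:, :prompt_len] = -100  # -100 is ignored by PyTorch's cross-entropy loss
    return labels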

🧠 Inference notes
Decoding:

Greedy Search (do_sample=False, num_beams=1)

no_repeat_ngram_size = 4

repetition_penalty = 1.05

Dynamic generation control:

νƒœμŠ€ν¬ μ’…λ₯˜(Captioning, Summarization λ“±)에 따라 μ΅œλŒ€ 생성 토큰 수λ₯Ό λ™μ μœΌλ‘œ 쑰절

A StopOnSentenceCount criterion counts sentence-ending punctuation (., !, ?) and stops generation early once the specified number of sentences is reached (see the sketch below)
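
The stopping criterion itself is not shipped with the adapter; the sketch below is one possible reimplementation using transformers' StoppingCriteria API, combined with the decoding settings listed above. It reuses processor, model, and enc from the usage example, and the sentence budget of 2 is illustrative:

from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnSentenceCount(StoppingCriteria):
    """Stop once the generated continuation contains max_sentences sentence-ending marks."""
    def __init__(self, tokenizer, prompt_len: int, max_sentences: int):
        self.tokenizer = tokenizer
        self.prompt_len = prompt_len
        self.max_sentences = max_sentences

    def __call__(self, input_ids, scores, **kwargs) -> bool:
        text = self.tokenizer.decode(input_ids[0, self.prompt_len:], skip_special_tokens=True)
        return sum(text.count(p) for p in (".", "!", "?")) >= self.max_sentences

stopping = StoppingCriteriaList(
    [StopOnSentenceCount(processor.tokenizer, enc["input_ids"].shape[1], max_sentences=2)]
)
out = model.generate(
    **{k: v.to(model.device) for k, v in enc.items()},
    max_new_tokens=128,            # adjusted per task in practice
    do_sample=False,
    num_beams=1,
    no_repeat_ngram_size=4,
    repetition_penalty=1.05,
    stopping_criteria=stopping,
)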

Post-processing:

Forbidden phrases ("I'm sorry", "As an AI", etc.) are removed

For math problems, the answer is extracted and forced into the #### {answer} format

VQA responses keep only the first sentence for brevity (a minimal post-processing sketch follows)
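
A hedged sketch of this post-processing (the phrase list, regex, and function name are illustrative, not the original code):

import re

REFUSAL_PHRASES = ("I'm sorry", "As an AI")  # illustrative subset of the banned phrases

def postprocess(text: str, task: str) -> str:
    for phrase in REFUSAL_PHRASES:
        text = text.replace(phrase, "")
    text = text.strip()
    if task == "math":
        # Force the "#### {answer}" format, falling back to the last number in the text.
        m = re.search(r"####\s*(.+)", text)
        if m:
            return f"#### {m.group(1).strip()}"
        nums = re.findall(r"-?\d+(?:\.\d+)?", text)
        return f"#### {nums[-1]}" if nums else text
    if task == "vqa":
        # Keep only the first sentence for brevity.
        return re.split(r"(?<=[.!?])\s+", text, maxsplit=1)[0]
    return text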

βœ… Recommended use cases
Research and experiments that handle diverse multimodal instructions (image captioning, VQA, text summarization, etc.) with a single model

λ³„λ„μ˜ λΌμš°νŒ… 둜직 없이 ν”„λ‘¬ν”„νŠΈλ§ŒμœΌλ‘œ νƒœμŠ€ν¬λ₯Ό κ΅¬λΆ„ν•˜λŠ” λͺ¨λΈμ˜ λŠ₯λ ₯ 뢄석

Case studies on efficient fine-tuning of large language models (LLMs) with LoRA/QLoRA

⚠️ Limitations and caveats
As a generative model, it may produce factually incorrect (hallucinated) or misleading content.

λ―Όκ°ν•˜κ±°λ‚˜ μ•ˆμ „/윀리적 μš”κ΅¬μ‚¬ν•­μ΄ μ€‘μš”ν•œ 도메인에 μ μš©ν•  경우, λ°˜λ“œμ‹œ 좔가적인 필터링 λ˜λŠ” κ°€λ“œλ ˆμΌ μž₯μΉ˜κ°€ ν•„μš”ν•©λ‹ˆλ‹€.

베이슀 λͺ¨λΈ(Qwen/Qwen2-VL-7B-Instruct) 및 ν•™μŠ΅ λ°μ΄ν„°μ˜ 원본 λΌμ΄μ„ μŠ€μ™€ 약관을 μ€€μˆ˜ν•΄μ•Ό ν•©λ‹ˆλ‹€.

πŸ”— References
Base model: Qwen/Qwen2-VL-7B-Instruct

ν”„λ‘œμ νŠΈ μ €μž₯μ†Œ: https://github.com/dohoon0508/ajukaggle