Qwen2-VL-7B LoRA (AJU Multimodal Challenge)

๋‹จ์ผ ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ ๋ชจ๋ธ(์ด๋ฏธ์ง€ ์บก์…˜, VQA, ์ˆ˜ํ•™ ์ถ”๋ก , ๋ฌธ๋งฅ QA, ์š”์•ฝ)์„ ๋ถ„๊ธฐ ์—†๋Š” ํ”„๋กฌํ”„ํŠธ ๋ผ์šฐํŒ…์œผ๋กœ ์ฒ˜๋ฆฌํ•˜๋„๋ก Qwen/Qwen2-VL-7B-Instruct์— LoRA(QLoRA) ๋ฅผ ์ ์šฉํ•ด ํŠœ๋‹ํ•œ ์–ด๋Œ‘ํ„ฐ ๊ฐ€์ค‘์น˜์ž…๋‹ˆ๋‹ค.
๋ฒ ์ด์Šค ๋ชจ๋ธ ๊ฐ€์ค‘์น˜๋Š” ํฌํ•จํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค.

  • ๊ฐœ๋ฐœ์ž: tahn0321/qwen2vl-7b-ajou-lora
  • Finetuned from: Qwen/Qwen2-VL-7B-Instruct
  • ํ™˜๊ฒฝ: Colab A100 / PyTorch / Transformers / PEFT / bitsandbytes
  • ํ•ต์‹ฌ ํŠน์ง•: Single Adapter, No task-branching, Vision Tower Frozen, Auto-Grow ๋””์ฝ”๋”ฉ

๐Ÿ”ง Usage (loading the adapter)

from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from peft import PeftModel

base_id = "Qwen/Qwen2-VL-7B-Instruct"
adapter_id = "tahn0321/qwen2vl-7b-ajou-lora"
processor = AutoProcessor.from_pretrained(base_id, trust_remote_code=True)
# Qwen2-VL must be loaded with its vision-language class, not AutoModelForCausalLM
base = Qwen2VLForConditionalGeneration.from_pretrained(base_id, device_map="auto", trust_remote_code=True)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()


# Inference example (assumes `pil_image` is a PIL.Image):
# msgs = [{"role": "system", "content": [{"type": "text", "text": "You are a concise, honest, multimodal assistant."}]},
#         {"role": "user", "content": [{"type": "image"}, {"type": "text", "text": "Question: ..."}]}]
# prompt = processor.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
# enc = processor(text=prompt, images=[pil_image], return_tensors="pt").to(model.device)
# out = model.generate(**enc, max_new_tokens=192)
# print(processor.batch_decode(out[:, enc["input_ids"].shape[1]:], skip_special_tokens=True)[0])

๐Ÿ“ ํŒŒ์ผ ๊ตฌ์„ฑ

adapter_model.safetensors
adapter_config.json
processor_config.json
tokenizer.json
tokenizer.model
tokenizer_config.json

๐Ÿ”ฌ Training overview

ํŠœ๋‹ ๋ฐฉ์‹: QLoRA(4-bit) + LoRA

LoRA ๋Œ€์ƒ ๋ชจ๋“ˆ: q_proj, k_proj, v_proj, o_proj, up_proj, down_proj, gate_proj

LoRA ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ: r=32, alpha=16, dropout=0.05

Vision tower: fully frozen

ํ•™์Šต ํ•˜์ดํผ(์˜ˆ์‹œ): per_device_batch=1, grad_accum=16, lr=1e-4, cosine, warmup=3%

์ •๋ฐ€๋„: bf16(๊ฐ€๋Šฅ ์‹œ) / fp16

๋ฐ์ดํ„ฐ: ๋Œ€ํšŒ ์ œ๊ณต ๋ฉ€ํ‹ฐํƒœ์Šคํฌ ๋ฐ์ดํ„ฐ(*.parquet)

Prompting: fixed system prompt plus either (image + question) or (passage + "Question: ..."); no task branching

๋ผ๋ฒจ๋ง: ํ”„๋กฌํ”„ํŠธ ํ† ํฐ -100(๋งˆ์Šคํ‚น), ์ •๋‹ต ํ† ํฐ๋งŒ loss ๋ฐ˜์˜

๐Ÿง  Inference notes

Decoding: greedy, no_repeat_ngram_size=4, repetition_penalty=1.05

Auto-grow: per-task length profiles and end-token detection prevent mid-sentence truncation
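The auto-grow idea can be sketched as a loop that re-generates with a larger token budget whenever the output lacks an end token (a hypothetical reconstruction; the actual per-task length profiles are not published):

```python
def auto_grow_generate(generate_fn, eos_id, start_budget=64, max_budget=512):
    """Call generate_fn(max_new_tokens) with a growing budget until EOS appears."""
    budget = start_budget
    while True:
        ids = generate_fn(budget)
        if eos_id in ids or budget >= max_budget:
            return ids  # finished cleanly, or budget cap reached
        budget = min(budget * 2, max_budget)  # grow and retry

# Toy generate_fn: emits one token per budget unit, EOS (=0) after 100 tokens
toy = lambda n: list(range(1, min(n, 100) + 1)) + ([0] if n > 100 else [])
```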

Post-processing: for math/numeric queries, an optional simple rule parses out only the final number
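A minimal version of that rule (a sketch; the exact pattern used in the project is not published):

```python
import re

def extract_final_number(text):
    """Return the last number found in the model output, or None if absent."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text)
    return matches[-1] if matches else None
```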

๐Ÿ“Š Results & insights (summary)

Partial training on 8,000 samples scored comparatively well on the live leaderboard's BLEU,
whereas training on the full train/validation split was sometimes observed to lower the score.

๊ตํ›ˆ: ์ค‘๊ฐ„ ๋ฆฌ๋”๋ณด๋“œ ์ตœ์ ํ™”๋ณด๋‹ค ์ „์ฒด ๋ถ„ํฌ ์ผ๋ฐ˜ํ™”๋ฅผ ์šฐ์„ ํ•  ๊ฒƒ.

โœ… Recommended use

Research/experiments on multimodal instruction reasoning (image captioning, VQA, summarization, etc.)

๋‹จ์ผ ๋ชจ๋ธ/๋‹จ์ผ ์–ด๋Œ‘ํ„ฐ/๋ฌด๋ถ„๊ธฐ ํ”„๋กฌํ”„ํŠธ ๋ผ์šฐํŒ… ์„ค์ •

โš ๏ธ ์ œํ•œ/์ฃผ์˜

์ƒ์„ฑ ๋ชจ๋ธ ํŠน์„ฑ์ƒ ํ™˜๊ฐ/์˜คํ•ด์„ ๊ฐ€๋Šฅ์„ฑ.

Domains with safety/ethics requirements need additional filtering/guardrails.

๋ฒ ์ด์Šค ๋ชจ๋ธ/๋ฐ์ดํ„ฐ์˜ ๋ผ์ด์„ ์Šคยท์•ฝ๊ด€์„ ์ค€์ˆ˜ํ•˜์„ธ์š”.

๐Ÿ”— References

Base model: https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct

ํ”„๋กœ์ ํŠธ ์„ค๋ช…/์ฝ”๋“œ: https://github.com/TaeYunAhn/2025_AJU_Multimodal_DeepLearning_Challenge.git

๐Ÿ“„ ๋ผ์ด์„ ์Šค

Adapter weights: apache-2.0

๋ฒ ์ด์Šค ๋ชจ๋ธ/๋ฐ์ดํ„ฐ: ๊ฐ ์ถœ์ฒ˜ ๋ผ์ด์„ ์Šค/์•ฝ๊ด€ ์ค€์ˆ˜
