---
library_name: transformers
license: apache-2.0
datasets:
  - HuggingFaceH4/ultrafeedback_binarized
language:
  - en
base_model:
  - Qwen/Qwen2-0.5B-Instruct
pipeline_tag: text-generation
---

# Model Card for Qwen2-0.5B-DPO

## Model Details

- **Base Model:** Qwen/Qwen2-0.5B-Instruct
- **Fine-tuning Method:** Direct Preference Optimization (DPO)
- **Framework:** Unsloth
- **Quantization:** 4-bit QLoRA (during training)
- **Training Data:** HuggingFaceH4/ultrafeedback_binarized

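DPO skips the separate reward model of classic RLHF and optimizes the policy directly on preference pairs: it pushes the policy to prefer the chosen response over the rejected one by more than the frozen reference model does. A minimal, illustrative sketch of the per-pair loss in plain Python (real trainers such as `trl`'s `DPOTrainer` compute these log-probabilities per token and in batches):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (policy margin - reference margin))."""
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    logits = beta * (policy_margin - ref_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# If the policy prefers the chosen answer by more than the reference does,
# the loss is small; if it prefers the rejected answer, the loss grows.
low = dpo_loss(-5.0, -20.0, ref_chosen_logp=-10.0, ref_rejected_logp=-12.0)
high = dpo_loss(-20.0, -5.0, ref_chosen_logp=-10.0, ref_rejected_logp=-12.0)
```

The `beta` hyperparameter controls how far the policy may drift from the reference model; the log-probability values above are made up purely to show the shape of the computation.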
## Uses

```python
from unsloth import FastLanguageModel

# Load the DPO-tuned model (4-bit loading is optional at inference time)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="VinitT/Qwen2-0.5B-DPO",
    dtype=None,          # auto-detect the best dtype for the hardware
    load_in_4bit=False,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's faster inference path

# Build the prompt with the model's chat template
messages = [{"role": "user", "content": "Hello, how can I develop a habit of drawing daily?"}]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
)
inputs = {k: v.to(model.device) for k, v in inputs.items()}

# Generate
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)

# Decode only the newly generated tokens (not the prompt)
prompt_len = inputs["input_ids"].shape[-1]
response = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
print(response.strip())
```
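For reference, `apply_chat_template` expands the message list into the ChatML-style layout that Qwen2 models use. A rough, hand-rolled approximation of that layout (the tokenizer's own template is authoritative and may also inject a default system message):

```python
def chatml_prompt(messages, add_generation_prompt=True):
    # Illustrative approximation of Qwen2's ChatML layout;
    # always prefer tokenizer.apply_chat_template in real code.
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so generation continues as the assistant
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = chatml_prompt([{"role": "user", "content": "Hello!"}])
```

Passing `add_generation_prompt=True` is what makes the model answer as the assistant rather than continue the user's text.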