gemma-2b-orpo

This is an ORPO fine-tune of google/gemma-2b with alvarobartt/dpo-mix-7k-simplified.

⚡ Quantized version (GGUF): https://huggingface.co/anakin87/gemma-2b-orpo-GGUF

ORPO

ORPO (Odds Ratio Preference Optimization) is a new training paradigm that combines the usually separate phases of SFT (Supervised Fine-Tuning) and preference alignment (usually performed with RLHF or simpler methods like DPO) into a single training stage. Its main advantages:

  • Faster training
  • Less memory usage (no reference model needed)
  • Good results!
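
To make the "no reference model needed" point concrete, here is a minimal sketch of a per-example ORPO-style objective, following the formulation in the ORPO paper: an SFT term on the chosen answer plus a weighted odds-ratio term. It is an illustration, not the TRL implementation. The function names, the `lam` weight and its default value are assumptions, and the inputs are assumed to be length-normalized (average per-token) log-probabilities that are strictly negative.

```python
import math


def log_odds(avg_logp: float) -> float:
    """log(p / (1 - p)) for p = exp(avg_logp); requires avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(chosen_logp: float, rejected_logp: float, lam: float = 0.1) -> float:
    """Sketch of an ORPO-style loss for one preference pair.

    chosen_logp / rejected_logp: average per-token log-probabilities that the
    model being trained assigns to the chosen and rejected answers.
    lam: hypothetical weight of the odds-ratio term.
    """
    # SFT term: negative log-likelihood of the chosen answer
    sft_term = -chosen_logp
    # Odds-ratio term: -log(sigmoid(log odds ratio)) = log(1 + exp(-z))
    log_odds_ratio = log_odds(chosen_logp) - log_odds(rejected_logp)
    odds_ratio_term = math.log1p(math.exp(-log_odds_ratio))
    return sft_term + lam * odds_ratio_term


# The loss is lower when the model already prefers the chosen answer
print(orpo_loss(chosen_logp=-0.5, rejected_logp=-2.0))
print(orpo_loss(chosen_logp=-2.0, rejected_logp=-0.5))
```

Both terms depend only on log-probabilities from the model being trained, which is why no frozen reference model has to be kept in memory.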

๐Ÿ† Evaluation

Nous

gemma-2b-orpo performs well for its size on Nous' benchmark suite (evaluation conducted using LLM AutoEval).

| Model | Average | AGIEval | GPT4All | TruthfulQA | Bigbench |
|---|---|---|---|---|---|
| anakin87/gemma-2b-orpo | 39.45 | 23.76 | 58.25 | 44.47 | 31.32 |
| mlabonne/Gemmalpaca-2B | 38.39 | 24.48 | 51.22 | 47.02 | 30.85 |
| google/gemma-2b-it | 36.1 | 23.76 | 43.6 | 47.64 | 29.41 |
| google/gemma-2b | 34.26 | 22.7 | 43.35 | 39.96 | 31.03 |
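
The Average column appears to be the unweighted mean of the four benchmark scores. The short check below recomputes it from the values in the table; the variable and function names are illustrative only.

```python
# Scores copied from the table: AGIEval, GPT4All, TruthfulQA, Bigbench
nous_scores = {
    "anakin87/gemma-2b-orpo": [23.76, 58.25, 44.47, 31.32],
    "mlabonne/Gemmalpaca-2B": [24.48, 51.22, 47.02, 30.85],
    "google/gemma-2b-it": [23.76, 43.6, 47.64, 29.41],
    "google/gemma-2b": [22.7, 43.35, 39.96, 31.03],
}


def average(scores):
    """Unweighted mean, rounded to two decimals (assumed averaging scheme)."""
    return round(sum(scores) / len(scores), 2)


for model, scores in nous_scores.items():
    print(f"{model}: {average(scores)}")
```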

Open LLM Leaderboard

Detailed results can be found here.

By comparison, on the Open LLM Leaderboard, google/gemma-2b-it has an average of 42.75.

| Metric | Value |
|---|---|
| Avg. | 47.35 |
| AI2 Reasoning Challenge (25-Shot) | 49.15 |
| HellaSwag (10-Shot) | 73.72 |
| MMLU (5-Shot) | 38.52 |
| TruthfulQA (0-shot) | 44.53 |
| Winogrande (5-shot) | 64.33 |
| GSM8k (5-shot) | 13.87 |

๐Ÿ™ Dataset

alvarobartt/dpo-mix-7k-simplified is a simplified version of argilla/dpo-mix-7k. You can find more information in the dataset card.

🎮 Model in action

Usage notebook

📓 Chat and RAG using Haystack

Simple text generation with Transformers

The model is small, so it runs smoothly on Colab. It can also be loaded with quantization to reduce memory usage.

# pip install transformers accelerate
import torch
from transformers import pipeline

# Load the model in bfloat16; device_map="auto" places it on the available device
pipe = pipeline("text-generation", model="anakin87/gemma-2b-orpo", torch_dtype=torch.bfloat16, device_map="auto")
messages = [{"role": "user", "content": "Write a rap song on Vim vs VSCode."}]
# add_generation_prompt=True ends the prompt with the start of the model turn, so the model replies
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Sample up to 500 new tokens
outputs = pipe(prompt, max_new_tokens=500, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
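
Since the model can also be loaded using quantization, here is one possible way to do that: a sketch using 4-bit loading through `BitsAndBytesConfig` in Transformers. The settings in `QUANT_KWARGS` and the helper name `load_quantized_pipeline` are assumptions for illustration, not the author's configuration, and bitsandbytes typically requires a CUDA GPU.

```python
# pip install transformers accelerate bitsandbytes

# Hypothetical 4-bit settings; adjust to your hardware
QUANT_KWARGS = {"load_in_4bit": True, "bnb_4bit_quant_type": "nf4"}


def load_quantized_pipeline(model_id="anakin87/gemma-2b-orpo"):
    """Build a text-generation pipeline around a 4-bit quantized model."""
    # Imported here so the heavy dependencies are only needed when loading
    import torch
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
        pipeline,
    )

    quant_config = BitsAndBytesConfig(
        **QUANT_KWARGS,
        bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bfloat16
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",
    )
    return pipeline("text-generation", model=model, tokenizer=tokenizer)
```

The returned pipeline can be used in the same way as `pipe` in the snippet above, for example `pipe = load_quantized_pipeline()`.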

Training

The model was trained using HF TRL. 📓 Training notebook

Framework versions

  • Transformers 4.39.1
  • PyTorch 2.2.0+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2