---
license: apache-2.0
base_model: alignment-handbook/zephyr-7b-sft-full
tags:
- alignment-handbook
- generated_from_trainer
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: mistral-7B-DPO
  results:
  - task:
      type: text-generation
    dataset:
      name: IFEval
      type: IFEval
    metrics:
    - name: inst_level_strict_acc
      type: inst_level_strict_acc
      value: 53.06
    source:
      name: Open LLM Leaderboard
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
  - task:
      type: text-generation
    dataset:
      name: BBH (BIG-Bench Hard)
      type: BBH
    metrics:
    - name: acc_norm
      type: acc_norm
      value: 21.78
    source:
      name: Open LLM Leaderboard
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
  - task:
      type: text-generation
    dataset:
      name: MATH
      type: MATH
    metrics:
    - name: exact_match
      type: exact_match
      value: 2.87
    source:
      name: Open LLM Leaderboard
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
  - task:
      type: text-generation
    dataset:
      name: GPQA (Graduate-Level Google-Proof Q&A)
      type: GPQA
    metrics:
    - name: acc_norm
      type: acc_norm
      value: 3.47
    source:
      name: Open LLM Leaderboard
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
  - task:
      type: text-generation
    dataset:
      name: MuSR
      type: MuSR
    metrics:
    - name: acc_norm
      type: acc_norm
      value: 7.54
    source:
      name: Open LLM Leaderboard
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
  - task:
      type: text-generation
    dataset:
      name: MMLU-PRO
      type: MMLU-PRO
    metrics:
    - name: acc
      type: acc
      value: 19.59
    source:
      name: Open LLM Leaderboard
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
---

# MistralForCausalLM_Cal_DPO

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) dataset.
## Model description

The Cal-DPO (calibrated direct preference optimization) algorithm addresses the alignment of large language models with human preferences by calibrating the implicit rewards learned during contrastive preference optimization so that they match the scale of the ground-truth rewards. The approach has shown strong performance across multiple benchmark tasks.
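
For intuition, the sketch below shows a calibrated, DPO-style preference loss in PyTorch. It is illustrative only: the function name, the default `beta`, and the `reward_target` constant are assumptions rather than values from this training run, and the exact calibration term used by Cal-DPO is the one defined in the original paper, not necessarily the squared-error form shown here.

```python
import torch
import torch.nn.functional as F


def calibrated_dpo_style_loss(
    policy_chosen_logps: torch.Tensor,
    policy_rejected_logps: torch.Tensor,
    ref_chosen_logps: torch.Tensor,
    ref_rejected_logps: torch.Tensor,
    beta: float = 0.1,           # assumed value; the run's actual beta is not listed in this card
    reward_target: float = 1.0,  # placeholder calibration target, not the paper's constant
) -> torch.Tensor:
    # Implicit rewards: beta-scaled log-probability ratios between the policy
    # and the frozen reference model, one scalar per (prompt, response) pair.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)

    # Ranking term: the standard DPO sigmoid loss on the reward margin.
    ranking_loss = -F.logsigmoid(chosen_rewards - rejected_rewards)

    # Calibration term: pull each implicit reward toward a fixed absolute target
    # so the rewards stay on a meaningful scale instead of only their difference
    # being constrained.
    calibration_loss = (chosen_rewards - reward_target) ** 2 + (rejected_rewards + reward_target) ** 2

    return (ranking_loss + calibration_loss).mean()
```

The key difference from plain DPO is that the loss also constrains the absolute value of each implicit reward, not just the gap between the chosen and rejected responses.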
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (an illustrative optimizer and scheduler setup is sketched after the list):
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
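
The batch-size arithmetic and the schedule above map roughly onto the following PyTorch/transformers setup. This is a sketch under stated assumptions (a stand-in model, a placeholder step count, and AdamW as the Adam variant), not the actual launch configuration.

```python
import torch
from transformers import get_cosine_schedule_with_warmup

# Batch-size arithmetic from the list above:
# 8 per device x 4 devices x 2 gradient-accumulation steps = 64 samples per optimizer update.
per_device_train_batch_size = 8
num_devices = 4
gradient_accumulation_steps = 2
total_train_batch_size = per_device_train_batch_size * num_devices * gradient_accumulation_steps  # 64

# Stand-ins for illustration: the real run trains the 7B policy for one epoch,
# so the step count depends on the size of the preference dataset.
model = torch.nn.Linear(16, 16)
num_training_steps = 1_000

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=5e-7,
    betas=(0.9, 0.999),
    eps=1e-8,
)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.1 * num_training_steps),  # warmup ratio 0.1
    num_training_steps=num_training_steps,
)
```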
### Training results
The model was evaluated on 6 key benchmarks using the EleutherAI [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness), a unified framework for testing generative language models on a large number of different evaluation tasks; a reproduction sketch follows the benchmark list.

- IFEval (https://arxiv.org/abs/2311.07911)
- BBH (BIG-Bench Hard) (https://arxiv.org/abs/2210.09261)
- MATH (https://arxiv.org/abs/2103.03874)
- GPQA (Graduate-Level Google-Proof Q&A Benchmark) (https://arxiv.org/abs/2311.12022)
- MuSR (Multistep Soft Reasoning) (https://arxiv.org/abs/2310.16049)
- MMLU-PRO (Massive Multitask Language Understanding - Professional) (https://arxiv.org/abs/2406.01574)
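
The leaderboard-style evaluation can be reproduced, approximately, through the harness's Python API. In the sketch below, the `leaderboard_*` task names assume a recent lm-evaluation-harness release that ships the Open LLM Leaderboard task group, and the `pretrained=` value is a placeholder for this repository's model id.

```python
import lm_eval

# Run the Open LLM Leaderboard v2 task group on this model (placeholder id).
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=<this-model-id>,dtype=bfloat16",
    tasks=[
        "leaderboard_ifeval",
        "leaderboard_bbh",
        "leaderboard_math_hard",
        "leaderboard_gpqa",
        "leaderboard_musr",
        "leaderboard_mmlu_pro",
    ],
    batch_size="auto",
)
print(results["results"])
```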
### Framework versions
- Transformers 4.40.2
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.19.1