|
--- |
|
license: mit |
|
datasets: |
|
- mlabonne/orpo-dpo-mix-40k |
|
--- |
|
|
|
This is an uncensored version of Phi-3-mini-4k-instruct.
|
|
|
It was abliterated following the guide here: https://huggingface.co/blog/mlabonne/abliteration
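
In brief, abliteration estimates a "refusal direction" in the residual stream (the difference between mean activations on harmful and harmless prompts) and projects it out of the weights that write to the residual stream. A minimal sketch of the orthogonalization step from the guide, with a random placeholder standing in for the measured direction:

```python
import torch

def orthogonalize(weight: torch.Tensor, refusal_dir: torch.Tensor) -> torch.Tensor:
    """Remove the refusal direction from a weight matrix that writes to the
    residual stream. weight: (d_model, d_in); refusal_dir: (d_model,)."""
    r = refusal_dir / refusal_dir.norm()
    # W' = W - r r^T W, so W' x has no component along r for any input x.
    return weight - torch.outer(r, r @ weight)

# Illustration only: in practice refusal_dir is the difference of mean
# residual-stream activations on harmful vs. harmless prompts at a chosen layer.
d_model, d_in = 3072, 3072  # Phi-3-mini hidden size
W = torch.randn(d_model, d_in)
refusal_dir = torch.randn(d_model)
W_ablated = orthogonalize(W, refusal_dir)
```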
|
|
|
It was then fine-tuned with DPO on the mlabonne/orpo-dpo-mix-40k dataset.
|
|
|
[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl) |
|
<details><summary>See axolotl config</summary> |
|
|
|
axolotl version: `0.4.0` |
|
```yaml |
|
base_model: cowWhySo/Phi-3-mini-4k-instruct-Friendly |
|
trust_remote_code: true |
|
model_type: AutoModelForCausalLM |
|
tokenizer_type: AutoTokenizer |
|
chat_template: phi_3 |
|
|
|
load_in_8bit: false |
|
load_in_4bit: true |
|
strict: false |
|
save_safetensors: true |
|
|
|
rl: dpo |
|
datasets: |
|
- path: mlabonne/orpo-dpo-mix-40k |
|
split: train |
|
type: chatml.intel |
|
|
|
dataset_prepared_path: |
|
val_set_size: 0.0 |
|
output_dir: ./out |
|
|
|
sequence_len: 4096 |
|
sample_packing: false |
|
pad_to_sequence_len: false |
|
|
|
adapter: qlora |
|
lora_model_dir: |
|
|
|
lora_r: 64 |
|
lora_alpha: 32 |
|
lora_dropout: 0.1 |
|
lora_target_linear: true |
|
lora_fan_in_fan_out: |
|
|
|
wandb_project: axolotl |
|
wandb_entity: |
|
wandb_watch: |
|
wandb_name: phi3-mini-4k-instruct-Friendly |
|
wandb_log_model: |
|
|
|
gradient_accumulation_steps: 8 |
|
micro_batch_size: 4 |
|
num_epochs: 1 |
|
optimizer: paged_adamw_8bit |
|
lr_scheduler: linear |
|
learning_rate: 5e-6 |
|
train_on_inputs: false |
|
group_by_length: false |
|
|
|
bf16: auto |
|
|
|
gradient_checkpointing: true |
|
gradient_checkpointing_kwargs: |
|
use_reentrant: True |
|
early_stopping_patience: |
|
resume_from_checkpoint: |
|
local_rank: |
|
logging_steps: 1 |
|
xformers_attention: |
|
flash_attention: true |
|
warmup_steps: 150 |
|
evals_per_epoch: 0 |
|
eval_table_size: |
|
eval_table_max_new_tokens: 128 |
|
saves_per_epoch: 1 |
|
debug: |
|
deepspeed: deepspeed_configs/zero3.json |
|
weight_decay: 0.01 |
|
max_grad_norm: 1.0 |
|
resize_token_embeddings_to_32x: true |
|
``` |
|
|
|
</details><br> |
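
## Usage

A minimal inference sketch with `transformers`, using the standard Phi-3 loading pattern (the prompt and generation settings here are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cowWhySo/Phi-3-mini-4k-instruct-Friendly"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,  # matches the axolotl config above
)

messages = [{"role": "user", "content": "Explain abliteration in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```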
|
|
|
|
|
## Quants |
|
|
|
GGUF: https://huggingface.co/cowWhySo/Phi-3-mini-4k-instruct-Friendly-gguf |
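
One quick way to run the GGUF quants locally is `llama-cpp-python`; the file name below is a hypothetical quant level, so check the repo above for the actual file names:

```python
from llama_cpp import Llama

# Hypothetical file name; substitute an actual .gguf from the quant repo.
llm = Llama(
    model_path="phi-3-mini-4k-instruct-friendly-q4_k_m.gguf",
    n_ctx=4096,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```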
|
|
|
## Benchmarks |
|
|
|
| Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average| |
|
|--------------------------------------------------------------------------------------------------|------:|------:|---------:|-------:|------:| |
|
|[Phi-3-mini-4k-instruct-Friendly](https://huggingface.co/cowWhySo/Phi-3-mini-4k-instruct-Friendly)| 41| 67.56| 46.36| 39.3| 48.56| |
|
|
|
### AGIEval |
|
| Task |Version| Metric |Value| |Stderr| |
|
|------------------------------|------:|--------|----:|---|-----:| |
|
|agieval_aqua_rat | 0|acc |22.05|± | 2.61| |
|
| | |acc_norm|22.05|± | 2.61| |
|
|agieval_logiqa_en | 0|acc |41.01|± | 1.93| |
|
| | |acc_norm|41.32|± | 1.93| |
|
|agieval_lsat_ar | 0|acc |22.17|± | 2.75| |
|
| | |acc_norm|22.17|± | 2.75| |
|
|agieval_lsat_lr | 0|acc |45.69|± | 2.21| |
|
| | |acc_norm|45.88|± | 2.21| |
|
|agieval_lsat_rc | 0|acc |59.48|± | 3.00| |
|
| | |acc_norm|56.51|± | 3.03| |
|
|agieval_sat_en | 0|acc |75.24|± | 3.01| |
|
| | |acc_norm|70.39|± | 3.19| |
|
|agieval_sat_en_without_passage| 0|acc |39.81|± | 3.42| |
|
| | |acc_norm|37.86|± | 3.39| |
|
|agieval_sat_math | 0|acc |33.64|± | 3.19| |
|
| | |acc_norm|31.82|± | 3.15| |
|
|
|
Average: 41.0% |
|
|
|
### GPT4All |
|
| Task |Version| Metric |Value| |Stderr| |
|
|-------------|------:|--------|----:|---|-----:| |
|
|arc_challenge| 0|acc |49.74|± | 1.46| |
|
| | |acc_norm|50.43|± | 1.46| |
|
|arc_easy | 0|acc |76.68|± | 0.87| |
|
| | |acc_norm|73.23|± | 0.91| |
|
|boolq | 1|acc |79.27|± | 0.71| |
|
|hellaswag | 0|acc |57.91|± | 0.49| |
|
| | |acc_norm|77.13|± | 0.42| |
|
|openbookqa | 0|acc |35.00|± | 2.14| |
|
| | |acc_norm|43.80|± | 2.22| |
|
|piqa | 0|acc |77.86|± | 0.97| |
|
| | |acc_norm|79.54|± | 0.94| |
|
|winogrande | 0|acc |69.53|± | 1.29| |
|
|
|
Average: 67.56% |
|
|
|
### TruthfulQA |
|
| Task |Version|Metric|Value| |Stderr| |
|
|-------------|------:|------|----:|---|-----:| |
|
|truthfulqa_mc| 1|mc1 |31.21|± | 1.62| |
|
| | |mc2 |46.36|± | 1.55| |
|
|
|
Average: 46.36% |
|
|
|
### Bigbench |
|
| Task |Version| Metric |Value| |Stderr| |
|
|------------------------------------------------|------:|---------------------|----:|---|-----:| |
|
|bigbench_causal_judgement | 0|multiple_choice_grade|54.74|± | 3.62| |
|
|bigbench_date_understanding | 0|multiple_choice_grade|66.67|± | 2.46| |
|
|bigbench_disambiguation_qa | 0|multiple_choice_grade|29.46|± | 2.84| |
|
|bigbench_geometric_shapes | 0|multiple_choice_grade|11.98|± | 1.72| |
|
| | |exact_str_match | 0.00|± | 0.00| |
|
|bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|28.00|± | 2.01| |
|
|bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|17.14|± | 1.43| |
|
|bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|45.67|± | 2.88| |
|
|bigbench_movie_recommendation | 0|multiple_choice_grade|24.40|± | 1.92| |
|
|bigbench_navigate | 0|multiple_choice_grade|53.70|± | 1.58| |
|
|bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|68.10|± | 1.04| |
|
|bigbench_ruin_names | 0|multiple_choice_grade|31.03|± | 2.19| |
|
|bigbench_salient_translation_error_detection | 0|multiple_choice_grade|15.93|± | 1.16| |
|
|bigbench_snarks | 0|multiple_choice_grade|77.35|± | 3.12| |
|
|bigbench_sports_understanding | 0|multiple_choice_grade|52.64|± | 1.59| |
|
|bigbench_temporal_sequences | 0|multiple_choice_grade|51.50|± | 1.58| |
|
|bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|19.52|± | 1.12| |
|
|bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|13.89|± | 0.83| |
|
|bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|45.67|± | 2.88| |
|
|
|
Average: 39.3% |
|
|
|
Average score: 48.56% |
|
|
|
## Training Summary |
|
|
|
```json |
|
{ |
|
"train/loss": 0.299, |
|
"train/grad_norm": 0.9337566701340533, |
|
"train/learning_rate": 0, |
|
"train/rewards/chosen": 0.08704188466072083, |
|
"train/rewards/rejected": -2.835820436477661, |
|
"train/rewards/accuracies": 0.84375, |
|
"train/rewards/margins": 2.9228620529174805, |
|
"train/logps/rejected": -509.9840393066406, |
|
"train/logps/chosen": -560.8234252929688, |
|
"train/logits/rejected": 1.6356163024902344, |
|
"train/logits/chosen": 1.7323706150054932, |
|
"train/epoch": 1.002169197396963, |
|
"train/global_step": 231, |
|
"_timestamp": 1717711643.3345022, |
|
"_runtime": 22808.557655334473, |
|
"_step": 231, |
|
"train_runtime": 22809.152, |
|
"train_samples_per_second": 1.944, |
|
"train_steps_per_second": 0.01, |
|
"total_flos": 0, |
|
"train_loss": 0.44557410065745895, |
|
"_wandb": { |
|
"runtime": 22810 |
|
} |
|
} |
|
``` |
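
For reference, the `train/rewards/*` fields above are the implicit DPO rewards: β-scaled log-probability ratios of the policy against the frozen reference model. A minimal sketch of how those fields relate to each other, assuming the standard DPO formulation (β = 0.1, TRL's default, is assumed here):

```python
import torch
import torch.nn.functional as F

def dpo_stats(policy_chosen_logps, policy_rejected_logps,
              ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Implicit rewards: how much more likely the policy makes a completion
    # than the reference model does, scaled by beta.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = chosen_rewards - rejected_rewards            # train/rewards/margins
    loss = -F.logsigmoid(margins).mean()                   # train/loss
    accuracy = (margins > 0).float().mean()                # train/rewards/accuracies
    return loss, chosen_rewards.mean(), rejected_rewards.mean(), accuracy
```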